Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hive connector JSON Serde support for custom timestamp formats #4538

Open
grantatspothero opened this issue Jul 22, 2020 · 3 comments
Open

Comments

@grantatspothero
Copy link
Contributor

grantatspothero commented Jul 22, 2020

When trying to read a JSON table with the hive connector, configuring the timestamp format of the JSON serde (org.apache.hive.hcatalog.data.JsonSerDe) is not supported from Presto.

Thus, if the string timestamps do not match the default timestamp formatter you get errors like this when trying to query a JSON serde table:

io.prestosql.spi.PrestoException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
	at io.prestosql.plugin.hive.GenericHiveRecordCursor.advanceNextPosition(GenericHiveRecordCursor.java:226)
	at io.prestosql.plugin.hive.HiveRecordCursor.advanceNextPosition(HiveRecordCursor.java:177)
	at io.prestosql.spi.connector.RecordPageSource.getNextPage(RecordPageSource.java:90)
	at io.prestosql.operator.TableScanOperator.getOutput(TableScanOperator.java:287)
	at io.prestosql.operator.Driver.processInternal(Driver.java:379)
	at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
	at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
	at io.prestosql.operator.Driver.processFor(Driver.java:276)
	at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
	at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
	at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
	at io.prestosql.$gen.Presto_329____20200716_075405_2.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
	at java.sql.Timestamp.valueOf(Timestamp.java:204)
	at org.apache.hive.common.util.TimestampParser.parseTimestamp(TimestampParser.java:121)
	at org.apache.hive.hcatalog.data.JsonSerDe.extractCurrentField(JsonSerDe.java:327)
	at org.apache.hive.hcatalog.data.JsonSerDe.populateRecord(JsonSerDe.java:220)
	at org.apache.hive.hcatalog.data.JsonSerDe.extractCurrentField(JsonSerDe.java:384)
	at org.apache.hive.hcatalog.data.JsonSerDe.extractCurrentField(JsonSerDe.java:350)
	at org.apache.hive.hcatalog.data.JsonSerDe.populateRecord(JsonSerDe.java:220)
	at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:178)
	at io.prestosql.plugin.hive.GenericHiveRecordCursor.advanceNextPosition(GenericHiveRecordCursor.java:217)
	... 14 more

JSON serde code showing you can configure the timestamp format with a serde property of timestamp.formats: https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java#L132-L133

Presto code showing this is not configurable right now:
https://github.com/prestosql/presto/blob/master/presto-hive/src/main/java/io/prestosql/plugin/hive/HiveTableProperties.java#L49-L71

Opening this issue because I asked around in the slack channel:
https://prestosql.slack.com/archives/CGB0QHWSW/p1595369039293100

The workaround for now:

  • When creating a JSON table in Presto, set the timestamps fields as type varchar and then parse the timestamps using presto datetime functions.
@findepi
Copy link
Member

findepi commented Jul 22, 2020

One option is to expose timestamp.formats as a table property.
Other -- #954

@grantatspothero
Copy link
Contributor Author

grantatspothero commented Jul 23, 2020

+1 on #954
Hive has something similar with SERDEPROPERTIES/TBLPROPERTIES, nice to have a catch all for when you need configuration that hasn't had support specifically added.

@billonahill
Copy link

This issue is also biting us.

Isn't there another issue with this proposed solution though, which is that timestamp.formats is supported in o.a.h.hive.serde2.JsonSerDe, but Trino uses org.apache.hive.hcatalog.data.JsonSerDe? Seems we'd also need to provide a way to use the latter, no?

import org.apache.hive.hcatalog.data.JsonSerDe;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants