Support "standard" / alternate format arguments for to_timestamp
#8915
Comments
Also, perhaps the https://github.com/apache/arrow-datafusion-comet project will be able to contribute a spark compatible implementation of timestamp parsing (that is probably being overly optimistic, though we'll know more shortly).
@alamb I am interested in this problem. You can assign it to me.
Hi @Tangruilin -- I think this one is a pretty significant amount of work -- it might be worth sketching out how you would proceed. Perhaps you could make a new project in https://github.com/datafusion-contrib (I can create the repo for you if you want) that contains some UDFs that implement the different semantics?
@alamb
Indeed it is meaningful work, I just want to make sure that it is clear that the project is potentially a very large amount of work in my opinion.
This ticket was the followup to #8886 to allow for alternate formats instead of chrono. It's really not feasible imho until DataFusion has the ability to select or alter function behaviour based on a dialect such as PostgreSQL, Spark, etc.
🤦 -- sorry
Is your feature request related to a problem or challenge?
After #8886 (thanks to @Omega359) DataFusion supports converting strings to timestamps using a string format, for example parsing '2020-09-08T12:00:00+00:00' with several possible formats: '%c', '%+', '%Y-%m-%d'.
However, as @comphead points out, the format used is specific to chrono, the underlying Rust library used. These are slightly different semantics than any existing to_timestamp implementation (it isn't postgres format strings, nor is it spark format strings, it is something datafusion specific based on the rust chrono format strings).
Describe the solution you'd like
Ideally users could decide what "dialect" of string format specifiers they wanted to support based on a configuration option, for example either postgres or spark.
However, this is nontrivial given the scope of those two implementations.
Describe alternatives you've considered
Users can always use DataFusion's user defined functions to define the semantics they want, for example with a ScalarUDF that rewrites the specified time string from a postgres format into the chrono format
(though there are likely all sorts of corner cases -- see #8886 (comment))
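As a rough illustration of the rewriting idea above, here is a minimal sketch of the translation step only. The function name postgres_to_chrono and the small pattern table are assumptions made for this example, not part of DataFusion or chrono, and wrapping the function in a ScalarUDF is left out. It covers only a handful of uppercase Postgres patterns and ignores quoted literals, case-insensitive matching, and the many other corner cases mentioned above.

```rust
/// Hypothetical helper (not a DataFusion or chrono API): translate a small
/// subset of Postgres-style template patterns into chrono strftime specifiers.
/// Unknown text is copied through unchanged.
fn postgres_to_chrono(pg_format: &str) -> String {
    // Assumed, deliberately incomplete mapping. Four-character patterns are
    // listed before two-character ones so the longest match is tried first.
    const PATTERNS: [(&str, &str); 7] = [
        ("YYYY", "%Y"), // 4-digit year
        ("HH24", "%H"), // hour 00-23
        ("HH12", "%I"), // hour 01-12
        ("MM", "%m"),   // month 01-12
        ("DD", "%d"),   // day of month 01-31
        ("MI", "%M"),   // minute 00-59
        ("SS", "%S"),   // second 00-59
    ];

    let mut out = String::with_capacity(pg_format.len());
    let mut rest = pg_format;

    'outer: while !rest.is_empty() {
        // Try each known pattern at the current position.
        for &(pattern, spec) in PATTERNS.iter() {
            if rest.starts_with(pattern) {
                out.push_str(spec);
                rest = &rest[pattern.len()..];
                continue 'outer;
            }
        }
        // No pattern matched: copy one character verbatim, escaping '%'
        // so that chrono treats it as a literal percent sign.
        if let Some(ch) = rest.chars().next() {
            if ch == '%' {
                out.push_str("%%");
            } else {
                out.push(ch);
            }
            rest = &rest[ch.len_utf8()..];
        }
    }
    out
}
```

Under these assumptions, a template such as 'YYYY-MM-DD HH24:MI:SS' would be rewritten to '%Y-%m-%d %H:%M:%S' before being handed to the existing chrono-based parsing. A real implementation would need a much larger table and a proper tokenizer.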
Additional context
@jhorstmann has notes about Postgres: #5398 (comment)
@Omega359 notes that the spark format library is entirely different still: #5398 (comment)