Support "standard" / alternate format arguments for to_timestamp #8915

Open
alamb opened this issue Jan 19, 2024 · 8 comments
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Jan 19, 2024

Is your feature request related to a problem or challenge?

After #8886 (thanks to @Omega359) DataFusion supports converting strings to timestamps using a string format:

SELECT to_timestamp('2020-09-08T12:00:00+00:00', '2020-09-08 12/00/00+00:00', '%c', '%+', '%Y-%m-%d %H/%M/%S%#z');

This parses '2020-09-08T12:00:00+00:00' by trying several possible formats ('%c', '%+', '%Y-%m-%d %H/%M/%S%#z') in turn.

However, as @comphead points out, the format used is specific to chrono, the underlying Rust library. Its semantics differ slightly from any existing to_timestamp: the format strings are neither Postgres nor Spark format strings, but something DataFusion-specific based on Rust chrono format strings.

Describe the solution you'd like

Ideally, users could choose which "dialect" of string format specifiers to use via a configuration option, for example either Postgres or Spark.

However, this is nontrivial given the scope of those two implementations.
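As a rough illustration of what such a configuration option might carry, here is a minimal sketch of a dialect setting parsed from a string value. The type name `TimestampFormatDialect`, its variants, and the accepted strings are all assumptions made for this example; nothing like this exists in DataFusion as far as this thread shows.

```rust
use std::str::FromStr;

/// Hypothetical set of format-specifier dialects (not an existing DataFusion type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum TimestampFormatDialect {
    /// chrono `strftime`-style specifiers, the behavior added in #8886.
    #[default]
    Chrono,
    /// PostgreSQL template patterns such as `YYYY-MM-DD`.
    Postgres,
    /// Spark datetime patterns such as `yyyy-MM-dd`.
    Spark,
}

impl FromStr for TimestampFormatDialect {
    type Err = String;

    /// Parse a configuration value, ignoring ASCII case and surrounding whitespace.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "chrono" => Ok(Self::Chrono),
            "postgres" | "postgresql" => Ok(Self::Postgres),
            "spark" => Ok(Self::Spark),
            other => Err(format!("unknown timestamp format dialect: {other}")),
        }
    }
}
```

Selecting the dialect is the easy part; the real work is implementing each dialect's parsing rules behind it.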

Describe alternatives you've considered

Users can always use DataFusion's user defined functions to define the semantics they want, for example with a ScalarUDF that rewrites the specified time string from a postgres format into the chrono format.

(though there are likely all sorts of corner cases -- see #8886 (comment))
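To make the rewriting idea concrete, here is a minimal sketch of the translation step such a UDF might perform, using only the Rust standard library. The function name `pg_to_chrono_format` is hypothetical, and it covers only a handful of Postgres template patterns (case-sensitively); month names, time zones, `FM` modifiers, and quoted literals are exactly the kind of corner cases it does not handle.

```rust
/// Translate a small subset of PostgreSQL template patterns into chrono
/// `strftime` specifiers. Hypothetical helper: text that matches no known
/// pattern is copied through as literal characters.
pub fn pg_to_chrono_format(pg: &str) -> String {
    // Longer patterns come first so `HH24` and `HH12` win over plain `HH`.
    const PATTERNS: [(&str, &str); 8] = [
        ("YYYY", "%Y"), // 4-digit year
        ("HH24", "%H"), // hour 00-23
        ("HH12", "%I"), // hour 01-12
        ("HH", "%I"),   // Postgres `HH` is also the 12-hour clock
        ("MM", "%m"),   // month number
        ("DD", "%d"),   // day of month
        ("MI", "%M"),   // minute
        ("SS", "%S"),   // second
    ];

    let mut out = String::with_capacity(pg.len() + 8);
    let mut rest = pg;
    'outer: while let Some(ch) = rest.chars().next() {
        for (pat, spec) in PATTERNS {
            if let Some(tail) = rest.strip_prefix(pat) {
                out.push_str(spec);
                rest = tail;
                continue 'outer;
            }
        }
        // A literal `%` must be escaped so chrono does not read it as a specifier.
        if ch == '%' {
            out.push_str("%%");
        } else {
            out.push(ch);
        }
        rest = &rest[ch.len_utf8()..];
    }
    out
}
```

With this sketch, `pg_to_chrono_format("YYYY-MM-DD HH24:MI:SS")` yields `%Y-%m-%d %H:%M:%S`, which could then be passed along as a chrono format argument.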

Additional context

@jhorstmann has notes about Postgres: #5398 (comment)
@Omega359 notes that the spark format library is entirely different still: #5398 (comment)

@alamb alamb added the enhancement New feature or request label Jan 19, 2024
@alamb alamb changed the title Support "standard" format arguments for to_timestamp Support "standard" / alternate format arguments for to_timestamp Jan 19, 2024
@alamb
Contributor Author

alamb commented Jan 19, 2024

Also, perhaps the https://github.com/apache/arrow-datafusion-comet project will be able to contribute a spark compatible implementation of timestamp parsing (that is probably being overly optimistic, though we'll know more shortly)

@Tangruilin
Contributor

@alamb I am interested in this problem.

You can assign it to me.

@alamb
Contributor Author

alamb commented Jan 26, 2024

Hi @Tangruilin -- I think this one is a pretty significant amount of work -- it might be worth sketching out how you would proceed. Perhaps you could make a new project in https://github.com/datafusion-contrib (I can create the repo for you if you want) that contains some UDFs that implement the different semantics?

@Tangruilin
Contributor

@alamb
I think you can do that, but I may not be able to finish all the work myself.
After all, I am not yet particularly familiar with DataFusion.
But this is meaningful work, in my opinion.

@alamb
Contributor Author

alamb commented Jan 26, 2024

But this is meaningful work, in my opinion

Indeed it is meaningful work. I just want to make sure it is clear that the project is potentially a very large amount of work, in my opinion.

@alamb
Contributor Author

alamb commented Feb 7, 2024

I think @Omega359 did this in #8886

@alamb alamb closed this as completed Feb 7, 2024
@Omega359
Contributor

Omega359 commented Feb 7, 2024

This ticket was the follow-up to #8886 to allow alternate formats instead of chrono. It's really not feasible imho until DataFusion has the ability to select or alter function behaviour based on a dialect such as PostgreSQL, Spark, etc.

@alamb alamb reopened this Feb 7, 2024
@alamb
Contributor Author

alamb commented Feb 7, 2024

🤦 -- sorry
