Spark-compatible CAST operation #11201
Comments
cc @viirya @andygrove @Omega359 You could always create a user-defined function that does the cast 🤔 but I am not sure how much of DataFusion special-cases the CAST.
I'm surprised cast is the thing that is catching you; for me it was the differences in functions and their behaviour between the two systems. I wouldn't have a clue at the moment how to make cast in DataFusion replaceable - I suspect it wouldn't be an easy task.
There are definitely differences in some functions as well, but those have been mostly minor so far (I haven't gotten to regexps or date/time stuff yet), like returning null vs. throwing on overflow. I did confirm today that I can use Comet's scalar functions by overriding the existing ScalarUDFs in DataFusion, which is quite nice! For Expr::Cast that doesn't work, as it's part of the Expr enum rather than being a ScalarUDF - is that by choice, or just pending migration? But I plan to try whether I can use FunctionRewrite to change the Expr::Cast into a ScalarUDF that uses the Comet cast; a rough sketch of that idea is below.
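To make that plan concrete, here is a minimal sketch of what such a rewrite could look like. It assumes DataFusion's FunctionRewrite trait with a rewrite(Expr, &DFSchema, &ConfigOptions) -> Result<Transformed<Expr>> method, and a hypothetical spark_cast ScalarUDF (e.g. one wrapping Comet's cast); how the target type is communicated to the UDF is also an assumption, not an established API.

```rust
use std::sync::Arc;

use datafusion::common::config::ConfigOptions;
use datafusion::common::tree_node::Transformed;
use datafusion::common::{DFSchema, Result, ScalarValue};
use datafusion::logical_expr::expr_rewriter::FunctionRewrite;
use datafusion::logical_expr::{Expr, ScalarUDF};

/// Rewrites `Expr::Cast` into a call to a Spark-compatible cast UDF
/// (e.g. one wrapping Comet's cast kernel). `spark_cast` is a placeholder
/// for whatever UDF actually performs the cast.
struct SparkCastRewrite {
    spark_cast: Arc<ScalarUDF>,
}

impl FunctionRewrite for SparkCastRewrite {
    fn name(&self) -> &str {
        "spark_cast_rewrite"
    }

    fn rewrite(
        &self,
        expr: Expr,
        _schema: &DFSchema,
        _config: &ConfigOptions,
    ) -> Result<Transformed<Expr>> {
        match expr {
            Expr::Cast(cast) => {
                // Pass the target type to the UDF as an extra string literal
                // argument; how the UDF interprets it is up to its author.
                let target = Expr::Literal(ScalarValue::Utf8(Some(
                    cast.data_type.to_string(),
                )));
                let rewritten = self.spark_cast.call(vec![*cast.expr, target]);
                Ok(Transformed::yes(rewritten))
            }
            // Leave every other expression untouched.
            other => Ok(Transformed::no(other)),
        }
    }
}
```

The rewrite would then be registered on the session state (e.g. via the FunctionRegistry's register_function_rewrite) so it runs during analysis; the exact registration API may differ between DataFusion versions.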
I think it is largely historical, but realistically there is enough code that handles Cast specially in the core that switching it to use a UDF is likely a substantial project (we could certainly do it, but I wanted to offer my opinion on the scope).
Is your feature request related to a problem or challenge?
We're looking to use DataFusion as a replacement for Spark for some workflows, through Spark -> Substrait -> DataFusion conversions. A lot of the functionality already works, and it's been nice to see DataFusion and Spark agree on most behavior we've tested so far. However, one place where we're seeing more differences is the CAST expression, for example when casting complex types into strings, or casting strings into numbers (where Spark is more lenient).
One option I've considered is to use Comet's cast, which I'd expect to be more closely aligned with Spark (or at least on the way there). However, is there a way for me to replace/redirect the built-in Cast expression to use the Comet implementation?
Or would there be any other alternatives I could try?
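For reference, a minimal sketch of what wrapping a Spark-compatible cast as a DataFusion ScalarUDF could look like, assuming the ScalarUDFImpl trait; the name spark_cast is hypothetical, the delegation to Comet is deliberately left as a placeholder because its exact API is not shown here, and depending on the DataFusion version the evaluation entry point may be invoke, invoke_batch, or invoke_with_args.

```rust
use std::any::Any;

use datafusion::arrow::datatypes::DataType;
use datafusion::common::Result;
use datafusion::logical_expr::{ColumnarValue, ScalarUDFImpl, Signature, Volatility};

/// Hypothetical UDF exposing a Spark-compatible cast to DataFusion.
/// The actual casting work would be delegated to a Spark-compatible
/// kernel such as Comet's, whose API is not shown here.
#[derive(Debug)]
struct SparkCompatibleCast {
    signature: Signature,
    target_type: DataType,
}

impl SparkCompatibleCast {
    fn new(target_type: DataType) -> Self {
        Self {
            // Accept any single argument; a real implementation would
            // restrict this to the source types Spark's cast supports.
            signature: Signature::any(1, Volatility::Immutable),
            target_type,
        }
    }
}

impl ScalarUDFImpl for SparkCompatibleCast {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn name(&self) -> &str {
        "spark_cast"
    }

    fn signature(&self) -> &Signature {
        &self.signature
    }

    fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType> {
        Ok(self.target_type.clone())
    }

    fn invoke(&self, _args: &[ColumnarValue]) -> Result<ColumnarValue> {
        // Delegate to a Spark-compatible cast kernel here, e.g. the one in
        // the Comet crates (exact call elided on purpose).
        todo!("call the Spark-compatible cast kernel for self.target_type")
    }
}
```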
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response