-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split built in functions into "packages" #7110
Comments
Not quite familiar with the binary size issue, but maybe we could let the default config ship with most functions pre-registered: let ctx = SessionContext::new(); // Internally have most functions registered It might be more convenient for general users and also make the API backward compatible. If binary size becomes an issue, now there is another nice option available: let ctx = SessionContext::new_with_minimal_functions(); |
what about namespaced enums? it would still use feature flags, but it'd make it much more compartmentalized & able to easily split into sub crates. so something like // this could easily live in it's own crate.
pub enum StringFunctions {
// string related functions
}
pub enum BuiltinScalarFunction {
#[cfg(feature = "string_expressions")]
StringNamespace(StringFunctions),
#[cfg(feature = "list_expressions")]
ListNamespace(ListFunctions),
#[cfg(feature = "crypto_expressions")]
CryptoNamespace(CryptoFunctions),
Abs,
// ...other non namespaced functions
} |
That is a neat idea @universalmind303 -- I was even thinking there is no reason to have This would also make things like serialization simpler as well (as there is already a name --> function mapping) |
I think it would be really nice to implement these packages as crates (possibly with feature flags in datafusion core) rather than just feature flags in datafusion core. cc @tustvold |
Is your feature request related to a problem or challenge?
Right now, DataFusionhas 104 built in functions: https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.BuiltinScalarFunction.html
As we add new features and functions (most recently,
date_diff
#7097 (comment) or all the arary functions) this number will keep growingThis growth means that the size of the DataFusion library will keep growing even if users do not use those features
Describe the solution you'd like
Given that DataFusion has all the machinery to register user defined functions, and they are mostly handled the same way, I propose we split up the datafusion built into scalar function packages
Perhaps like
string_functions
(upper
,hex
, etc)crypto_functions
(hash
, etc)array_functions
(make_array
,array_contains
, etc)date_trunc
, etc)Not only would this give users better control over their binary size it would also ensure that the extensibility APIs of DataFusion were sufficient for all functions (and we could enhance the extension points if this was not the case)
This would replace the existing somewhat haphazard feature flags like
crypto_functions
:https://github.com/apache/arrow-datafusion/blob/11b7b5c215012231e5768fc5be3445c0254d0169/datafusion/physical-expr/src/lib.rs#L21-L22
I would imagine these functions would be in their own crates like
datafusion_functions_crypto
with an entry point likeDescribe alternatives you've considered
We could continue to use feature flags
If this works out, we could do the same thing for aggregate and window functions
Additional context
No response
The text was updated successfully, but these errors were encountered: