Is your feature request related to a problem? Please describe.
For context, we have an optimization whereby we pre-aggregate one table onto another in array form. However, once we have the array, we need to be able to perform various aggregations as array functions. It would be nice if all aggregate functions had an array function with equivalent semantics.
Describe the solution you'd like
Here is a list of cases we've encountered where we are missing the needed functionality:
Multi-column array_sortby: needed to emulate array_agg when the ordering spans multiple columns, each ascending or descending. It should take a list of columns to sort by and an ascending/descending specifier for each.
array_percentile_cont and array_percentile_disc: same semantics as percentile_cont and percentile_disc
array_std_samp: should have same semantics as the STDDEV_SAMP aggregation
array_var_samp: should have same semantics as the VAR_SAMP aggregation
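To make the requested semantics concrete, here is a rough Python sketch of what these four array functions would compute. It is a reference model only, not StarRocks code. The function names and signatures are hypothetical, the percentile definitions follow the usual SQL ones (linear interpolation for the continuous form, first value at or past the requested fraction for the discrete form), and the assumptions are that nulls (None) are skipped in the numeric functions and that sort keys contain no nulls.

```python
import math


def array_sortby_multi(rows, keys):
    """Sort rows by several positions. keys is a list of (index, ascending) pairs,
    most significant first. Hypothetical signature; assumes no None in sort keys."""
    result = list(rows)
    # Python's sort is stable, so sorting from the least significant key to the
    # most significant one yields a correct multi-key ordering.
    for index, ascending in reversed(keys):
        result.sort(key=lambda row: row[index], reverse=not ascending)
    return result


def array_percentile_cont(values, p):
    """Continuous percentile: interpolate linearly between neighbouring values."""
    xs = sorted(v for v in values if v is not None)
    if not xs:
        return None
    pos = p * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def array_percentile_disc(values, p):
    """Discrete percentile: first value whose cumulative share reaches p."""
    xs = sorted(v for v in values if v is not None)
    if not xs:
        return None
    return xs[max(math.ceil(p * len(xs)) - 1, 0)]


def array_var_samp(values):
    """Sample variance (divides by n - 1); None when fewer than two values."""
    xs = [v for v in values if v is not None]
    n = len(xs)
    if n < 2:
        return None
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def array_std_samp(values):
    """Sample standard deviation: square root of the sample variance."""
    var = array_var_samp(values)
    return None if var is None else math.sqrt(var)
```

For example, array_sortby_multi(rows, [(0, True), (1, False)]) orders by the first field ascending and breaks ties by the second field descending, which is the multi-column behaviour described above.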
We currently have workarounds for the following, but it would be nice to have these for efficiency's sake:
array_count_distinct that ignores nulls. We implement this as array_length(array_distinct(array_filter(x)->x is not null))
array_count that ignores nulls. We implement this as array_length(array_filter(x)->x is not null).
array_avg that ignores nulls. We implement this as array_avg(array_filter(x)->x is not null).
array_first that returns the first non-null element of the array. We implement this as (array_filter(x)->x is not null)[1]
array_last that returns the last non-null element of the array. We implement this as (array_filter(x)->x is not null)[array_length(array_filter(x)->x is not null)]
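As an illustration of why direct support would be cheaper, the following Python sketch models each null-ignoring function as a single pass over the array, without first building the filtered copy that the array_filter workarounds require. The names mirror the list above but the signatures are hypothetical, and None stands in for SQL NULL; this is a model of the intended behaviour, not an existing StarRocks API.

```python
def array_count(arr):
    """Count the non-null elements."""
    return sum(1 for x in arr if x is not None)


def array_count_distinct(arr):
    """Count the distinct non-null elements."""
    return len({x for x in arr if x is not None})


def array_avg(arr):
    """Average of the non-null elements; None if there are none."""
    total, count = 0, 0
    for x in arr:
        if x is not None:
            total += x
            count += 1
    return total / count if count else None


def array_first(arr):
    """First non-null element, stopping at the first match."""
    return next((x for x in arr if x is not None), None)


def array_last(arr):
    """Last non-null element, scanning from the end."""
    return next((x for x in reversed(arr) if x is not None), None)
```

Note that array_first and array_last can stop as soon as they find a non-null element, whereas the filter-then-index workaround always scans and copies the whole array.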
Describe alternatives you've considered
See the second set of functions listed above. We have workarounds for those, but they are complex and suboptimal: they materialize an intermediate array that direct support would not need.
The first set of functions doesn't have a reasonable workaround. We would need to unnest, aggregate, and re-nest, which is extremely costly.
Additional context
We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!