You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem or challenge?
The way that DataFusion names the output columns can be quite messy. For example, the query below produces a column named "FIRST_VALUE(foo.free) PARTITION BY [foo.host] ORDER BY [foo.time DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW + Int64(5) / Int64(12)" (yes that whole thing)
Arrow requires column names to be distinct strings and DataFusion creates the name based on the content of the expression.
Here is the entire query
❯ create table foo(host varchar, free int, timeint) asvalues ('a', 1, 2), ('a', 3, 4);
0 rows inset. Query took 0.003 seconds.
❯ select first_value(free) over (partition by host order bytimedesc) +5/12from foo limit10;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| FIRST_VALUE(foo.free) PARTITION BY [foo.host] ORDER BY [foo.timeDESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW + Int64(5) / Int64(12) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 3 |
| 3 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+2 rows inset. Query took 0.003 seconds.
It also means you can't select the same expression without an alias, which while annoying is likely not a critical usecase
❯ select free /1, free /1from foo;
Error during planning: Projections require unique expression names but the expression "foo.free / Int64(1)" at position 0and"foo.free / Int64(1)" at position 1 have the same name. Consider aliasing ("AS") one of them.
❯ select free /1, free /1as col2 from foo;
+---------------------+------+
| foo.free/ Int64(1) | col2 |
+---------------------+------+
| 1 | 1 |
| 3 | 3 |
+---------------------+------+
Describe the solution you'd like
While the output column names can be controlled via adding an explicit alias, I think it would be good to have a more concise way of naming the default outputs, especially for window functions
Perhaps something like not adding the OVER clause's contents so FIRST_VALUE(foo.free) in the above example.
If we do this we need to be careful to allow multiple calls to FIRST_VALUE with different OVER clauses in the same query
Describe alternatives you've considered
No response
Additional context
If we make the names more concise, we will probably need a solution like #6543 to ensure they remain unique
The text was updated successfully, but these errors were encountered:
Is your feature request related to a problem or challenge?
The way that DataFusion names the output columns can be quite messy. For example, the query below produces a column named
"FIRST_VALUE(foo.free) PARTITION BY [foo.host] ORDER BY [foo.time DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW + Int64(5) / Int64(12)"
(yes that whole thing)Arrow requires column names to be distinct strings and DataFusion creates the name based on the content of the expression.
Here is the entire query
It also means you can't select the same expression without an alias, which while annoying is likely not a critical usecase
Describe the solution you'd like
While the output column names can be controlled via adding an explicit alias, I think it would be good to have a more concise way of naming the default outputs, especially for window functions
Perhaps something like not adding the
OVER
clause's contents soFIRST_VALUE(foo.free)
in the above example.If we do this we need to be careful to allow multiple calls to
FIRST_VALUE
with differentOVER
clauses in the same queryDescribe alternatives you've considered
No response
Additional context
If we make the names more concise, we will probably need a solution like #6543 to ensure they remain unique
The text was updated successfully, but these errors were encountered: