Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: expand operator #3563
feat: expand operator #3563
Changes from 7 commits
14ac2f5
99acdec
3f601de
a18ce8d
d97c9ed
9b10c7a
325b38e
7ec14ac
c4248fe
d5383e1
8e1c1f5
325ee55
ebe75f6
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about make it more general as
Exprs
? So that we can reuse it inHopWindowExecutor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean make
Keys
more general?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, IIUC it's just
InputRef
, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's indices of columns actually......
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are quite similar to me 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
InputRef
is essentially the same asuint32
here. But I do not see why we need to generalize fromInputRef
to arbitraryExpr
, or how this is related toHopWindowExecutor
.As for
uint32
vsInputRef
, I prefer to avoidInputRef
: it is lacking thereturn_type
part to be treated as a genuineExpr
, and the caller (ExecutorBuilders including hop) actually just takescolumn_idx
out of it rather than building it into an expression and then evaluate.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expand operator are following expressions:
If we change HopWindowExecutor to ExpandExecutor, they are just following expressions:
And these can be computed by frontend, which can be reused by both streaming and batch executors. As with data type, I don't see it as a blocking issue for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve got it and +1 for this. maybe we can have a try after this PR is merged and well tested with distinct agg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting generalization. We can give it a try, but probably after we have a working e2e distinct agg as tianshuo suggested above.
Actually, this generalized node is just a multi-row expr evaluation. If we want to, it can also cover use cases of
project
(single row of exprs) andvalues
(input is fixed to 1-row-0-col). Just not sure if this generalization is too much.Also to my understanding, it would be
rather than the example above.
I also agree that data type is not a big issue. I was just listing the differences between our InputRef in rust vs InputRef in proto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok to try it after distinct agg has been resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the input has
HashShard(hash_keys)
, and allsubset
coverhash_keys
, does the output provideHashShard(hash_keys)
as well? For example:SomeShard
is also valid but may result in unnecessary exchange. We can also leave this as a later improvement.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure for the rows generated with the Null value... maybe we can give it
SomeShard
first and ensure correctness. After having test, we can give it a better(more accurate) distribution.