Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: from_plan generate incorrect schema when it is Aggregate #6862

Closed
wants to merge 1 commit into from
Closed

fix: from_plan generate incorrect schema when it is Aggregate #6862

wants to merge 1 commit into from

Conversation

YjyJeff
Copy link
Contributor

@YjyJeff YjyJeff commented Jul 6, 2023

Rationale for this change

For the from_plan function, we want to generate a new logical plan with inputs and expressions replaced. When we encounter the Aggregate variant, the old code simply clones the old schema. However, the new group by expressions may have a different schema from the old one. Therefore, we should use the try_new function to create a new Aggregate variant with the correct schema.

Are these changes tested?

Yes

Are there any user-facing changes?

No

…ression may have different schema with the old one
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jul 6, 2023
@jackwener
Copy link
Member

jackwener commented Jul 6, 2023

There is related issue or specific example?

It's legacy code, exists with some reason. If there are no specific reasons, we should not change it.

@YjyJeff
Copy link
Contributor Author

YjyJeff commented Jul 6, 2023

I have not created the issue, should we create an issue to talk about it?

I found this problem in our custom optimizer. In our optimizer, we will change the return type of the expression. For example, if we have the query

select col_0 / 3 as a, count(1) from table group by a 

and the type of col_0 is DataType::UInt32. Without our custom optimizer, the expression col_0 / 3 will return the type DataType::Int64(because the literal 3 is parsed as Int64). Our optimizer will convert the literal Int64(3) to UInt(32) such that we can avoid the cost of casting col_0 to Int64Array. In this case, the return type of the group by expression changed from DataType::Int64 to DataType::UInt32.

It's legacy code, exists with some reason. If there are no specific reasons, we should not change it.

What are the reasons? In my view, from_plan may change the schema, therefore, the legacy code is a bug here. My patch fixed the bug and have passed all of the tests

@jackwener
Copy link
Member

related PR: #6827 which revert #6595 #6820

related issue: #6819 #6790

@jackwener
Copy link
Member

I found this problem in our custom optimizer. In our optimizer, we will change the return type of the expression. For example, if we have the query

currently, we can't change schema in optimization, because it will cause scheme match failed.

@YjyJeff
Copy link
Contributor Author

YjyJeff commented Jul 6, 2023

Ok, I know the reasons now. However, the from_plan function looks pretty odd now: some logical plans support changing the schema but others do not. I think it still should be solved 0.0

@YjyJeff YjyJeff closed this Jul 6, 2023
@jackwener
Copy link
Member

jackwener commented Jul 6, 2023

Ok, I know the reasons now. However, the from_plan function looks pretty odd now: some logical plans support changing the schema but others do not. I think it still should be solved 0.0

Yes, I think we indeed need to resolve this problem, but it's troublesome.

It needs to take more time to discuss how to solve this problem.

I add a comment #6790 (comment) to explain reason for this. Hope it can help others in community.

@jackwener
Copy link
Member

cc @alamb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants