Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: unwrap casts of string and dictionary columns #10323

Merged
merged 7 commits into from
May 1, 2024

Conversation

erratic-pattern
Copy link
Contributor

@erratic-pattern erratic-pattern commented Apr 30, 2024

Which issue does this PR close?

Closes #10220

Rationale for this change

Improves performance of queries that filter string/dictionary columns against literals (WHERE dict_col = 1) by avoiding casting the column

What changes are included in this PR?

Allows the optimizer to unwrap casts of string types in comparison operations.

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Apr 30, 2024
@erratic-pattern
Copy link
Contributor Author

cc @jayzhan211 this one doesn't interfere with comparison_cast unlike #10221


# Now query using an integer which must be coerced into a dictionary string
query T?
SELECT * from test where column2 = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explain query for this is helpful too.

@@ -298,19 +306,46 @@ fn is_support_data_type(data_type: &DataType) -> bool {
)
}

/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a string
fn is_supported_string_type(data_type: &DataType) -> bool {
matches!(data_type, DataType::Utf8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can also support LargeUtf8.

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @erratic-pattern and @jayzhan211 -- this is a really nice PR. Thank you for proceeding and pushing to get the "right" solution (rather than changing coercion)

}
if lit_value.is_null() {
// null value can be cast to any type of null value
return Ok(Some(ScalarValue::try_from(target_type)?));
return ScalarValue::try_from(target_type).ok();
Copy link
Contributor

@alamb alamb May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if ignoring errors will make tracking down unsupported types harder (I am imagining when we try and add REEArray or StringViewArray). 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if ignoring errors will make tracking down unsupported types harder (I am imagining when we try and add REEArray or StringViewArray). 🤔

yeah we may want to reintroduce a Result type, but as it stands currently the errors were already being ignored anyway so they didn't serve any purpose.

also the boolean is_supported_* means that we'll just silently skip those cases anyway.

so realistically most of those errors just don't ever occur and if they did occur we wouldn't know about them. we would need to rethink the design of this code to surface those errors.

another interesting possibility is debug assertions that panic, but only on 'debug' builds. I've always found this to be an interesting but underutilized tool for uncovering bugs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I was mistaken, these errors weren't being ignored. I could add them back in in a new PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I was mistaken, these errors weren't being ignored. I could add them back in in a new PR?

I think that would be really nice if you had the time.

@alamb alamb merged commit 1e0c760 into apache:main May 1, 2024
25 checks passed
@alamb
Copy link
Contributor

alamb commented May 1, 2024

I made a PR with a few more tests here: #10335

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slow comparisions to dictionary columns with type coercion
3 participants