Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add recursion limit configuration to DFParser #14095

Closed
wants to merge 3 commits into from

Conversation

tlm365
Copy link
Contributor

@tlm365 tlm365 commented Jan 12, 2025

Which issue does this PR close?

Closes #14091.

Rationale for this change

What changes are included in this PR?

Set parser config, now support up to 100 expressions.

Are these changes tested?

Yes, by datafusion-testing module.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the sql SQL Planner label Jan 12, 2025
@tlm365 tlm365 marked this pull request as ready for review January 12, 2025 11:47
@2010YOUY01
Copy link
Contributor

Thank you, this fix makes sense to me.

I think we can add the bug reproducer to sqllogictest for the regression test. Although this will be covered in the extended tests, it's relatively fast to run, so maybe we should also include it in the regular tests.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 13, 2025
@tlm365 tlm365 marked this pull request as draft January 13, 2025 12:50
@tlm365 tlm365 marked this pull request as ready for review January 16, 2025 08:03
@@ -52,6 +52,7 @@ const SQLITE_PREFIX: &str = "sqlite";
pub fn main() -> Result<()> {
tokio::runtime::Builder::new_multi_thread()
.enable_all()
.thread_stack_size(4 * 1024 * 1024)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to adjust this config to avoid error thread 'tokio-runtime-worker' has overflowed its stack of sqllogictest process since we change the with_recursion_limit config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried about this. We have recently added the the recursive protection feature precisely to avoid stack overflows (ever) but now after this change the stacks can overflow.

Here is the stack where it overflows (precisely what the Recursion limit is designed to prevent:

<sqlparser::tokenizer::Token as core::clone::Clone>::clone tokenizer.rs:52
<sqlparser::tokenizer::TokenWithSpan as core::clone::Clone>::clone tokenizer.rs:633
core::option::Option<&T>::cloned option.rs:1917
sqlparser::parser::Parser::next_token mod.rs:3443
sqlparser::parser::Parser::parse_data_type_helper mod.rs:8097
sqlparser::parser::Parser::parse_data_type mod.rs:8083
sqlparser::parser::Parser::parse_prefix::{{closure}} mod.rs:1243
sqlparser::parser::Parser::try_parse mod.rs:3808
sqlparser::parser::Parser::maybe_parse mod.rs:3795
sqlparser::parser::Parser::parse_prefix mod.rs:1242
sqlparser::parser::Parser::parse_subexpr mod.rs:964
sqlparser::parser::Parser::parse_expr mod.rs:957
core::ops::function::FnMut::call_mut function.rs:166
sqlparser::parser::Parser::parse_comma_separated_with_trailing_commas mod.rs:3730
sqlparser::parser::Parser::parse_comma_separated mod.rs:3715
sqlparser::parser::Parser::parse_prefix mod.rs:1381
sqlparser::parser::Parser::parse_subexpr mod.rs:964
sqlparser::parser::Parser::parse_infix mod.rs:2928
sqlparser::parser::Parser::parse_subexpr mod.rs:974
sqlparser::parser::Parser::parse_expr mod.rs:957
core::ops::function::FnMut::call_mut function.rs:166
sqlparser::parser::Parser::parse_comma_separated_with_trailing_commas mod.rs:3730
sqlparser::parser::Parser::parse_comma_separated mod.rs:3715
sqlparser::parser::Parser::parse_prefix mod.rs:1381
sqlparser::parser::Parser::parse_subexpr mod.rs:964
sqlparser::parser::Parser::parse_infix mod.rs:2928
sqlparser::parser::Parser::parse_subexpr mod.rs:974
sqlparser::parser::Parser::parse_expr mod.rs:957
core::ops::function::FnMut::call_mut function.rs:166
sqlparser::parser::Parser::parse_comma_separated_with_trailing_commas mod.rs:3730
sqlparser::parser::Parser::parse_comma_separated mod.rs:3715
sqlparser::parser::Parser::parse_prefix mod.rs:1381
sqlparser::parser::Parser::parse_subexpr mod.rs:964
sqlparser::parser::Parser::parse_infix mod.rs:2928
sqlparser::parser::Parser::parse_subexpr mod.rs:974
sqlparser::parser::Parser::parse_expr mod.rs:957
core::ops::function::FnMut::call_mut function.rs:166
sqlparser::parser::Parser::parse_comma_separated_with_trailing_commas mod.rs:3730
sqlparser::parser::Parser::parse_comma_separated mod.rs:3715
sqlparser::parser::Parser::parse_prefix mod.rs:1381
sqlparser::parser::Parser::parse_subexpr mod.rs:964
sqlparser::parser::Parser::parse_infix mod.rs:2928
sqlparser::parser::Parser::parse_subexpr mod.rs:974
sqlparser::parser::Parser::parse_expr mod.rs:957
core::ops::function::FnMut::call_mut function.rs:166
sqlparser::parser::Parser::parse_comma_separated_with_trailing_commas mod.rs:3730
sqlparser::parser::Parser::parse_comma_separated mod.rs:3715
...
std::panicking::try::do_call panicking.rs:557
__rust_try 0x000000010761f11c
[Inlined] std::panicking::try panicking.rs:520
[Inlined] std::panic::catch_unwind panic.rs:358
std::thread::Builder::spawn_unchecked_::{{closure}} mod.rs:559
core::ops::function::FnOnce::call_once{{vtable.shim}} function.rs:250
[Inlined] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once boxed.rs:1972
[Inlined] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once boxed.rs:1972
std::sys::pal::unix::thread::Thread::new::thread_start thread.rs:105
_pthread_start 0x00000001940832e4

Note @blaginin recently added the recursive feature to sql parser (but it isn't yet released) which I think will allow this query to run without having to update the stack size

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, very clear. So, I will close this PR and wait for the new release of sqlparser-rs. Tthanks for the review @alamb @2010YOUY01

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for working on this @tlm365 -- however I worry that this change exposes DataFusion users to stack overflows 🤔

It seems like a new version of sqlparser should be able to fix that. Maybe once that fix is released we can make this change without causing stack overflows

Speaking if sqlparser-rs releases, it seems it is (past) time to make a new one:

@@ -257,6 +257,9 @@ fn ensure_not_set<T>(field: &Option<T>, name: &str) -> Result<(), ParserError> {
Ok(())
}

// By default, allow expressions up to this deep before error
const DEFAULT_REMAINING_DEPTH: usize = 100;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked that the default in sqlparser is 50 so this is 2x the size https://docs.rs/sqlparser/latest/src/sqlparser/parser/mod.rs.html#187

I am a little worried about this as it is a hard coded limit but I think we can always make it a config flag later

@@ -257,6 +257,9 @@ fn ensure_not_set<T>(field: &Option<T>, name: &str) -> Result<(), ParserError> {
Ok(())
}

// By default, allow expressions up to this deep before error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can say this is 2x the default limit of sqlparser so it is clear this is meant to increase the limit

Suggested change
// By default, allow expressions up to this deep before error
// By default, allow expressions up to this deep before error
// (2x the default in sqlparser)

@@ -52,6 +52,7 @@ const SQLITE_PREFIX: &str = "sqlite";
pub fn main() -> Result<()> {
tokio::runtime::Builder::new_multi_thread()
.enable_all()
.thread_stack_size(4 * 1024 * 1024)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried about this. We have recently added the the recursive protection feature precisely to avoid stack overflows (ever) but now after this change the stacks can overflow.

Here is the stack where it overflows (precisely what the Recursion limit is designed to prevent:

<sqlparser::tokenizer::Token as core::clone::Clone>::clone tokenizer.rs:52
<sqlparser::tokenizer::TokenWithSpan as core::clone::Clone>::clone tokenizer.rs:633
core::option::Option<&T>::cloned option.rs:1917
sqlparser::parser::Parser::next_token mod.rs:3443
sqlparser::parser::Parser::parse_data_type_helper mod.rs:8097
sqlparser::parser::Parser::parse_data_type mod.rs:8083
sqlparser::parser::Parser::parse_prefix::{{closure}} mod.rs:1243
sqlparser::parser::Parser::try_parse mod.rs:3808
sqlparser::parser::Parser::maybe_parse mod.rs:3795
sqlparser::parser::Parser::parse_prefix mod.rs:1242
sqlparser::parser::Parser::parse_subexpr mod.rs:964
sqlparser::parser::Parser::parse_expr mod.rs:957
core::ops::function::FnMut::call_mut function.rs:166
sqlparser::parser::Parser::parse_comma_separated_with_trailing_commas mod.rs:3730
sqlparser::parser::Parser::parse_comma_separated mod.rs:3715
sqlparser::parser::Parser::parse_prefix mod.rs:1381
sqlparser::parser::Parser::parse_subexpr mod.rs:964
sqlparser::parser::Parser::parse_infix mod.rs:2928
sqlparser::parser::Parser::parse_subexpr mod.rs:974
sqlparser::parser::Parser::parse_expr mod.rs:957
core::ops::function::FnMut::call_mut function.rs:166
sqlparser::parser::Parser::parse_comma_separated_with_trailing_commas mod.rs:3730
sqlparser::parser::Parser::parse_comma_separated mod.rs:3715
sqlparser::parser::Parser::parse_prefix mod.rs:1381
sqlparser::parser::Parser::parse_subexpr mod.rs:964
sqlparser::parser::Parser::parse_infix mod.rs:2928
sqlparser::parser::Parser::parse_subexpr mod.rs:974
sqlparser::parser::Parser::parse_expr mod.rs:957
core::ops::function::FnMut::call_mut function.rs:166
sqlparser::parser::Parser::parse_comma_separated_with_trailing_commas mod.rs:3730
sqlparser::parser::Parser::parse_comma_separated mod.rs:3715
sqlparser::parser::Parser::parse_prefix mod.rs:1381
sqlparser::parser::Parser::parse_subexpr mod.rs:964
sqlparser::parser::Parser::parse_infix mod.rs:2928
sqlparser::parser::Parser::parse_subexpr mod.rs:974
sqlparser::parser::Parser::parse_expr mod.rs:957
core::ops::function::FnMut::call_mut function.rs:166
sqlparser::parser::Parser::parse_comma_separated_with_trailing_commas mod.rs:3730
sqlparser::parser::Parser::parse_comma_separated mod.rs:3715
sqlparser::parser::Parser::parse_prefix mod.rs:1381
sqlparser::parser::Parser::parse_subexpr mod.rs:964
sqlparser::parser::Parser::parse_infix mod.rs:2928
sqlparser::parser::Parser::parse_subexpr mod.rs:974
sqlparser::parser::Parser::parse_expr mod.rs:957
core::ops::function::FnMut::call_mut function.rs:166
sqlparser::parser::Parser::parse_comma_separated_with_trailing_commas mod.rs:3730
sqlparser::parser::Parser::parse_comma_separated mod.rs:3715
...
std::panicking::try::do_call panicking.rs:557
__rust_try 0x000000010761f11c
[Inlined] std::panicking::try panicking.rs:520
[Inlined] std::panic::catch_unwind panic.rs:358
std::thread::Builder::spawn_unchecked_::{{closure}} mod.rs:559
core::ops::function::FnOnce::call_once{{vtable.shim}} function.rs:250
[Inlined] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once boxed.rs:1972
[Inlined] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once boxed.rs:1972
std::sys::pal::unix::thread::Thread::new::thread_start thread.rs:105
_pthread_start 0x00000001940832e4

Note @blaginin recently added the recursive feature to sql parser (but it isn't yet released) which I think will allow this query to run without having to update the stack size

@tlm365 tlm365 closed this Jan 17, 2025
@alamb alamb reopened this Jan 17, 2025
@alamb alamb marked this pull request as draft January 17, 2025 18:13
@alamb alamb closed this Jan 17, 2025
@alamb
Copy link
Contributor

alamb commented Jan 17, 2025

To be clear I think we will still need to increase the recursion limit in DataFusion to allow the sqllogictest to pass. But until the next sqlparser upgrade I don't we can do so safely

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RecursionLimitedExceeded received for some sqlite test queries
3 participants