Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic support for converting Expr to SQL string #9495

Closed
5 tasks done
alamb opened this issue Mar 7, 2024 · 16 comments
Closed
5 tasks done

Basic support for converting Expr to SQL string #9495

alamb opened this issue Mar 7, 2024 · 16 comments
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Mar 7, 2024

Part of #9494

Is your feature request related to a problem or challenge?

Sometimes people want to export DataFusion expressions as SQL strings, I think to use in other systems (e.g. to push predicates down to postgres)

Most recently this came up in discord: https://discord.com/channels/885562378132000778/1166447479609376850/1215387800715919390

But I am pretty sure I remember it coming up elsewhere

Describe the solution you'd like

I would like a way to convert from Expr --> SQL string, something like:

// make a > 1 expr
let expr: Expr = col("a").gt(lit(1));
// convert to a string
let sql_string = to_sql(&expr, &schema)?;
assert_eq!(sql_string, "a > 1")

Describe alternatives you've considered

I think the easiest way to do this might be to use the Display impl of sqlparser::AST

So that would look something like:

  1. write a transform of Expr --> `sqlparser::A

Additional context

#8661 covers the feature for converting entire LogicalPlans back to Expr

#8736 covers the converse converting SQL strings to Expr

Remaining Tasks

@alamb
Copy link
Contributor Author

alamb commented Mar 7, 2024

@edmondop any interest in this? I think it should be straightforward and a nice contribution. I also think it could be done in a few PRs relatively quickly

@backkem
Copy link
Contributor

backkem commented Mar 7, 2024

https://github.com/datafusion-contrib/datafusion-federation/blob/main/sources/sql/src/producer.rs takes the Plan -> SQLParser -> Display route. Happy to upstream this if it's relevant.

@edmondop
Copy link
Contributor

edmondop commented Mar 8, 2024

I'll take it :) just to confirm @alamb you suggest pattern matching all the Expr variants of DataFusion and convert them into sqlparser::ast::Expr? I briefly looked at the docs and there seemed to be some non obvious differences.

@backkem suggests instead a different strategy, which is taking a plan in input. Any thoughts about taking an Expr or taking a Plan?

@devinjdangelo
Copy link
Contributor

@edmondop The code @backkem mentions also includes an expr to sql function! There may be some gaps to fill in, but the core functionality is there. It is just a private function within the datafusion-federation subproject now.

I think the main work in this and related tickets for DataFusion will be deciding what the public interface should look like and filling in any gaps in the datafusion-federation code.

I think it would be great to get more eyes on this code and make it easier for other projects to use.

@edmondop
Copy link
Contributor

edmondop commented Mar 8, 2024

@edmondop The code @backkem mentions also includes an expr to sql function! There may be some gaps to fill in, but the core functionality is there. It is just a private function within the datafusion-federation subproject now.

I think the main work in this and related tickets for DataFusion will be deciding what the public interface should look like and filling in any gaps in the datafusion-federation code.

I think it would be great to get more eyes on this code and make it easier for other projects to use.

Understood, makes lot of sense not to have that code in the federation contrib since it doesn't have particularly to do with federation. For the API, I think it might require a little work, since I see there is a _col_ref_offset: usize that is passed, which I am not sure we want to expose through the API

@Lordworms
Copy link
Contributor

Since We have so many Expr

pub enum Expr {
    /// An expression with a specific name.
    Alias(Alias),
    /// A named reference to a qualified filed in a schema.
    Column(Column),
    /// A named reference to a variable in a registry.
    ScalarVariable(DataType, Vec<String>),
    /// A constant value.
    Literal(ScalarValue),
    /// A binary expression such as "age > 21"
    BinaryExpr(BinaryExpr),
    /// LIKE expression
    Like(Like),
    /// LIKE expression that uses regular expressions
    SimilarTo(Like),
    /// Negation of an expression. The expression's type must be a boolean to make sense.
    Not(Box<Expr>),
    /// True if argument is not NULL, false otherwise. This expression itself is never NULL.
    IsNotNull(Box<Expr>),
    /// True if argument is NULL, false otherwise. This expression itself is never NULL.
    IsNull(Box<Expr>),
    /// True if argument is true, false otherwise. This expression itself is never NULL.
    IsTrue(Box<Expr>),
    /// True if argument is  false, false otherwise. This expression itself is never NULL.
    IsFalse(Box<Expr>),
    /// True if argument is NULL, false otherwise. This expression itself is never NULL.
    IsUnknown(Box<Expr>),
    /// True if argument is FALSE or NULL, false otherwise. This expression itself is never NULL.
    IsNotTrue(Box<Expr>),
    /// True if argument is TRUE OR NULL, false otherwise. This expression itself is never NULL.
    IsNotFalse(Box<Expr>),
    /// True if argument is TRUE or FALSE, false otherwise. This expression itself is never NULL.
    IsNotUnknown(Box<Expr>),
    /// arithmetic negation of an expression, the operand must be of a signed numeric data type
    Negative(Box<Expr>),
    /// Returns the field of a [`arrow::array::ListArray`] or
    /// [`arrow::array::StructArray`] by index or range
    GetIndexedField(GetIndexedField),
    /// Whether an expression is between a given range.
    Between(Between),
    /// The CASE expression is similar to a series of nested if/else and there are two forms that
    /// can be used. The first form consists of a series of boolean "when" expressions with
    /// corresponding "then" expressions, and an optional "else" expression.
    ///
    /// CASE WHEN condition THEN result
    ///      [WHEN ...]
    ///      [ELSE result]
    /// END
    ///
    /// The second form uses a base expression and then a series of "when" clauses that match on a
    /// literal value.
    ///
    /// CASE expression
    ///     WHEN value THEN result
    ///     [WHEN ...]
    ///     [ELSE result]
    /// END
    Case(Case),
    /// Casts the expression to a given type and will return a runtime error if the expression cannot be cast.
    /// This expression is guaranteed to have a fixed type.
    Cast(Cast),
    /// Casts the expression to a given type and will return a null value if the expression cannot be cast.
    /// This expression is guaranteed to have a fixed type.
    TryCast(TryCast),
    /// A sort expression, that can be used to sort values.
    Sort(Sort),
    /// Represents the call of a scalar function with a set of arguments.
    ScalarFunction(ScalarFunction),
    /// Represents the call of an aggregate built-in function with arguments.
    AggregateFunction(AggregateFunction),
    /// Represents the call of a window function with arguments.
    WindowFunction(WindowFunction),
    /// Returns whether the list contains the expr value.
    InList(InList),
    /// EXISTS subquery
    Exists(Exists),
    /// IN subquery
    InSubquery(InSubquery),
    /// Scalar subquery
    ScalarSubquery(Subquery),
    /// Represents a reference to all available fields in a specific schema,
    /// with an optional (schema) qualifier.
    ///
    /// This expr has to be resolved to a list of columns before translating logical
    /// plan into physical plan.
    Wildcard { qualifier: Option<String> },
    /// List of grouping set expressions. Only valid in the context of an aggregate
    /// GROUP BY expression list
    GroupingSet(GroupingSet),
    /// A place holder for parameters in a prepared statement
    /// (e.g. `$foo` or `$1`)
    Placeholder(Placeholder),
    /// A place holder which hold a reference to a qualified field
    /// in the outer query, used for correlated sub queries.
    OuterReferenceColumn(DataType, Column),
    /// Unnest expression
    Unnest(Unnest),
}

Guess we can use a Trait to split the workload. I'd like to pick up some of them

@backkem
Copy link
Contributor

backkem commented Mar 8, 2024

One of the first decisions we need to make is: where do we want this code to live? I feel the SQLParser crate (or a new SQLWriter variant) would make sense. What do others think?

API wise: The builder pattern worked reasonably for me here.

@backkem
Copy link
Contributor

backkem commented Mar 8, 2024

On 2nd thought: the SQL AST builder code could live in the SQLParser/SQLWriter crate. Code translating the Datafusion AST into SQL AST should probably live in a Datafusion repo. However, to get started it may be better to build things out in one place to reduce overhead.

@backkem
Copy link
Contributor

backkem commented Mar 8, 2024

I split out datafusion-federation's SQL Writer code into its own package and added an example for expressions and plans. I intend to mature it over time. It's open to contributions and/or moving to a more canonical place.

@alamb
Copy link
Contributor Author

alamb commented Mar 8, 2024

I split out datafusion-federation's SQL Writer code into its own package and added an example for expressions and plans. I intend to mature it over time. It's open to contributions and/or moving to a more canonical place.

Thank you very much @backkem -- this is really quite cool.

Here is my suggestion:

We port from_df_expr upstream into DataFusion (as this I think is easier and has been asked for several times) and we can obsess about making it feature complete / tested in this repo. Once that project is complete, we move on to the LogicalPlans (aka #8661)

I suggest this API:

  • Crate: datafusion-sql crate
  • Module: datafusion_sql::unparser. In the file datafusion/sql/src/unparser/expr.rs
  • API

One function

/// Convert a DataFusion [`Expr`] to `sqlparser::ast::Expr`
///
/// This function is the opposite of `SqlToRel::sql_to_expr`
///
/// Example
/// ```
/// let expr = col("a").gt(lit(4));
/// let sql = expr_to_sql(&expr)?;
///
/// assert_eq(sql.to_string(), "a > 4")
/// ```
fn expr_to_sql(expr: Expr) -> Result<sqlparser::ast::Expr> {
...
}

So it seems like the needed steps are:

  1. Agree on new API (see above proposal)
  2. port expr_to_sql to the new location
  3. Write tests that cover the new code (maybe round trip Expr -> SQL -> Expr?)

@alamb
Copy link
Contributor Author

alamb commented Mar 9, 2024

@edmondop
Copy link
Contributor

edmondop commented Mar 9, 2024

I haven't started yet because I was waiting the group to reach an agreement. @alamb proposal makes a lot of sense, @backkem @devinjdangelo wdyt?

@devinjdangelo
Copy link
Contributor

I haven't started yet because I was waiting the group to reach an agreement. @alamb proposal makes a lot of sense, @backkem @devinjdangelo wdyt?

#9517 is open to merge in to DataFusion the in progress implementation from datafusion-federation with an api like what @alamb suggested.

Once that is merged in we can work to finish the implementation and improve test coverage.

@backkem
Copy link
Contributor

backkem commented Mar 12, 2024

The initial PR (#9517) has landed. For anyone interested in helping to flesh out the Expr serialization to SQL, a good place to start are the remaining not_impl_err! statements in expr.rs. Add a test case in roundtrip_expr in sql::tests and try to make it pass!

@devinjdangelo
Copy link
Contributor

I have opened a draft PR #9578 to capture some of the additional impls I have worked on. It is a draft because I have not yet written tests.

@alamb
Copy link
Contributor Author

alamb commented May 14, 2024

we are tracking the remaining work in #9726

@alamb alamb closed this as completed May 14, 2024
@alamb alamb changed the title Support converting Expr to SQL string Basic support for converting Expr to SQL string May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants