Basic support for converting Expr to SQL string #9495
Comments
@edmondop, any interest in this? I think it should be straightforward and a nice contribution. I also think it could be done relatively quickly in a few PRs.
https://github.com/datafusion-contrib/datafusion-federation/blob/main/sources/sql/src/producer.rs takes the Plan -> SQLParser -> Display route. Happy to upstream this if it's relevant.
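For readers unfamiliar with the Plan -> SQL AST -> `Display` route, here is a minimal, std-only sketch of the idea. `Plan`, `Select`, `fill_select` and `plan_to_sql` are hypothetical toy names, not DataFusion's `LogicalPlan` or sqlparser's AST types, and predicates are plain strings to keep it short.

```rust
use std::fmt;

// Toy logical plan (hypothetical), loosely shaped like a tiny subset of a LogicalPlan.
enum Plan {
    TableScan { table: String },
    Filter { predicate: String, input: Box<Plan> },
    Projection { columns: Vec<String>, input: Box<Plan> },
}

// Toy SQL AST for one SELECT statement; `Display` turns it into text.
#[derive(Default)]
struct Select {
    projection: Vec<String>,
    from: Option<String>,
    selection: Option<String>,
}

impl fmt::Display for Select {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.projection.is_empty() {
            write!(f, "SELECT *")?;
        } else {
            write!(f, "SELECT {}", self.projection.join(", "))?;
        }
        if let Some(table) = &self.from {
            write!(f, " FROM {table}")?;
        }
        if let Some(predicate) = &self.selection {
            write!(f, " WHERE {predicate}")?;
        }
        Ok(())
    }
}

// Step 1: walk the plan from the root down, filling in clauses of the SELECT.
// Simplification: a second Filter would overwrite the first one.
fn fill_select(plan: &Plan, select: &mut Select) {
    match plan {
        Plan::TableScan { table } => select.from = Some(table.clone()),
        Plan::Filter { predicate, input } => {
            select.selection = Some(predicate.clone());
            fill_select(input, select);
        }
        Plan::Projection { columns, input } => {
            select.projection = columns.clone();
            fill_select(input, select);
        }
    }
}

// Step 2: render the AST through its Display impl.
fn plan_to_sql(plan: &Plan) -> String {
    let mut select = Select::default();
    fill_select(plan, &mut select);
    select.to_string()
}
```

Filling a single `SELECT` top-down only covers simple Projection/Filter/Scan shapes; joins, aggregates or stacked filters would need subqueries or clause-merging logic.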
I'll take it :) Just to confirm: @alamb, you suggest pattern matching on all of DataFusion's Expr variants and converting them into the sqlparser AST, while @backkem suggests a different strategy that takes a plan as input. Any thoughts on taking an Expr versus taking a plan?
@edmondop The code @backkem mentions also includes an expr-to-SQL function! There may be some gaps to fill in, but the core functionality is there; it is just a private function within the datafusion-federation subproject for now. I think the main work in this and related tickets for DataFusion will be deciding what the public interface should look like and filling in any gaps in the datafusion-federation code. It would be great to get more eyes on this code and make it easier for other projects to use.
Understood. It makes a lot of sense not to keep that code in the federation contrib, since it has little to do with federation specifically. For the API, I think it might require a little work, since I see there is a
Since we have so many Expr variants:

```rust
pub enum Expr {
    /// An expression with a specific name.
    Alias(Alias),
    /// A named reference to a qualified field in a schema.
    Column(Column),
    /// A named reference to a variable in a registry.
    ScalarVariable(DataType, Vec<String>),
    /// A constant value.
    Literal(ScalarValue),
    /// A binary expression such as "age > 21"
    BinaryExpr(BinaryExpr),
    /// LIKE expression
    Like(Like),
    /// LIKE expression that uses regular expressions
    SimilarTo(Like),
    /// Negation of an expression. The expression's type must be a boolean to make sense.
    Not(Box<Expr>),
    /// True if argument is not NULL, false otherwise. This expression itself is never NULL.
    IsNotNull(Box<Expr>),
    /// True if argument is NULL, false otherwise. This expression itself is never NULL.
    IsNull(Box<Expr>),
    /// True if argument is true, false otherwise. This expression itself is never NULL.
    IsTrue(Box<Expr>),
    /// True if argument is false, false otherwise. This expression itself is never NULL.
    IsFalse(Box<Expr>),
    /// True if argument is NULL, false otherwise. This expression itself is never NULL.
    IsUnknown(Box<Expr>),
    /// True if argument is FALSE or NULL, false otherwise. This expression itself is never NULL.
    IsNotTrue(Box<Expr>),
    /// True if argument is TRUE OR NULL, false otherwise. This expression itself is never NULL.
    IsNotFalse(Box<Expr>),
    /// True if argument is TRUE or FALSE, false otherwise. This expression itself is never NULL.
    IsNotUnknown(Box<Expr>),
    /// arithmetic negation of an expression, the operand must be of a signed numeric data type
    Negative(Box<Expr>),
    /// Returns the field of a [`arrow::array::ListArray`] or
    /// [`arrow::array::StructArray`] by index or range
    GetIndexedField(GetIndexedField),
    /// Whether an expression is between a given range.
    Between(Between),
    /// The CASE expression is similar to a series of nested if/else and there are two forms that
    /// can be used. The first form consists of a series of boolean "when" expressions with
    /// corresponding "then" expressions, and an optional "else" expression.
    ///
    /// CASE WHEN condition THEN result
    ///      [WHEN ...]
    ///      [ELSE result]
    /// END
    ///
    /// The second form uses a base expression and then a series of "when" clauses that match on a
    /// literal value.
    ///
    /// CASE expression
    ///     WHEN value THEN result
    ///     [WHEN ...]
    ///     [ELSE result]
    /// END
    Case(Case),
    /// Casts the expression to a given type and will return a runtime error if the expression cannot be cast.
    /// This expression is guaranteed to have a fixed type.
    Cast(Cast),
    /// Casts the expression to a given type and will return a null value if the expression cannot be cast.
    /// This expression is guaranteed to have a fixed type.
    TryCast(TryCast),
    /// A sort expression, that can be used to sort values.
    Sort(Sort),
    /// Represents the call of a scalar function with a set of arguments.
    ScalarFunction(ScalarFunction),
    /// Represents the call of an aggregate built-in function with arguments.
    AggregateFunction(AggregateFunction),
    /// Represents the call of a window function with arguments.
    WindowFunction(WindowFunction),
    /// Returns whether the list contains the expr value.
    InList(InList),
    /// EXISTS subquery
    Exists(Exists),
    /// IN subquery
    InSubquery(InSubquery),
    /// Scalar subquery
    ScalarSubquery(Subquery),
    /// Represents a reference to all available fields in a specific schema,
    /// with an optional (schema) qualifier.
    ///
    /// This expr has to be resolved to a list of columns before translating logical
    /// plan into physical plan.
    Wildcard { qualifier: Option<String> },
    /// List of grouping set expressions. Only valid in the context of an aggregate
    /// GROUP BY expression list
    GroupingSet(GroupingSet),
    /// A place holder for parameters in a prepared statement
    /// (e.g. `$foo` or `$1`)
    Placeholder(Placeholder),
    /// A place holder which hold a reference to a qualified field
    /// in the outer query, used for correlated sub queries.
    OuterReferenceColumn(DataType, Column),
    /// Unnest expression
    Unnest(Unnest),
}
```

I guess we can use a trait to split the workload. I'd like to pick up some of them.
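A rough sketch of how a trait could split the per-variant work: each variant's payload renders itself, and the enum-level impl is only a thin dispatcher, so different contributors can add impls in separate PRs. `ToSql` is a hypothetical trait name, and `Column`, `Like` and `Expr` below are cut-down stand-ins, not DataFusion's real types (the real `Like` has more fields and an `Expr` pattern).

```rust
// Hypothetical trait: each variant payload knows how to render itself as SQL text.
trait ToSql {
    fn to_sql(&self) -> String;
}

// Cut-down stand-ins for a few DataFusion variants.
struct Column {
    name: String,
}

struct Like {
    expr: Box<Expr>,
    pattern: String,
    negated: bool,
}

enum Expr {
    Column(Column),
    Like(Like),
    IsNull(Box<Expr>),
    Not(Box<Expr>),
}

impl ToSql for Column {
    fn to_sql(&self) -> String {
        self.name.clone()
    }
}

impl ToSql for Like {
    fn to_sql(&self) -> String {
        let not = if self.negated { "NOT " } else { "" };
        // Single quotes inside the pattern are escaped by doubling them.
        let pattern = self.pattern.replace('\'', "''");
        format!("{} {}LIKE '{}'", self.expr.to_sql(), not, pattern)
    }
}

// The dispatcher is the only code every contributor touches.
impl ToSql for Expr {
    fn to_sql(&self) -> String {
        match self {
            Expr::Column(c) => c.to_sql(),
            Expr::Like(l) => l.to_sql(),
            Expr::IsNull(e) => format!("{} IS NULL", e.to_sql()),
            Expr::Not(e) => format!("NOT {}", e.to_sql()),
        }
    }
}
```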
One of the first decisions we need to make is where we want this code to live. I feel the SQLParser crate (or a new SQLWriter variant) would make sense. What do others think? API-wise, the builder pattern worked reasonably well for me here.
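To illustrate what a builder-style API for assembling SQL might look like, here is a small sketch. `SelectBuilder` and its methods are hypothetical names for this example only; this is not the API of datafusion-federation, sqlparser-rs, or DataFusion.

```rust
// Hypothetical builder for a single SELECT; not the API of any existing crate.
#[derive(Default)]
struct SelectBuilder {
    projection: Vec<String>,
    from: Option<String>,
    selection: Option<String>,
}

impl SelectBuilder {
    fn new() -> Self {
        Self::default()
    }

    fn column(mut self, name: &str) -> Self {
        self.projection.push(name.to_string());
        self
    }

    fn from_table(mut self, table: &str) -> Self {
        self.from = Some(table.to_string());
        self
    }

    // Each call adds a predicate; repeated calls are combined with AND.
    fn filter(mut self, predicate: &str) -> Self {
        self.selection = Some(match self.selection.take() {
            Some(existing) => format!("({existing}) AND ({predicate})"),
            None => predicate.to_string(),
        });
        self
    }

    // Validation happens once, at the end, instead of in every setter.
    fn build(self) -> Result<String, String> {
        let table = self.from.ok_or("a FROM table is required")?;
        let columns = if self.projection.is_empty() {
            "*".to_string()
        } else {
            self.projection.join(", ")
        };
        let mut sql = format!("SELECT {columns} FROM {table}");
        if let Some(predicate) = self.selection {
            sql.push_str(" WHERE ");
            sql.push_str(&predicate);
        }
        Ok(sql)
    }
}
```

The appeal of a builder here is that clauses can be added in whatever order a plan walk discovers them, with a single validation point in `build`.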
On second thought: the SQL AST builder code could live in the SQLParser/SQLWriter crate, while the code translating the DataFusion AST into the SQL AST should probably live in a DataFusion repo. However, to get started it may be better to build everything out in one place to reduce overhead.
I split out datafusion-federation's SQL Writer code into its own package and added an example for expressions and plans. I intend to mature it over time. It's open to contributions and/or moving to a more canonical place.
Thank you very much @backkem -- this is really quite cool. Here is my suggestion: We port … I suggest this API:

One function:

````rust
use datafusion_common::Result;
use datafusion_expr::Expr;

/// Convert a DataFusion [`Expr`] to `sqlparser::ast::Expr`
///
/// This function is the opposite of `SqlToRel::sql_to_expr`
///
/// Example
/// ```
/// let expr = col("a").gt(lit(4));
/// let sql = expr_to_sql(&expr)?;
///
/// assert_eq!(sql.to_string(), "a > 4");
/// ```
fn expr_to_sql(expr: &Expr) -> Result<sqlparser::ast::Expr> {
    todo!()
}
````

So it seems like the needed steps are:
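A toy, std-only sketch of the proposed shape: borrow the DataFusion expression, return a separate SQL AST (or an error for variants not handled yet), and get the string from the AST's `Display` impl. Everything here is a stand-in: `col`, `lit` and `gt` mimic DataFusion's helpers but are defined locally, and `ast::Expr` only imitates `sqlparser::ast::Expr`.

```rust
// Toy stand-in for DataFusion's logical `Expr` (hypothetical, not the real type).
#[derive(Debug)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    Wildcard, // stands in for a variant the converter does not support yet
}

fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn lit(v: i64) -> Expr {
    Expr::Literal(v)
}

impl Expr {
    fn gt(self, other: Expr) -> Expr {
        Expr::Gt(Box::new(self), Box::new(other))
    }
}

// Toy stand-in for `sqlparser::ast::Expr`: a separate SQL AST that prints itself.
mod ast {
    use std::fmt;

    pub enum Expr {
        Identifier(String),
        Number(String),
        BinaryOp { left: Box<Expr>, op: &'static str, right: Box<Expr> },
    }

    impl fmt::Display for Expr {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                Expr::Identifier(s) | Expr::Number(s) => write!(f, "{s}"),
                Expr::BinaryOp { left, op, right } => write!(f, "{left} {op} {right}"),
            }
        }
    }
}

// Recursive conversion; unsupported variants return an error instead of panicking.
fn expr_to_sql(expr: &Expr) -> Result<ast::Expr, String> {
    match expr {
        Expr::Column(name) => Ok(ast::Expr::Identifier(name.clone())),
        Expr::Literal(v) => Ok(ast::Expr::Number(v.to_string())),
        Expr::Gt(l, r) => Ok(ast::Expr::BinaryOp {
            left: Box::new(expr_to_sql(l)?),
            op: ">",
            right: Box::new(expr_to_sql(r)?),
        }),
        other => Err(format!("unsupported expression: {other:?}")),
    }
}
```

Returning an error for unhandled variants lets coverage grow incrementally without the API changing.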
@JanKaul noted on Discord that there is another source of inspiration: https://github.com/JanKaul/datafusion-sqlgen (Discord link: https://discord.com/channels/885562378132000778/1166447479609376850/1215529309435985941)
I haven't started yet because I was waiting for the group to reach an agreement. @alamb's proposal makes a lot of sense; @backkem @devinjdangelo, wdyt?
#9517 is open to merge the in-progress implementation from datafusion-federation into DataFusion, with an API like the one @alamb suggested. Once that is merged, we can work on finishing the implementation and improving test coverage.
I have opened draft PR #9578 to capture some of the additional impls I have worked on. It is a draft because I have not yet written tests.
We are tracking the remaining work in #9726.
Part of #9494
Is your feature request related to a problem or challenge?
Sometimes people want to export DataFusion expressions as SQL strings, I think to use them in other systems (e.g. to push predicates down to Postgres).
Most recently this came up in Discord: https://discord.com/channels/885562378132000778/1166447479609376850/1215387800715919390
But I am pretty sure I remember it coming up elsewhere.
Describe the solution you'd like
I would like a way to convert from `Expr` --> SQL string, something like:
Describe alternatives you've considered
I think the easiest way to do this might be to use the `Display` impl of `sqlparser::AST`.
So that would look something like: `Expr` --> `sqlparser::A`…
Additional context
#8661 covers the feature for converting entire LogicalPlans back to `Expr`
#8736 covers the converse, converting SQL strings to `Expr`
Remaining Tasks
- feat: Introduce convert Expr to SQL string API and basic feature #9517
- Upstream changes in `datafusion/sql/src/unparser/dialect.rs` to sqlparser-rs: Add identifier quote style to `Dialect` trait datafusion-sqlparser-rs#1170
- Support remaining Expr variants
- Sorting out binary expr nesting
- Add Round trip tests
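One way to approach the "binary expr nesting" item is to print a binary node's children with parentheses only where the tree shape would otherwise be misread. The sketch below uses a hypothetical toy AST and illustrative precedence numbers (not sqlparser's or DataFusion's actual tables); it treats arithmetic and logical operators as left-associative and parenthesizes chained comparisons.

```rust
// Toy SQL AST (hypothetical), just rich enough to show operator nesting.
enum Ast {
    Ident(String),
    Number(i64),
    Binary { left: Box<Ast>, op: Op, right: Box<Ast> },
}

#[derive(Clone, Copy)]
enum Op { Or, And, Gt, Plus, Minus, Times }

impl Op {
    // Illustrative binding strengths: a higher number binds tighter.
    fn precedence(self) -> u8 {
        match self {
            Op::Or => 1,
            Op::And => 2,
            Op::Gt => 3,
            Op::Plus | Op::Minus => 4,
            Op::Times => 5,
        }
    }

    fn symbol(self) -> &'static str {
        match self {
            Op::Or => "OR",
            Op::And => "AND",
            Op::Gt => ">",
            Op::Plus => "+",
            Op::Minus => "-",
            Op::Times => "*",
        }
    }
}

// Leaves never need parentheses, so they get the highest precedence.
fn prec(ast: &Ast) -> u8 {
    match ast {
        Ast::Binary { op, .. } => op.precedence(),
        _ => u8::MAX,
    }
}

fn bin(left: Ast, op: Op, right: Ast) -> Ast {
    Ast::Binary { left: Box::new(left), op, right: Box::new(right) }
}

fn render(ast: &Ast) -> String {
    match ast {
        Ast::Ident(name) => name.clone(),
        Ast::Number(n) => n.to_string(),
        Ast::Binary { left, op, right } => {
            let p = op.precedence();
            // Left child: wrap if it binds more loosely; comparisons are also wrapped
            // at equal precedence because `a > b > c` is not portable SQL.
            let wrap_left = prec(left) < p || (prec(left) == p && matches!(op, Op::Gt));
            // Right child: operators are treated as left-associative, so an
            // equal-precedence right child is wrapped too: a - (b - c) != a - b - c.
            let wrap_right = prec(right) <= p;
            let l = render(left);
            let r = render(right);
            let l = if wrap_left { format!("({l})") } else { l };
            let r = if wrap_right { format!("({r})") } else { r };
            let sym = op.symbol();
            format!("{l} {sym} {r}")
        }
    }
}
```

This is conservative for associative operators (it prints `a AND (b AND c)` rather than dropping the parentheses), which is harmless. Round-trip tests are a natural check for this logic: parsing the rendered string back should yield the same tree.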
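The dialect task in the list boils down to letting each target database choose how identifiers are quoted. The sketch below is hypothetical: the `Dialect` trait and its `identifier_quote_style` method echo the task description but are defined locally and may not match the actual trait in `dialect.rs`; `PostgresLike`, `MySqlLike` and `Bare` are made-up example dialects.

```rust
// Hypothetical dialect trait: which character (if any) quotes identifiers.
trait Dialect {
    fn identifier_quote_style(&self) -> Option<char>;
}

struct PostgresLike;
struct MySqlLike;
struct Bare;

impl Dialect for PostgresLike {
    fn identifier_quote_style(&self) -> Option<char> {
        Some('"')
    }
}

impl Dialect for MySqlLike {
    fn identifier_quote_style(&self) -> Option<char> {
        Some('`')
    }
}

impl Dialect for Bare {
    fn identifier_quote_style(&self) -> Option<char> {
        None
    }
}

// Quote an identifier for the given dialect, doubling any embedded quote characters.
fn quote_ident(dialect: &dyn Dialect, ident: &str) -> String {
    match dialect.identifier_quote_style() {
        Some(q) => {
            let doubled = format!("{q}{q}");
            let escaped = ident.replace(q, &doubled);
            format!("{q}{escaped}{q}")
        }
        None => ident.to_string(),
    }
}
```

Keeping the quote character behind a trait means the expression-to-SQL code stays the same for every target and only the dialect object changes.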