[SPARK-50309][SQL] Add documentation for SQL pipe syntax #48852

Open
wants to merge 3 commits into
base: master

dtenedor
Contributor

What changes were proposed in this pull request?

This PR adds documentation for SQL pipe syntax.

Why are the changes needed?

It provides a reference table of available operators and describes how the syntax works in each of the supported circumstances.

Does this PR introduce any user-facing change?

No, this is a documentation-only change.

How was this patch tested?

N/A

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the DOCS label Nov 14, 2024
@dtenedor dtenedor changed the title from "commit" to "[SPARK-50309][SQL] Add documentation for SQL pipe syntax" Nov 14, 2024
@dtenedor dtenedor marked this pull request as ready for review November 14, 2024 21:30
@dtenedor
Contributor Author

cc @cloud-fan @gengliangwang: here is the documentation for the new SQL pipe syntax, which is nearly complete.

@gengliangwang
Member

@dtenedor The doc looks good to me overall.
Should we consider showing more examples, like those in https://github.com/google/zetasql/blob/master/docs/pipe-syntax.md?

* To perform full-table aggregation, use the `AGGREGATE` operator with a list of aggregate
expressions to evaluate.<br>
This returns one single row in the output table.
* To perform aggregation with grouping, use the `AGGREGATE` oeprator with a `GROUP BY` clause.<br>
Member

Suggested change
* To perform aggregation with grouping, use the `AGGREGATE` oeprator with a `GROUP BY` clause.<br>
* To perform aggregation with grouping, use the `AGGREGATE` operator with a `GROUP BY` clause.<br>

Contributor Author

Fixed.
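
For reference, the two `AGGREGATE` forms described in the bullets above might look like the following. This is a minimal sketch, assuming a Spark build where pipe syntax is enabled and a hypothetical table `orders(customer_id, amount)` that is not part of this PR.

```sql
-- Hypothetical table: orders(customer_id, amount). Names are illustrative only.

-- Full-table aggregation: a list of aggregate expressions, one output row.
FROM orders
|> AGGREGATE COUNT(*) AS num_orders, SUM(amount) AS total_amount;

-- Aggregation with grouping: one output row per customer_id.
-- Per the doc text, the grouping column comes first, then the aggregate column.
FROM orders
|> AGGREGATE SUM(amount) AS total_amount GROUP BY customer_id;
```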

| `LIMIT <n> [OFFSET <m>]` | Returns the specified number of input rows, preserving ordering<br/>(if any). |
| `AGGREGATE <agg_expr> [[AS] alias], ...` | Performs full-table aggregation, returning one result row with<br/>a column for each aggregate expression. |
| `AGGREGATE [<agg_expr> [[AS] alias], ...]`<br/>`GROUP BY <grouping_expr> [AS alias], ...` | Performs aggregation with grouping, returning one row per group.<br/>The column list includes the grouping columns first and then the<br/>aggregate columns afterwards. Aliases can be assigned directly<br/>on grouping expressions. |
| `[LEFT \| ...] JOIN <relation>`<br/>` [ON <condition> \| USING(col, ...)]` | Joins rows from both inputs, returning a filtered cross-product of<br/>the pipe input table and the table expression following the<br/>JOIN keyword. |
Member

Could we list all the supported join types here?

Contributor Author

Sounds good, updated.
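
As an illustration of the `JOIN` row discussed in this thread, here is a hedged sketch of the `USING` and `ON` forms. The tables `orders(customer_id, amount)`, `customers(customer_id, name)`, and `regions(r_name, r_region)` are hypothetical, and the example shows only two join types rather than the full supported list.

```sql
-- Hypothetical tables; names are illustrative only.

-- Inner join using a shared column name.
FROM orders
|> INNER JOIN customers USING (customer_id);

-- Left outer join with an explicit condition. The join columns have
-- distinct names here so that no table qualifiers are needed.
FROM customers
|> LEFT OUTER JOIN regions ON name = r_name;
```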

| `AGGREGATE <agg_expr> [[AS] alias], ...` | Performs full-table aggregation, returning one result row with<br/>a column for each aggregate expression. |
| `AGGREGATE [<agg_expr> [[AS] alias], ...]`<br/>`GROUP BY <grouping_expr> [AS alias], ...` | Performs aggregation with grouping, returning one row per group.<br/>The column list includes the grouping columns first and then the<br/>aggregate columns afterwards. Aliases can be assigned directly<br/>on grouping expressions. |
| `[LEFT \| ...] JOIN <relation>`<br/>` [ON <condition> \| USING(col, ...)]` | Joins rows from both inputs, returning a filtered cross-product of<br/>the pipe input table and the table expression following the<br/>JOIN keyword. |
| `ORDER BY <expr> [ASC \| DESC], ...` | Returns the input rows after sorting as indicated. |
Member

Here is the doc for ORDER BY: https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-orderby.html
Do we support nulls_sort_order?

Contributor Author

Yes, we do support it. I added a mention for this.
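
A short sketch of what the null ordering looks like in a pipe query, combined with the `LIMIT ... OFFSET` operator from the same table. It assumes the pipe `ORDER BY` accepts the same `NULLS FIRST | NULLS LAST` clause as the regular `ORDER BY` linked above, and it uses a hypothetical table `orders(customer_id, amount)`.

```sql
-- Hypothetical table: orders(customer_id, amount). Names are illustrative only.
FROM orders
|> ORDER BY amount DESC NULLS LAST  -- rows with a NULL amount sort after all others
|> LIMIT 10 OFFSET 5;               -- skip the first 5 sorted rows, return the next 10
```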

@dtenedor
Contributor Author

dtenedor commented Nov 15, 2024

@gengliangwang thanks for your review! I updated the docs with more examples and information per your recommendation. Please take another look.


2 participants