[SPARK-50309][SQL] Add documentation for SQL pipe syntax #48852

dtenedor · 2024-11-14T21:30:36Z

What changes were proposed in this pull request?

This PR adds documentation for SQL pipe syntax.

Why are the changes needed?

It provides a reference table of available operators and describes how the syntax works in each of the supported circumstances.

Does this PR introduce any user-facing change?

No, this is a documentation-only change.

How was this patch tested?

N/A

Was this patch authored or co-authored using generative AI tooling?

No

dtenedor · 2024-11-14T21:32:04Z

cc @cloud-fan @gengliangwang Here is the documentation for the new SQL pipe syntax, which is nearly complete.

gengliangwang · 2024-11-14T21:50:22Z

@dtenedor The doc looks good to me overall.
Should we consider showing more examples, like those in https://github.com/google/zetasql/blob/master/docs/pipe-syntax.md?

gengliangwang · 2024-11-14T21:53:26Z

docs/sql-pipe-syntax.md

+* To perform full-table aggregation, use the `AGGREGATE` operator with a list of aggregate
+expressions to evaluate.<br>
+  This returns one single row in the output table.
+* To perform aggregation with grouping, use the `AGGREGATE` oeprator with a `GROUP BY` clause.<br>


Suggested change

-* To perform aggregation with grouping, use the `AGGREGATE` oeprator with a `GROUP BY` clause.<br>
+* To perform aggregation with grouping, use the `AGGREGATE` operator with a `GROUP BY` clause.<br>
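
For illustration, the two `AGGREGATE` forms described in this diff might look like the following. This is a sketch only: the inline `VALUES` relation and the names `tab`, `col1`, `col2`, and `cnt` are made up for the example and are not taken from the PR.

```sql
-- Full-table aggregation: one result row, with one column per aggregate expression.
VALUES (0, 1), (0, 2), (1, 3) tab(col1, col2)
|> AGGREGATE COUNT(col2) AS cnt;

-- Aggregation with grouping: one result row per distinct value of col1.
-- Per the operator table in this PR, the grouping column (col1) is listed
-- first in the output, followed by the aggregate column (cnt).
VALUES (0, 1), (0, 2), (1, 3) tab(col1, col2)
|> AGGREGATE COUNT(col2) AS cnt GROUP BY col1;
```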

gengliangwang · 2024-11-14T21:57:57Z

docs/sql-pipe-syntax.md

+| `LIMIT <n> [OFFSET <m>]`                                                                  | Returns the specified number of input rows, preserving ordering<br/>(if any).                                                                                                                                                         |
+| `AGGREGATE <agg_expr> [[AS] alias], ...`                                                  | Performs full-table aggregation, returning one result row with<br/>a column for each aggregate expression.                                                                                                                            |
+| `AGGREGATE [<agg_expr> [[AS] alias], ...]`<br/>`GROUP BY <grouping_expr> [AS alias], ...` | Performs aggregation with grouping, returning one row per group.<br/>The column list includes the grouping columns first and then the<br/>aggregate columns afterwards. Aliases can be assigned directly<br/>on grouping expressions. |
+| `[LEFT \| ...] JOIN <relation>`<br/>` [ON <condition> \| USING(col, ...)]`                | Joins rows from both inputs, returning a filtered cross-product of<br/>the pipe input table and the table expression following the<br/>JOIN keyword.                                                                                  |


Could we list all the supported join types here?

Sounds good, updated.
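
As a rough sketch of the `JOIN` row in the table above, a pipe join with an explicit join type could be written as follows. The relations `lhs` and `rhs` and their column names are hypothetical, and `LEFT` is used only as one example of a join type; the full list of supported types is whatever the updated doc enumerates.

```sql
-- The pipe input (lhs) is the left side of the join;
-- the relation following the JOIN keyword (rhs) is the right side.
VALUES (0, 1), (5, 6) lhs(a, b)
|> LEFT JOIN VALUES (0, 2) rhs(x, c) ON a = x;

-- The same kind of join expressed with USING on a shared column name.
VALUES (0, 1), (5, 6) lhs(a, b)
|> LEFT JOIN VALUES (0, 2) rhs(a, c) USING (a);
```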

gengliangwang · 2024-11-14T22:00:52Z

docs/sql-pipe-syntax.md

+| `AGGREGATE <agg_expr> [[AS] alias], ...`                                                  | Performs full-table aggregation, returning one result row with<br/>a column for each aggregate expression.                                                                                                                            |
+| `AGGREGATE [<agg_expr> [[AS] alias], ...]`<br/>`GROUP BY <grouping_expr> [AS alias], ...` | Performs aggregation with grouping, returning one row per group.<br/>The column list includes the grouping columns first and then the<br/>aggregate columns afterwards. Aliases can be assigned directly<br/>on grouping expressions. |
+| `[LEFT \| ...] JOIN <relation>`<br/>` [ON <condition> \| USING(col, ...)]`                | Joins rows from both inputs, returning a filtered cross-product of<br/>the pipe input table and the table expression following the<br/>JOIN keyword.                                                                                  |
+| `ORDER BY <expr> [ASC \| DESC], ...`                                                      | Returns the input rows after sorting as indicated.                                                                                                                                                                                    |


Here is the doc for ORDER BY: https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-orderby.html
Do we support nulls_sort_order?

Yes, we do support it. I added a mention for this.
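
A minimal sketch of what the null ordering option looks like in a pipe query, assuming it follows the same `NULLS FIRST` / `NULLS LAST` form as the regular `ORDER BY` clause linked above. The relation `tab` and column `col` are illustrative names only.

```sql
-- Sort ascending, but place NULL values after all non-NULL values
-- (without the NULLS LAST option, ascending order puts NULLs first).
VALUES (1), (NULL), (2) tab(col)
|> ORDER BY col ASC NULLS LAST;
```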

dtenedor · 2024-11-15T02:12:04Z

@gengliangwang thanks for your review! I updated the docs with more examples and information per recommendation, please take another look.

commit

0c5dabd

github-actions bot added the DOCS label Nov 14, 2024

dtenedor changed the title ~~commit~~ [SPARK-50309][SQL] Add documentation for SQL pipe syntax Nov 14, 2024

dtenedor marked this pull request as ready for review November 14, 2024 21:30

gengliangwang reviewed Nov 14, 2024

dtenedor added 2 commits November 14, 2024 18:09

respond to code review comments

f59f36d

respond to code review comments

a32adcf

dtenedor requested a review from gengliangwang November 15, 2024 02:12
