Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-6962] [table] Add create(drop) table SQL DDL #8548

Merged
merged 1 commit into from
May 31, 2019

Conversation

danny0405
Copy link
Contributor

@danny0405 danny0405 commented May 27, 2019

What is the purpose of the change

This pull request create a new module flink-sql-parser as a sub-module of flink-table. As a startup, only createTable/dropTable ddl grammar are supported for this patch. There also introduces a bridge class named SqlExecutableStatement which handle all the interaction between SqlNode to TableEnvironment.

Brief change log

  • Create a new module named flink-sql-parser
  • Support create/drop table DDL grammar

Verifying this change

This change is already covered by existing tests, such as FlinkSqlParserImplTest and FlinkSqlUnParserTest.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): yes The flink-table-planner and flink-table-planner-blink will depends on this new module flink-sql-parser
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • How is the feature documented? not documented

@flinkbot
Copy link
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

* method #execute(type) method, the 'type' argument should be the subclass
* type for the supported {@link SqlNode}.
*/
public class SqlExecutableStatement implements ReflectiveVisitor {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have another choice here? I would suggest to not rely on or using refections if we have other choices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to separate the SqlNode and TableEnvironment interaction logic, the implementation must relay on the specific implementation abstractions(say TableEnvironment, CatalogTable ..), personally i don't want to mix-in these abstractions to the parser module (make it clean and with better reusability).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a tool class like this may be a better choice.

Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @danny0405

Copy link
Member

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. I went through the code quickly and left some comments.

@danny0405 danny0405 force-pushed the FLINK-6962 branch 2 times, most recently from 705a86d to 11b0e4a Compare May 29, 2019 09:44
@danny0405 danny0405 force-pushed the FLINK-6962 branch 5 times, most recently from cbbe622 to 9a56d11 Compare May 30, 2019 07:35
@danny0405 danny0405 changed the title [FLINK-6962] [table] Add a create table SQL DDL [FLINK-6962] [table] Add create(drop) table SQL DDL May 30, 2019
@KurtYoung
Copy link
Contributor

Please delete watermark related changes from this pr

@danny0405 danny0405 force-pushed the FLINK-6962 branch 3 times, most recently from 83f0861 to 4631975 Compare May 30, 2019 11:53
Copy link
Contributor

@KurtYoung KurtYoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice job! I will leave SqlExecutableStatement discuss to the following PRs when you start to connect parser with TableEnv

@KurtYoung KurtYoung merged commit 3710bcb into apache:master May 31, 2019
@danny0405 danny0405 deleted the FLINK-6962 branch May 31, 2019 10:34
@twalthr
Copy link
Contributor

twalthr commented Jun 3, 2019

@danny0405 Maybe the merging happened too quickly for this PR. It is still not 100% aligned with FLIP-37. We should make sure to do that before the release. In particular:

  • no colon between ROW field name and type
  • alternative syntax ROW(fieldName fieldType) to be SQL compliant
  • support for comments in rows like ROW<fieldName fieldType 'comment'>
  • no tests for parameterized VARCHARs like VARCHAR(100)
  • alternative syntax for VARCHAR(INT_MAX) as STRING
  • alternative syntax for VARBINARY(INT_MAX) as BYTES
  • REAL is not supported

Furthermore it would be great if the parser would already support all data types defined in FLIP-37 otherwise we need to check for compatibility in every corner of the stack again.

Missing are (among others): CHAR, BINARY, user-defined type identifiers, MULTISET
Please ensure consistency here.

@KurtYoung
Copy link
Contributor

@danny0405 Maybe the merging happened too quickly for this PR. It is still not 100% aligned with FLIP-37. We should make sure to do that before the release. In particular:

  • no colon between ROW field name and type
  • alternative syntax ROW(fieldName fieldType) to be SQL compliant
  • support for comments in rows like ROW<fieldName fieldType 'comment'>
  • no tests for parameterized VARCHARs like VARCHAR(100)
  • alternative syntax for VARCHAR(INT_MAX) as STRING
  • alternative syntax for VARBINARY(INT_MAX) as BYTES
  • REAL is not supported

Furthermore it would be great if the parser would already support all data types defined in FLIP-37 otherwise we need to check for compatibility in every corner of the stack again.

Missing are (among others): CHAR, BINARY, user-defined type identifiers, MULTISET
Please ensure consistency here.

@twalthr Thanks for pointing this out. I think some of the items you listed are in our following plans. We planned to first commit the parser module and set up the structures to unblock other efforts like hive extension, and leave other issues as follow up improvements. Your comprehensive list would be very helpful.

@twalthr
Copy link
Contributor

twalthr commented Jun 3, 2019

@KurtYoung is great to unblock other efforts but we should not merge half-baked contributions. Other contributions could also just rebase on this PR for being unblock until it is in a good final shape. The list that I mentioned is not a list of improvements but critical bugs/release blockers. If the goal was to simply add a first parser implementation, then the title of this PR and also the JIRA should have been renamed. The JIRA issue had 14 watchers but the feature is not complete even though the issue is closed now.

@danny0405
Copy link
Contributor Author

danny0405 commented Jun 4, 2019

@twalthr

  1. I agree that remove the colon between ROW type field name and type is more in line with the sql standard
  2. I agree that use left/right paren is more SQL compliant
  3. I agree to add comments for row type item
  4. The parameterized VARCHARs is supported as a builtin type in Calcite, and our test
    already extends it.
  5. I don't think we should support type keyword like STRING or BYTES, same reason for SQL compliant
  6. REAL is supported by CALCTE, it's hard to say we do not support it, exclude the REAL type will bring a lot of extra work

For other data types that are listed in FLIP-37 but not in this PR, i would rather fire a new JIRA issue for tracing them, after all,
support the types Flink implement in query execution has higher priority than those that are not totally supported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants