Improved table functions #13627
Conversation
* Revises properties and definitions in the catalog
* Adds a "table function" abstraction to model such functions
* Specific functions for HTTP, inline, local and S3
* Extended SQL types in the catalog
* Restructures external table definitions to use table functions
* EXTEND syntax for Druid's extern table function
* Support for array-valued table function parameters
* Support for array-valued SQL query parameters
Removed some files to shrink PR
To anyone who wants to review, here is a summary of an overview of this PR provided via a private channel. Let's assume you are already familiar with the model layer from last time. The key changes here make table functions more generic.

First, there is a table function abstraction: something that has a set of parameter definitions, accepts a set of arguments, and returns an external table definition. The table function replaces the property annotations from the prior PR, so the annotation code was removed.

Before, there was a definition for each kind of external table. This was too restrictive. Now there is a generic external table definition with two "parts": one is a definition of the input source, the other is a list of possible input formats. Each input source has its own metadata class which says which properties are available (as before), but also says what table functions are available. There are three kinds of functions:

1) An "ad-hoc" (from-scratch) function that is a fancy form of the existing extern function.
2) A "partial" function where some of the data resides in the catalog, and the rest is given by the query using the table name as a function name.
3) A "complete" function where the catalog table name can be used either as a table or as a zero-argument table function.

The input source metadata specifies the various properties, how function arguments are mapped to properties, and how to merge catalog and argument values. Formats are similar, but are shared by all input sources.

Given this, the second part is that the SQL-layer table functions are redone to use the table function abstraction and the input source definition. Basically, the HTTP input source function holds onto the HTTP input source definition and asks that definition for the table function definition, which then provides the list of parameters and processes the actual call. Given that, the related code is kind of one big thing: it won't work to use, say, the old Calcite code with the new model code or vice versa. (A sketch of the resulting SQL usage appears below.)

You can start here: Oh, and if you want to know what all this code actually does, look at the

A follow-on PR will add catalog integration to the Calcite planner so that we incorporate catalog information when planning an MSQ query. There are some bits toward that goal in this PR (where they overlap with external tables). However, to keep this PR from growing bigger, the planner work, which uses table specs from the catalog, comes later.
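To make the three styles concrete, here is a rough SQL sketch. The `http` function with `uris` and `format` parameters follows the pattern this PR describes, but the catalog table name `httpInput` and the exact parameter names are illustrative assumptions; check reference.md in this PR for the real signatures.

```sql
-- Ad-hoc: the query supplies everything, including the schema via the
-- new EXTEND clause (function and parameter names are illustrative).
SELECT *
FROM TABLE(http(
    uris => ARRAY['https://example.com/data.csv'],
    format => 'csv'
  ))
  EXTEND (x VARCHAR, y BIGINT)

-- Partial: connection details live in the catalog under "httpInput"
-- (a hypothetical name); the query supplies only the varying arguments.
SELECT *
FROM TABLE(httpInput(uris => ARRAY['https://example.com/data.csv']))

-- Complete: the catalog entry is fully specified, so its name can be
-- used directly as a table (or as a zero-argument table function).
SELECT * FROM httpInput
```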
* `bucket` (`VARCHAR`) - Corresponds to the `bucket` field of the `objects` JSON field. SQL does not have syntax for an array of objects. Instead, this function takes a single bucket, and one or more objects within that bucket.
* `paths` (`ARRAY` of `VARCHAR`) - Corresponds to the `path` fields of the `object` JSON field.
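Putting the parameters above together, here is a hedged sketch of an S3 call. The `S3` function name, the security parameter names raised in the comment that follows, and the EXTEND schema are assumptions to verify against reference.md.

```sql
-- Illustrative S3 table function call using `bucket` and `paths`, plus
-- the security properties discussed in the comment below.
SELECT *
FROM TABLE(S3(
    bucket => 's3://my-bucket',
    paths => ARRAY['dir1/file1.csv', 'dir2/file2.csv'],
    accessKeyId => '<key id>',        -- security property (assumed name)
    secretAccessKey => '<secret>',    -- security property (assumed name)
    format => 'csv'
  ))
  EXTEND (a VARCHAR, b BIGINT)
```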
Should we add the `accessKeyId`, `secretAccessKey`, and `assumeRoleArn` here, as some of the examples below use them?
They already appear? Check down two paragraphs. The intro to this paragraph says, "The function accepts the following parameters to specify the objects to read" while the later one says "also accepts the following security properties".
docs/multi-stage-query/reference.md
The string constant can also include any of the keywords mentioned above:

- `HOUR` - Same as `'PT1H'`
- `DAY` - Same as `'P1D'`
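For context, a minimal sketch of where the keyword and string forms would appear, assuming a PARTITIONED BY clause is the consumer. Note this particular doc change was later pulled from the PR (see the reply below), so treat the string-constant form as proposed, not shipped.

```sql
-- Keyword form and its proposed ISO-8601 string equivalent (illustrative).
INSERT INTO dst
SELECT * FROM src
PARTITIONED BY HOUR      -- same as PARTITIONED BY 'PT1H'
```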
Is `WEEK` an option here too?
Removed this change because it will come in a later PR. Any of Druid's valid granularity strings will work. However, WEEK isn't a valid SQL keyword, so it can't be used in the syntax shown earlier.
Please don't make WEEK an option. Kill WEEK: it doesn't align with any granularity larger than it (MONTH, YEAR, etc.) and generates oddities when used.
+ "Please use an equivalent of these granularities: %s.", | ||
Arrays.stream(GranularityType.values()) | ||
.filter(granularityType -> !granularityType.equals(GranularityType.NONE)) | ||
.map(Enum::name) |
Should this return the enum name as it is, or the Period string (e.g., would this return `MINUTE` or `PT1M`)? If the user provides the granularity enum name on the next go, it seems that it will fail the validation here, as it won't be a valid Period?
This is an error message, copied from somewhere in MSQ. The error message shouldn't get copied into the spec (unless I've done something very wrong). There was a comment in an early PR that someone didn't like having two ways to specify granularities, and we wanted only the ISO 8601 format. So, the user should use that. Easy enough to fix if we want to do something else.
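A small aside on the ISO-8601-only convention the reply mentions: Druid's TIME_FLOOR form of PARTITIONED BY already takes periods in that format, which illustrates the convention (the table names here are placeholders).

```sql
-- ISO-8601 periods in practice: TIME_FLOOR takes 'PT1M', not the
-- enum name MINUTE, matching the convention described above.
INSERT INTO dst
SELECT * FROM src
PARTITIONED BY TIME_FLOOR(__time, 'PT1M')
```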
@paul-rogers, thanks for the detailed write-up and unit tests. This feature looks pretty good! Checkpointing a first pass that mostly contains nits.
@paul-rogers
@paul-rogers, hmm, it doesn't seem to give me the ability to approve the PR anymore 😢. But I do approve of the change.
Description
Another step toward the Druid catalog, this one providing extended functionality to the table functions introduced in an earlier PR.
Prior to this PR, MSQ did not support query parameters. This PR adds initial support; some of the required functionality will follow in the planner-focused catalog PR.
Release note
See the reference.md documentation page in this PR for the revised functionality. Of note are the new table functions and the array-valued table function parameters. This functionality allows the user to, say, reuse the same SQL for every ingest, passing in a different set of input files as query parameters on each ingest (a sketch follows below).
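As a concrete, hedged illustration of that reuse pattern: the `localfiles` function name and `files` parameter below follow this PR's naming but should be verified against reference.md; the `?` is a dynamic query parameter bound by the client on each run.

```sql
-- Reusable ingest SQL: only the bound parameter changes between runs.
INSERT INTO myTable
SELECT *
FROM TABLE(localfiles(
    files => ?,          -- e.g. bound to ARRAY['jan.csv', 'feb.csv']
    format => 'csv'
  ))
  EXTEND (x VARCHAR, y BIGINT)
PARTITIONED BY DAY
```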