-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add extra_properties to hive table properties #9475
Conversation
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetadata.java
Outdated
Show resolved
Hide resolved
1e628fc
to
042aa85
Compare
@sajjoseph good catch on #1176. I had not covered the |
042aa85
to
79ba52c
Compare
This was proposed a while back, and I don’ think it can be done that way. The issue is to change a property you must do a read/update/write cycle that includes all existing properties, and that is very dangerous as you could drop (or restore) properties. I'm not sure how you could do this with the existing table property system. I would guess we need to extend this to support properties that look like
Of course this would require changing TablePropertyManager also |
thanks for having a look @dain. I mostly opened this PR to get support for the Hive I also opened another PR that just adds that individual table property - #9457 Given this approach will require a lot more changes, do you think the PR that simply adds the |
I closed #9457 and will try to apply the feedback to this PR. Thanks again for the comments. |
@posulliv this separation between Trino properties and "extra properties" may be rather misleading for the end users. How will a user now which property is a Trino property and which property is an arbitrary property? |
@findinpath the separation is actually important. Otherwise, how a user would know whether a property is "supposed to be honored by Trino", and which property is just a "dumb passthru" to the metastore? |
We're also blocked on lack of Hive table property support when trying to specify JSON timestamp formats, as described in #4538. Do we have consensus on this approach? |
FYI We added this patch on top of our Trino fork & deployed to production successfully, specifically for the purpose of specifying JSON timestamp formats. /cc @billonahill |
Are there plans to merge this branch into the main Trino release? Or are we exploring other ways to support this? As @jchoi614 mentioned, we've applied this and are running in in production, but want to be sure we're in sync with future releases. |
👋 @posulliv @findepi @findinpath and others - this PR has become inactive. Could you work together to move this forward? We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks. |
Closing this one out due to inactivity, but please reopen if you would like to pick this back up. |
Following up on this - should we reopen this PR @billonahill ? I do not see the use of Hive extra properties in the current Trino release & this is a patch that has been working for us in production. |
Yeah @jchoi614 we should. @colebow can we please re-open this one? We have been running this patch in our production environment for quite some time now and would like to have it part of Trino. Or if there is another way to provide a timestamp format for a given table we can use another another approach. |
I'm not an authority here, but I'd make sure our PR author (@posulliv) is actually interested in picking this back up and getting it across the finish line when it hasn't been updated in over a year. If not, someone else who's interested should open a new PR with attribution in the commit message. Opening/closing a PR is superficial and only represents the status of the work - but if the author isn't interested in picking it up, it should likely remain closed. |
@findinpath since you reopened.. does that mean @findepi and team are going to move this to merge proactively with @posulliv ? |
79ba52c
to
79db187
Compare
79db187
to
4f2a806
Compare
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java
Show resolved
Hide resolved
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java
Outdated
Show resolved
Hide resolved
4f2a806
to
f7bbcde
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works fine for me
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveTableProperties.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveTableProperties.java
Show resolved
Hide resolved
if (extraProperties != null) { | ||
properties.put(EXTRA_PROPERTIES, mapJsonCodec(String.class, String.class).fromJson(extraProperties)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if hive/spark or a different system add some additional properties - is it possible for us to capture all of them ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say if we Hive/Spark adds some additional properties - it won't be visible here right ? Since this JsonString - captures only the extra-properties configured via Trino - Is this intentional or can we be a bit open about the properties added by other system ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Yes that is correct. It was intentional as we didn't have a use case for that. Do we want to add that as part of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would nice if we could capture those properties also - SHOW CREATE TABLES
are kind of queries which allows us to migrate tables from one system to other and we should make sure we capture most of the properties in this case.
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java
Show resolved
Hide resolved
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java
Outdated
Show resolved
Hide resolved
bdb92f2
to
f5920d2
Compare
I was looking at https://trino.io/docs/current/sql/alter-table.html#set-properties documentation and as @findepi mentioned in #9475 (comment) Trino has its own table properties that it has to honour and that's why we can't really use this syntax for doing pass throughs for arbitrary properties. The current PR solves partly this problem by adding the "extra_properties" workaround. However it is missing the fact that the users may add through Spark arbitrary properties which are not visible then in Trino. Nevertheless, users need a way to pass through arbitrary properties so that they can land on the metastore. An alternative idea would be to provide system procedures to allow the user to perform such operations add/remove/list arbitrary table operations. This path doesn't involve any hacking on the |
I think I am fine having that but I am concerned about users being able to shoot themselves in the foot. |
But won't it affect the overall UX it would be like creating a table and then invoking the procedure to add/update custom property.
But doesn't this allow us to migrate the table schema and related properties from one HMS to a different HMS. (One of the use case of
I think we filter out the properties which are understood and honored by Trino and remaining properties could be exposed as |
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java
Outdated
Show resolved
Hide resolved
")\n" + | ||
"WITH (\n" + | ||
" extra_properties = map_from_entries(ARRAY[ROW('extra.property', 'true')]),\n" + | ||
" format = 'ORC'\n" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying out this feature locally and this is what I see:
trino:default> show create table test1;
Create Table
-------------------------------------------------------------------------------
CREATE TABLE hive.default.test1 (
c1 integer
)
WITH (
extra_properties = map_from_entries(ARRAY[ROW('extra.property', 'true')]),
format = 'ORC'
)
(1 row)
[metastore]> select * from TABLE_PARAMS where tbl_id = 56;
+--------+--------------------------------+---------------------------------------------+
| TBL_ID | PARAM_KEY | PARAM_VALUE |
+--------+--------------------------------+---------------------------------------------+
| 56 | STATS_GENERATED_VIA_STATS_TASK | workaround for potential lack of HIVE-12730 |
| 56 | auto.purge | false |
| 56 | extra.property | true |
| 56 | extra_properties | {
"extra.property" : "true"
} |
| 56 | numFiles | 0 |
| 56 | numRows | 0 |
| 56 | presto_query_id | 20230227_195136_00008_skeew |
| 56 | presto_version | 406-258-gf5920d2 |
| 56 | rawDataSize | 0 |
| 56 | totalSize | 0 |
| 56 | transient_lastDdlTime | 1677527498 |
+--------+--------------------------------+---------------------------------------------+
There are a few properties missing in the show create table
output.
Is this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@findinpath are you asking why the additional hive table properties not supported by trino are not included in the extra properties in the show create table
output?
If so, yes that was intentional. This PR was to only support extra properties that are specified explicitly by the user when the table is created through trino.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about the context where users do update from different engines the table properties.
Being limited to a number of "extra" properties defined up-front seems rather limiting to me.
Do note that a similar UX experience will need to be offered for Iceberg, Delta as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@findinpath got it. This PR was intentionally limited to the hive connector and wasn't intended to deal with other query engines.
Do you think that could be done as a follow up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think that could be done as a follow up?
Definitely.
Let's first settle on how to do it in Hive and implementing the functionality in other connectors will eventually follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR was to only support extra properties that are specified explicitly by the user when the table is created through trino.
Can we extend this functionality in this PR ? One more additional concern here is that we add an additional properties extra_properties
in Hive table which can be overriden by other engines. We could skip some of the properties like STATS_GENERATED_VIA_STATS_TASK
, presto_query_id
and presto_version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CREATED_BY
and TRINO_VIEW_FLAG
.isInstanceOf(QueryFailedException.class) | ||
.hasMessage("Multiple entries with same key: transactional=true and transactional=false"); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NULL
property values cause failures.
Query 20230227_200050_00010_skeew failed: null value in entry: extra.property=null
java.lang.NullPointerException: null value in entry: extra.property=null
at com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:33)
at com.google.common.collect.ImmutableMapEntry.<init>(ImmutableMapEntry.java:54)
at com.google.common.collect.ImmutableMap.entryOf(ImmutableMap.java:339)
at com.google.common.collect.ImmutableMap$Builder.put(ImmutableMap.java:449)
at io.trino.plugin.hive.HiveMetadata.getEmptyTableProperties(HiveMetadata.java:1106)
at io.trino.plugin.hive.HiveMetadata.createTable(HiveMetadata.java:934)
at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.createTable(ClassLoaderSafeConnectorMetadata.java:383)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@findinpath are you asking me to add a test that verifies the error when null
property values are used? Or to do better error checking? Just want to be sure I understand the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's first check whether setting NULL
is valid. Check whether hive does accept setting properties NULL
values.
In any case, the error thrown to the user should not be NPE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hive will accept null
as a string literal:
: jdbc:hive2://localhost:10000/default> create table t2(c1 int) tblproperties('extra.property'='null');
0: jdbc:hive2://localhost:10000/default>
The properties are then:
: jdbc:hive2://localhost:10000/default> show create table t2;
+----------------------------------------------------+
| createtab_stmt |
+----------------------------------------------------+
| CREATE TABLE `t2`( |
| `c1` int) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://hadoop-master:9000/user/hive/warehouse/t2' |
| TBLPROPERTIES ( |
| 'bucketing_version'='2', |
| 'extra.property'='null', |
| 'transient_lastDdlTime'='1677618343') |
+----------------------------------------------------+
14 rows selected (0.065 seconds)
0: jdbc:hive2://localhost:10000/default>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked also on my own
0: jdbc:hive2://localhost:10000/default> create table t2(c1 int) tblproperties('extra.property'=null);
Error: Error while compiling statement: FAILED: ParseException line 1:55 mismatched input 'null' expecting StringLiteral near '=' in specifying key/value property (state=42000,code=40000)
Specifying NULL
can't be used in Hive as well as a table property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of above null
is represented as a String
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought @findinpath was testing with null
as a string for an extra property which is what I tested but looks like he was not.
What changes do you want to see in the PR based on this discussion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add a check if the property values are null, if null then we need to throw an new TrinoException(INVALID_TABLE_PROPERTY...
@Praveen2112 even if we specify some "extra" properties in the We need to cover probably the need to set arbitrary properties when the table is being created. I don't think though that we need to showcase the "extra" properties in the |
f5920d2
to
2ff3b99
Compare
Obsoleted by #17172 |
Fixes #954