Add extra_properties to hive table properties #9475

@@ -29,6 +29,7 @@
import io.trino.metadata.QualifiedObjectName;
import io.trino.metadata.TableHandle;
import io.trino.metadata.TableMetadata;
import io.trino.spi.QueryId;
import io.trino.spi.connector.CatalogSchemaTableName;
import io.trino.spi.connector.ColumnHandle;
import io.trino.spi.connector.ColumnMetadata;

@@ -54,6 +55,7 @@
import io.trino.testing.MaterializedResult;
import io.trino.testing.MaterializedResultWithQueryId;
import io.trino.testing.MaterializedRow;
import io.trino.testing.QueryFailedException;
import io.trino.testing.QueryRunner;
import io.trino.testing.TestingConnectorBehavior;
import io.trino.testing.sql.TestTable;

@@ -8453,6 +8455,98 @@ public void testCreateAcidTableUnsupported()
        assertQueryFails("CREATE TABLE acid_unsupported WITH (transactional = true) AS SELECT 123 x", "FileHiveMetastore does not support ACID tables");
    }

    @Test
    public void testExtraProperties()
    {
        String tableName = format("%s.%s.test_extra_properties", getSession().getCatalog().get(), getSession().getSchema().get());
        @Language("SQL") String createTableSql = format("""
                CREATE TABLE %s (
                c1 integer)
                WITH (
                extra_properties = MAP(ARRAY['extra.property'], ARRAY['true']),
                format = 'ORC'
                )""",
                tableName);
        MaterializedResultWithQueryId result = getDistributedQueryRunner().executeWithQueryId(getSession(), createTableSql);
        QueryId queryId = result.getQueryId();
        String nodeVersion = (String) computeScalar("SELECT node_version FROM system.runtime.nodes WHERE coordinator");
        assertQuery(
                "SELECT * FROM \"test_extra_properties$properties\"",
                "SELECT 'workaround for potential lack of HIVE-12730', 'false', 'true', '{\n \"extra.property\" : \"true\"\n}', '0', '0', '" + queryId + "', '" + nodeVersion + "', '0', '0', 'false'");
        MaterializedResult actualResult = computeActual("SHOW CREATE TABLE " + tableName);
        String expectedShowCreateTableSql = "CREATE TABLE hive.tpch.test_extra_properties (\n" +
                " c1 integer\n" +
                ")\n" +
                "WITH (\n" +
                " extra_properties = map_from_entries(ARRAY[ROW('extra.property', 'true')]),\n" +
                " format = 'ORC'\n" +

Review thread on this line (elided passages are marked with […]; a sketch of surfacing properties written by other engines follows the final review thread below):

findinpath: I'm trying out this feature locally and this is what I see: […] There are a few properties missing in the […]

posulliv: @findinpath are you asking why the additional Hive table properties not supported by Trino are not included in the extra properties in the […]? If so, yes, that was intentional. This PR was to only support extra properties that are specified explicitly by the user when the table is created through Trino.

findinpath: I'm thinking about the context where users update the table properties from different engines. Do note that a similar UX experience will need to be offered for Iceberg and Delta as well.

posulliv: @findinpath got it. This PR was intentionally limited to the Hive connector and wasn't intended to deal with other query engines. Do you think that could be done as a follow-up?

findinpath: Definitely.

Reviewer: Can we extend this functionality in this PR? One more additional concern here is that we add an additional property […]

")"; | ||
assertEquals(getOnlyElement(actualResult.getOnlyColumnAsSet()), expectedShowCreateTableSql); | ||
assertUpdate("DROP TABLE " + tableName); | ||
} | ||
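
For context on how a map-valued table property like `extra_properties` can be wired up, below is a minimal sketch of declaring it through Trino's `PropertyMetadata` SPI with a `MAP(VARCHAR, VARCHAR)` SQL type. This is an illustration only: the class name, description text, visibility flag, and the pass-through decoder are assumptions, not the PR's actual `HiveTableProperties` code.

```java
import io.trino.spi.session.PropertyMetadata;
import io.trino.spi.type.MapType;
import io.trino.spi.type.TypeOperators;

import java.util.Map;

import static io.trino.spi.type.VarcharType.VARCHAR;

public final class ExtraPropertiesMetadataSketch
{
    private ExtraPropertiesMetadataSketch() {}

    // Declares a catalog table property named "extra_properties" whose SQL type is
    // MAP(VARCHAR, VARCHAR); the decoder receives the evaluated MAP value as a java.util.Map.
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static PropertyMetadata<Map> extraPropertiesProperty()
    {
        return new PropertyMetadata<>(
                "extra_properties",
                "Extra table properties to store in the Hive metastore",
                new MapType(VARCHAR, VARCHAR, new TypeOperators()),
                Map.class,
                null,                  // no default value
                false,                 // not hidden, so SHOW CREATE TABLE can render it
                value -> (Map) value,  // production code would validate keys and values here
                value -> value);
    }
}
```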

    @Test
    public void testMultipleExtraProperties()
    {
        String tableName = format("%s.%s.test_multiple_extra_properties", getSession().getCatalog().get(), getSession().getSchema().get());
        @Language("SQL") String createTableSql = format("""
                CREATE TABLE %s (
                c1 integer)
                WITH (
                extra_properties = MAP(ARRAY['extra.property.one', 'extra.property.two'], ARRAY['one', 'two']),
                format = 'ORC'
                )""",
                tableName);
        MaterializedResultWithQueryId result = getDistributedQueryRunner().executeWithQueryId(getSession(), createTableSql);
        QueryId queryId = result.getQueryId();
        String nodeVersion = (String) computeScalar("SELECT node_version FROM system.runtime.nodes WHERE coordinator");
        assertQuery(
                "SELECT * FROM \"test_multiple_extra_properties$properties\"",
                "SELECT 'workaround for potential lack of HIVE-12730', 'false', 'one', 'two', '{\n \"extra.property.one\" : \"one\",\n \"extra.property.two\" : \"two\"\n}', '0', '0', '" + queryId + "', '" + nodeVersion + "', '0', '0', 'false'");
        MaterializedResult actualResult = computeActual("SHOW CREATE TABLE " + tableName);
        String expectedShowCreateTableSql = "CREATE TABLE hive.tpch.test_multiple_extra_properties (\n" +
                " c1 integer\n" +
                ")\n" +
                "WITH (\n" +
                " extra_properties = map_from_entries(ARRAY[ROW('extra.property.one', 'one'),ROW('extra.property.two', 'two')]),\n" +
                " format = 'ORC'\n" +
                ")";
        assertEquals(getOnlyElement(actualResult.getOnlyColumnAsSet()), expectedShowCreateTableSql);
        assertUpdate("DROP TABLE " + tableName);
    }

    @Test
    public void testDuplicateExtraProperties()
    {
        String tableName = format("%s.%s.test_duplicate_extra_properties", getSession().getCatalog().get(), getSession().getSchema().get());
        @Language("SQL") String createTableSql = format("""
                CREATE TABLE %s (
                c1 integer)
                WITH (
                extra_properties = MAP(ARRAY['extra.property', 'extra.property'], ARRAY['true', 'false']),
                format = 'ORC'
                )""",
                tableName);
        assertQueryFails(createTableSql, "Invalid value for catalog 'hive' table property 'extra_properties': Cannot convert.*");
    }

    @Test
    public void testOverwriteExistingPropertyWithExtraProperties()
    {
        String tableName = format("%s.%s.test_overwrite_extra_properties", getSession().getCatalog().get(), getSession().getSchema().get());
        @Language("SQL") String createTableSql = format("""
                CREATE TABLE %s (
                c1 integer)
                WITH (
                extra_properties = MAP(ARRAY['transactional'], ARRAY['true']),
                format = 'ORC'
                )""",
                tableName);
        assertThatThrownBy(() -> assertUpdate(createTableSql))
                .isInstanceOf(QueryFailedException.class)
                .hasMessage("Multiple entries with same key: transactional=true and transactional=false");
    }
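
The failure message in testOverwriteExistingPropertyWithExtraProperties is the one Guava's ImmutableMap builder produces for duplicate keys, which suggests the connector concatenates the parameters it sets itself (it always writes transactional=false for non-transactional tables, as the $properties assertions above show) with the user-supplied extra_properties before building the final map. That reading is an inference from the error text, not something the diff above shows; the standalone snippet below just reproduces the failure mode.

```java
import com.google.common.collect.ImmutableMap;

import java.util.Map;

public final class ExtraPropertiesCollisionDemo
{
    private ExtraPropertiesCollisionDemo() {}

    public static void main(String[] args)
    {
        // Properties the connector sets on its own for a non-transactional table.
        Map<String, String> connectorProperties = ImmutableMap.of("transactional", "false");

        // User-supplied extra_properties that collide with a connector-managed key.
        Map<String, String> extraProperties = ImmutableMap.of("transactional", "true");

        // buildOrThrow() rejects duplicate keys with an IllegalArgumentException like
        // "Multiple entries with same key: transactional=true and transactional=false".
        Map<String, String> tableParameters = ImmutableMap.<String, String>builder()
                .putAll(connectorProperties)
                .putAll(extraProperties)
                .buildOrThrow();

        System.out.println(tableParameters);
    }
}
```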

Review thread on the new tests (several comments were truncated in the original; gaps are marked with […], and a sketch of the null-value check requested at the end follows this thread):

posulliv: @findinpath are you asking me to add a test that verifies the error when […]?

Reviewer: Let's first check whether setting […]

Reviewer: Hive will accept […] The properties are then: […]

Reviewer: I just checked also on my own […] Specifying […]

Reviewer: In case of above […]

posulliv: I thought @findinpath was testing with […] What changes do you want to see in the PR based on this discussion?

Reviewer: We need to add a check for whether the property values are null; if null, then we need to throw an […]
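
The last comment asks for an explicit rejection of null property values. A minimal sketch of such a check is below; the class and method names, the exception type, and where the check would be invoked (for example, from the extra_properties decoder) are assumptions for illustration, not the code this PR ends up with.

```java
import com.google.common.collect.ImmutableMap;

import java.util.Map;

public final class ExtraPropertiesValidation
{
    private ExtraPropertiesValidation() {}

    // Rejects null keys and null values up front so the failure is a clear user-facing
    // error instead of a NullPointerException deeper in the metastore write path.
    public static Map<String, String> validateExtraProperties(Map<String, String> extraProperties)
    {
        if (extraProperties == null) {
            return ImmutableMap.of();
        }
        extraProperties.forEach((key, value) -> {
            if (key == null || key.isEmpty()) {
                throw new IllegalArgumentException("Extra table property key cannot be null or empty");
            }
            if (value == null) {
                throw new IllegalArgumentException("Extra table property value cannot be null for key: " + key);
            }
        });
        // ImmutableMap.copyOf would also reject nulls, but the explicit checks give better messages.
        return ImmutableMap.copyOf(extraProperties);
    }
}
```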

    private static final Set<HiveStorageFormat> NAMED_COLUMN_ONLY_FORMATS = ImmutableSet.of(HiveStorageFormat.AVRO, HiveStorageFormat.JSON);

    @DataProvider

Review thread (a sketch of surfacing other engines' properties follows below):

Reviewer: What if Hive, Spark, or a different system adds some additional properties? Is it possible for us to capture all of them?

posulliv: I'm not sure what you mean here?

Reviewer: Say Hive or Spark adds some additional properties; they won't be visible here, right? This JSON string captures only the extra properties configured via Trino. Is this intentional, or can we be a bit more open about the properties added by other systems?

posulliv: I see. Yes, that is correct. It was intentional, as we didn't have a use case for that. Do we want to add that as part of this PR?

Reviewer: I think it would be nice if we could capture those properties as well. SHOW CREATE TABLE is the kind of query that lets us migrate tables from one system to another, and we should make sure we capture most of the properties in that case.
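
To make the follow-up discussed above concrete, here is one possible shape for it: when the connector builds table metadata, any metastore parameter that Trino does not already model as a first-class table property could be echoed back through extra_properties, so SHOW CREATE TABLE round-trips tables touched by Hive, Spark, or other engines. Everything below is a hypothetical sketch; the helper name and the set of known keys are illustrative and not part of this PR.

```java
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.ImmutableSet;

import java.util.Map;
import java.util.Set;

public final class ExtraPropertiesReflectionSketch
{
    // Parameters that Trino already exposes through dedicated table properties or
    // system tables (illustrative list only).
    private static final Set<String> KNOWN_PARAMETERS = ImmutableSet.of(
            "transactional",
            "comment",
            "numFiles",
            "totalSize",
            "presto_query_id",
            "presto_version");

    private ExtraPropertiesReflectionSketch() {}

    // Anything else written by other engines would be reported as extra_properties,
    // so it survives a SHOW CREATE TABLE based migration.
    public static Map<String, String> extraPropertiesOf(Map<String, String> metastoreParameters)
    {
        return metastoreParameters.entrySet().stream()
                .filter(entry -> !KNOWN_PARAMETERS.contains(entry.getKey()))
                .collect(ImmutableMap.toImmutableMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```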