Add extra_properties to hive table properties #9475

Closed
---
@@ -64,6 +64,7 @@
import io.trino.sql.tree.Explain;
import io.trino.sql.tree.ExplainAnalyze;
import io.trino.sql.tree.Expression;
import io.trino.sql.tree.FunctionCall;
import io.trino.sql.tree.Identifier;
import io.trino.sql.tree.LikePredicate;
import io.trino.sql.tree.LongLiteral;
@@ -75,6 +76,7 @@
import io.trino.sql.tree.QualifiedName;
import io.trino.sql.tree.Query;
import io.trino.sql.tree.Relation;
import io.trino.sql.tree.Row;
import io.trino.sql.tree.ShowCatalogs;
import io.trino.sql.tree.ShowColumns;
import io.trino.sql.tree.ShowCreate;
@@ -154,6 +156,7 @@
import static io.trino.sql.tree.ShowCreate.Type.TABLE;
import static io.trino.sql.tree.ShowCreate.Type.VIEW;
import static java.lang.String.format;
import static java.util.Arrays.asList;
import static java.util.Locale.ENGLISH;
import static java.util.Objects.requireNonNull;
import static java.util.stream.Collectors.toList;
@@ -563,6 +566,13 @@ private static Expression toExpression(Object value)
                .collect(toList()));
        }

        if (value instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) value;
            return new FunctionCall(QualifiedName.of("map_from_entries"), ImmutableList.of(new Array(map.entrySet().stream()
                    .map(entry -> new Row(asList(toExpression(entry.getKey()), toExpression(entry.getValue()))))
                    .collect(toImmutableList()))));
        }

        throw new TrinoException(INVALID_TABLE_PROPERTY, format("Failed to convert object of type %s to expression: %s", value.getClass().getName(), value));
    }

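The Map branch above builds a `FunctionCall` AST node that SHOW CREATE TABLE renders as `map_from_entries(ARRAY[ROW(key, value), ...])`. A minimal plain-string sketch of the rendered SQL (the class and method names here are illustrative, not Trino's AST classes):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class MapExpressionSketch
{
    // Render a varchar-to-varchar property map the way the Map branch of
    // toExpression is ultimately formatted in SHOW CREATE TABLE output.
    static String toMapExpression(Map<String, String> map)
    {
        return map.entrySet().stream()
                .map(e -> "ROW('" + e.getKey() + "', '" + e.getValue() + "')")
                .collect(Collectors.joining(", ", "map_from_entries(ARRAY[", "])"));
    }

    public static void main(String[] args)
    {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("extra.property.one", "one");
        properties.put("extra.property.two", "two");
        System.out.println(toMapExpression(properties));
    }
}
```

This mirrors the `extra_properties = map_from_entries(ARRAY[ROW(...)])` strings asserted in the tests further down, modulo whitespace between array elements.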
---
@@ -159,6 +159,7 @@
import static com.google.common.collect.ImmutableSet.toImmutableSet;
import static com.google.common.collect.Iterables.concat;
import static com.google.common.collect.Iterables.getOnlyElement;
import static io.airlift.json.JsonCodec.mapJsonCodec;
import static io.trino.hdfs.ConfigurationUtils.toJobConf;
import static io.trino.plugin.hive.HiveAnalyzeProperties.getColumnNames;
import static io.trino.plugin.hive.HiveAnalyzeProperties.getPartitionList;
@@ -213,6 +214,7 @@
import static io.trino.plugin.hive.HiveTableProperties.CSV_QUOTE;
import static io.trino.plugin.hive.HiveTableProperties.CSV_SEPARATOR;
import static io.trino.plugin.hive.HiveTableProperties.EXTERNAL_LOCATION_PROPERTY;
import static io.trino.plugin.hive.HiveTableProperties.EXTRA_PROPERTIES;
import static io.trino.plugin.hive.HiveTableProperties.NULL_FORMAT_PROPERTY;
import static io.trino.plugin.hive.HiveTableProperties.ORC_BLOOM_FILTER_COLUMNS;
import static io.trino.plugin.hive.HiveTableProperties.ORC_BLOOM_FILTER_FPP;
@@ -227,6 +229,7 @@
import static io.trino.plugin.hive.HiveTableProperties.getAvroSchemaUrl;
import static io.trino.plugin.hive.HiveTableProperties.getBucketProperty;
import static io.trino.plugin.hive.HiveTableProperties.getExternalLocation;
import static io.trino.plugin.hive.HiveTableProperties.getExtraProperties;
import static io.trino.plugin.hive.HiveTableProperties.getFooterSkipCount;
import static io.trino.plugin.hive.HiveTableProperties.getHeaderSkipCount;
import static io.trino.plugin.hive.HiveTableProperties.getHiveStorageFormat;
@@ -685,6 +688,11 @@ private ConnectorTableMetadata doGetTableMetadata(ConnectorSession session, Sche
        // Partition Projection specific properties
        properties.putAll(partitionProjectionService.getPartitionProjectionTrinoTableProperties(table));

        String extraProperties = table.getParameters().get(EXTRA_PROPERTIES);
        if (extraProperties != null) {
            properties.put(EXTRA_PROPERTIES, mapJsonCodec(String.class, String.class).fromJson(extraProperties));
Comment on lines +692 to +693
Member:
What if hive/spark or a different system adds some additional properties - is it possible for us to capture all of them?

Contributor Author:
I'm not sure what you mean here?

Member:
Say if Hive/Spark adds some additional properties - it won't be visible here, right? Since this JSON string captures only the extra-properties configured via Trino - is this intentional, or can we be a bit open about the properties added by other systems?

Contributor Author:
I see. Yes that is correct. It was intentional as we didn't have a use case for that. Do we want to add that as part of this PR?

Member:
I think it would be nice if we could capture those properties also - SHOW CREATE TABLE is the kind of query which allows us to migrate tables from one system to another, and we should make sure we capture most of the properties in this case.

        }

        return new ConnectorTableMetadata(tableName, columns.build(), properties.buildOrThrow(), comment);
    }

@@ -1090,6 +1098,15 @@ else if (avroSchemaLiteral != null) {
        tableProperties.put("numFiles", "-1");
        tableProperties.put("totalSize", "-1");

        // Extra properties
        Map<String, String> extraProperties = getExtraProperties(tableMetadata.getProperties());
        if (extraProperties != null) {
            tableProperties.put(EXTRA_PROPERTIES, mapJsonCodec(String.class, String.class).toJson(extraProperties));
            for (Map.Entry<String, String> extraProperty : extraProperties.entrySet()) {
                tableProperties.put(extraProperty.getKey(), extraProperty.getValue());
            }
        }

        // Table comment property
        tableMetadata.getComment().ifPresent(value -> tableProperties.put(TABLE_COMMENT, value));

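The write path above persists the user-supplied map twice in the metastore parameters: once as a JSON blob under the `extra_properties` key (read back by `doGetTableMetadata` via `mapJsonCodec`), and once expanded into individual key/value parameters so other engines see plain table properties. A hedged plain-JDK sketch of that shape; `toJson` hand-rolls the pretty-printed formatting visible in the test assertions below, which is otherwise an assumption about airlift's `mapJsonCodec` output:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtraPropertiesWriteSketch
{
    // Approximate the pretty-printed JSON shape the codec produces,
    // e.g. {\n  "extra.property" : "true"\n}
    static String toJson(Map<String, String> map)
    {
        StringBuilder json = new StringBuilder("{");
        String separator = "\n  ";
        for (Map.Entry<String, String> entry : map.entrySet()) {
            json.append(separator)
                    .append('"').append(entry.getKey()).append("\" : \"").append(entry.getValue()).append('"');
            separator = ",\n  ";
        }
        return json.append("\n}").toString();
    }

    // The double write: JSON blob plus one first-class parameter per entry.
    static Map<String, String> toTableParameters(Map<String, String> extraProperties)
    {
        Map<String, String> tableParameters = new LinkedHashMap<>();
        tableParameters.put("extra_properties", toJson(extraProperties));
        tableParameters.putAll(extraProperties);
        return tableParameters;
    }

    public static void main(String[] args)
    {
        Map<String, String> extra = new LinkedHashMap<>();
        extra.put("extra.property", "true");
        System.out.println(toTableParameters(extra));
    }
}
```

This double write is what the reviewer's metastore dump below shows: both an `extra_properties` JSON row and a plain `extra.property` row in TABLE_PARAMS.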
---
@@ -21,6 +21,8 @@
import io.trino.spi.TrinoException;
import io.trino.spi.session.PropertyMetadata;
import io.trino.spi.type.ArrayType;
import io.trino.spi.type.MapType;
import io.trino.spi.type.TypeManager;

import javax.inject.Inject;

@@ -67,13 +69,15 @@ public class HiveTableProperties
    public static final String CSV_ESCAPE = "csv_escape";
    public static final String TRANSACTIONAL = "transactional";
    public static final String AUTO_PURGE = "auto_purge";
    public static final String EXTRA_PROPERTIES = "extra_properties";

    private final List<PropertyMetadata<?>> tableProperties;

    @Inject
    public HiveTableProperties(
            HiveConfig config,
-           OrcWriterConfig orcWriterConfig)
+           OrcWriterConfig orcWriterConfig,
+           TypeManager typeManager)
    {
        tableProperties = ImmutableList.of(
                stringProperty(
Expand Down Expand Up @@ -169,7 +173,16 @@ public HiveTableProperties(
                        PARTITION_PROJECTION_LOCATION_TEMPLATE,
                        "Partition projection location template",
                        null,
-                       false));
+                       false),
+               new PropertyMetadata<>(
+                       EXTRA_PROPERTIES,
+                       "Extra table properties",
+                       new MapType(VARCHAR, VARCHAR, typeManager.getTypeOperators()),
+                       Map.class,
+                       null,
+                       false,
+                       value -> ((Map<String, String>) value),
+                       value -> value));
    }

    public List<PropertyMetadata<?>> getTableProperties()
Expand Down Expand Up @@ -297,4 +310,9 @@ public static Optional<Boolean> isAutoPurge(Map<String, Object> tableProperties)
    {
        return Optional.ofNullable((Boolean) tableProperties.get(AUTO_PURGE));
    }

    public static Map<String, String> getExtraProperties(Map<String, Object> tableProperties)
    {
        return (Map<String, String>) tableProperties.get(EXTRA_PROPERTIES);
    }
}
---
@@ -29,6 +29,7 @@
import io.trino.metadata.QualifiedObjectName;
import io.trino.metadata.TableHandle;
import io.trino.metadata.TableMetadata;
import io.trino.spi.QueryId;
import io.trino.spi.connector.CatalogSchemaTableName;
import io.trino.spi.connector.ColumnHandle;
import io.trino.spi.connector.ColumnMetadata;
@@ -54,6 +55,7 @@
import io.trino.testing.MaterializedResult;
import io.trino.testing.MaterializedResultWithQueryId;
import io.trino.testing.MaterializedRow;
import io.trino.testing.QueryFailedException;
import io.trino.testing.QueryRunner;
import io.trino.testing.TestingConnectorBehavior;
import io.trino.testing.sql.TestTable;
@@ -8453,6 +8455,98 @@ public void testCreateAcidTableUnsupported()
assertQueryFails("CREATE TABLE acid_unsupported WITH (transactional = true) AS SELECT 123 x", "FileHiveMetastore does not support ACID tables");
}

@Test
public void testExtraProperties()
{
String tableName = format("%s.%s.test_extra_properties", getSession().getCatalog().get(), getSession().getSchema().get());
@Language("SQL") String createTableSql = format("""
CREATE TABLE %s (
c1 integer)
WITH (
extra_properties = MAP(ARRAY['extra.property'], ARRAY['true']),
format = 'ORC'
)""",
tableName);
MaterializedResultWithQueryId result = getDistributedQueryRunner().executeWithQueryId(getSession(), createTableSql);
QueryId queryId = result.getQueryId();
String nodeVersion = (String) computeScalar("SELECT node_version FROM system.runtime.nodes WHERE coordinator");
assertQuery(
"SELECT * FROM \"test_extra_properties$properties\"",
"SELECT 'workaround for potential lack of HIVE-12730', 'false', 'true', '{\n \"extra.property\" : \"true\"\n}', '0', '0', '" + queryId + "', '" + nodeVersion + "', '0', '0', 'false'");
MaterializedResult actualResult = computeActual("SHOW CREATE TABLE " + tableName);
String expectedShowCreateTableSql = "CREATE TABLE hive.tpch.test_extra_properties (\n" +
" c1 integer\n" +
")\n" +
"WITH (\n" +
" extra_properties = map_from_entries(ARRAY[ROW('extra.property', 'true')]),\n" +
" format = 'ORC'\n" +
Contributor:
I'm trying out this feature locally and this is what I see:

trino:default> show create table test1;
                                 Create Table                                  
-------------------------------------------------------------------------------
 CREATE TABLE hive.default.test1 (                                             
    c1 integer                                                                 
 )                                                                             
 WITH (                                                                        
    extra_properties = map_from_entries(ARRAY[ROW('extra.property', 'true')]), 
    format = 'ORC'                                                             
 )                                                                             
(1 row)
[metastore]> select * from TABLE_PARAMS where tbl_id = 56;
+--------+--------------------------------+---------------------------------------------+
| TBL_ID | PARAM_KEY                      | PARAM_VALUE                                 |
+--------+--------------------------------+---------------------------------------------+
|     56 | STATS_GENERATED_VIA_STATS_TASK | workaround for potential lack of HIVE-12730 |
|     56 | auto.purge                     | false                                       |
|     56 | extra.property                 | true                                        |
|     56 | extra_properties               | {
  "extra.property" : "true"
}             |
|     56 | numFiles                       | 0                                           |
|     56 | numRows                        | 0                                           |
|     56 | presto_query_id                | 20230227_195136_00008_skeew                 |
|     56 | presto_version                 | 406-258-gf5920d2                            |
|     56 | rawDataSize                    | 0                                           |
|     56 | totalSize                      | 0                                           |
|     56 | transient_lastDdlTime          | 1677527498                                  |
+--------+--------------------------------+---------------------------------------------+

There are a few properties missing in the show create table output.
Is this intentional?

Contributor Author:
@findinpath are you asking why the additional hive table properties not supported by trino are not included in the extra properties in the show create table output?

If so, yes that was intentional. This PR was to only support extra properties that are specified explicitly by the user when the table is created through trino.

Contributor:
I'm thinking about the context where users do update from different engines the table properties.
Being limited to a number of "extra" properties defined up-front seems rather limiting to me.

Do note that a similar UX experience will need to be offered for Iceberg, Delta as well.

Contributor Author:
@findinpath got it. This PR was intentionally limited to the hive connector and wasn't intended to deal with other query engines.

Do you think that could be done as a follow up?

Contributor:
Do you think that could be done as a follow up?

Definitely.
Let's first settle on how to do it in Hive and implementing the functionality in other connectors will eventually follow.

Member:
This PR was to only support extra properties that are specified explicitly by the user when the table is created through trino.

Can we extend this functionality in this PR? One more concern here is that we add an additional property extra_properties in the Hive table which can be overridden by other engines. We could skip some of the properties like STATS_GENERATED_VIA_STATS_TASK, presto_query_id and presto_version

Member:
CREATED_BY and TRINO_VIEW_FLAG

")";
assertEquals(getOnlyElement(actualResult.getOnlyColumnAsSet()), expectedShowCreateTableSql);
assertUpdate("DROP TABLE " + tableName);
}

@Test
public void testMultipleExtraProperties()
{
String tableName = format("%s.%s.test_multiple_extra_properties", getSession().getCatalog().get(), getSession().getSchema().get());
@Language("SQL") String createTableSql = format("""
CREATE TABLE %s (
c1 integer)
WITH (
extra_properties = MAP(ARRAY['extra.property.one', 'extra.property.two'], ARRAY['one', 'two']),
format = 'ORC'
)""",
tableName);
MaterializedResultWithQueryId result = getDistributedQueryRunner().executeWithQueryId(getSession(), createTableSql);
QueryId queryId = result.getQueryId();
String nodeVersion = (String) computeScalar("SELECT node_version FROM system.runtime.nodes WHERE coordinator");
assertQuery(
"SELECT * FROM \"test_multiple_extra_properties$properties\"",
"SELECT 'workaround for potential lack of HIVE-12730', 'false', 'one', 'two', '{\n \"extra.property.one\" : \"one\",\n \"extra.property.two\" : \"two\"\n}', '0', '0', '" + queryId + "', '" + nodeVersion + "', '0', '0', 'false'");
MaterializedResult actualResult = computeActual("SHOW CREATE TABLE " + tableName);
String expectedShowCreateTableSql = "CREATE TABLE hive.tpch.test_multiple_extra_properties (\n" +
" c1 integer\n" +
")\n" +
"WITH (\n" +
" extra_properties = map_from_entries(ARRAY[ROW('extra.property.one', 'one'),ROW('extra.property.two', 'two')]),\n" +
" format = 'ORC'\n" +
")";
assertEquals(getOnlyElement(actualResult.getOnlyColumnAsSet()), expectedShowCreateTableSql);
assertUpdate("DROP TABLE " + tableName);
}

@Test
public void testDuplicateExtraProperties()
{
String tableName = format("%s.%s.test_duplicate_extra_properties", getSession().getCatalog().get(), getSession().getSchema().get());
@Language("SQL") String createTableSql = format("""
CREATE TABLE %s (
c1 integer)
WITH (
extra_properties = MAP(ARRAY['extra.property', 'extra.property'], ARRAY['true', 'false']),
format = 'ORC'
)""",
tableName);
assertQueryFails(createTableSql, "Invalid value for catalog 'hive' table property 'extra_properties': Cannot convert.*");
}

@Test
public void testOverwriteExistingPropertyWithExtraProperties()
{
String tableName = format("%s.%s.test_overwrite_extra_properties", getSession().getCatalog().get(), getSession().getSchema().get());
@Language("SQL") String createTableSql = format("""
CREATE TABLE %s (
c1 integer)
WITH (
extra_properties = MAP(ARRAY['transactional'], ARRAY['true']),
format = 'ORC'
)""",
tableName);
assertThatThrownBy(() -> assertUpdate(createTableSql))
.isInstanceOf(QueryFailedException.class)
.hasMessage("Multiple entries with same key: transactional=true and transactional=false");
}
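The overwrite test above fails because the table parameters are collected into a builder that rejects duplicate keys (Guava's `ImmutableMap.Builder#buildOrThrow` in the real code), so an `extra_properties` entry colliding with a property Trino already sets - `transactional` here - aborts the CREATE TABLE. A plain-JDK stand-in for that behavior (hypothetical names, sketch only):

```java
import java.util.HashMap;
import java.util.Map;

public class DuplicateKeySketch
{
    // Merge extra properties into the already-collected parameters,
    // failing fast on any key collision, like ImmutableMap.Builder#buildOrThrow.
    static Map<String, String> buildOrThrow(Map<String, String> base, Map<String, String> extra)
    {
        Map<String, String> merged = new HashMap<>(base);
        extra.forEach((key, value) -> {
            if (merged.containsKey(key)) {
                throw new IllegalArgumentException(
                        "Multiple entries with same key: " + key + "=" + value + " and " + key + "=" + merged.get(key));
            }
            merged.put(key, value);
        });
        return merged;
    }

    public static void main(String[] args)
    {
        // Non-colliding keys merge fine; a collision would throw as in the test above.
        System.out.println(buildOrThrow(Map.of("format", "ORC"), Map.of("extra.property", "true")));
    }
}
```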

Contributor:
NULL property values cause failures.

Query 20230227_200050_00010_skeew failed: null value in entry: extra.property=null
java.lang.NullPointerException: null value in entry: extra.property=null
	at com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:33)
	at com.google.common.collect.ImmutableMapEntry.<init>(ImmutableMapEntry.java:54)
	at com.google.common.collect.ImmutableMap.entryOf(ImmutableMap.java:339)
	at com.google.common.collect.ImmutableMap$Builder.put(ImmutableMap.java:449)
	at io.trino.plugin.hive.HiveMetadata.getEmptyTableProperties(HiveMetadata.java:1106)
	at io.trino.plugin.hive.HiveMetadata.createTable(HiveMetadata.java:934)
	at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.createTable(ClassLoaderSafeConnectorMetadata.java:383)

Contributor Author:
@findinpath are you asking me to add a test that verifies the error when null property values are used? Or to do better error checking? Just want to be sure I understand the comment.

Contributor:
Let's first check whether setting NULL is valid. Check whether hive does accept setting properties NULL values.
In any case, the error thrown to the user should not be NPE.

Contributor Author:
hive will accept null as a string literal:

: jdbc:hive2://localhost:10000/default> create table t2(c1 int) tblproperties('extra.property'='null');
0: jdbc:hive2://localhost:10000/default>

The properties are then:

: jdbc:hive2://localhost:10000/default> show create table t2;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE TABLE `t2`(                                 |
|   `c1` int)                                        |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.mapred.TextInputFormat'       |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION                                           |
|   'hdfs://hadoop-master:9000/user/hive/warehouse/t2' |
| TBLPROPERTIES (                                    |
|   'bucketing_version'='2',                         |
|   'extra.property'='null',                         |
|   'transient_lastDdlTime'='1677618343')            |
+----------------------------------------------------+
14 rows selected (0.065 seconds)
0: jdbc:hive2://localhost:10000/default>

Contributor:
I just checked also on my own

0: jdbc:hive2://localhost:10000/default> create table t2(c1 int) tblproperties('extra.property'=null);
Error: Error while compiling statement: FAILED: ParseException line 1:55 mismatched input 'null' expecting StringLiteral near '=' in specifying key/value property (state=42000,code=40000)

Specifying NULL can't be used in Hive as well as a table property.

Member:
In case of above null is represented as a String.

Contributor Author:
I thought @findinpath was testing with null as a string for an extra property which is what I tested but looks like he was not.

What changes do you want to see in the PR based on this discussion?

Member:
We need to add a check if the property values are null; if null, then we need to throw a new TrinoException(INVALID_TABLE_PROPERTY...
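A hedged sketch of the suggested validation: reject null keys or values in extra_properties up front with a clear error instead of the NPE from ImmutableMap. In Trino this would be `new TrinoException(INVALID_TABLE_PROPERTY, ...)`; `IllegalArgumentException` stands in here, and the message text is an assumption:

```java
import java.util.HashMap;
import java.util.Map;

public class NullPropertyCheckSketch
{
    // Fail fast on a null key or value before the properties reach the
    // metastore-parameter builder (which would otherwise throw an NPE).
    static void validateExtraProperties(Map<String, String> extraProperties)
    {
        if (extraProperties == null) {
            return;
        }
        extraProperties.forEach((key, value) -> {
            if (key == null || value == null) {
                throw new IllegalArgumentException(
                        "Extra table property cannot have a null key or value: " + key + "=" + value);
            }
        });
    }

    public static void main(String[] args)
    {
        Map<String, String> ok = new HashMap<>();
        ok.put("extra.property", "true");
        validateExtraProperties(ok); // passes silently
        System.out.println("valid");
    }
}
```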

private static final Set<HiveStorageFormat> NAMED_COLUMN_ONLY_FORMATS = ImmutableSet.of(HiveStorageFormat.AVRO, HiveStorageFormat.JSON);

@DataProvider