Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add auto_purge to hive table properties #9457

Merged
merged 2 commits into from
Oct 13, 2021

Conversation

posulliv
Copy link
Contributor

@posulliv posulliv commented Oct 1, 2021

Resolves #955

@cla-bot cla-bot bot added the cla-signed label Oct 1, 2021
@posulliv posulliv force-pushed the hive-auto-purge-tbl-property branch from 59cebf5 to ca17dec Compare October 1, 2021 14:30
@@ -606,6 +609,11 @@ private ConnectorTableMetadata doGetTableMetadata(ConnectorSession session, Sche

Optional<String> comment = Optional.ofNullable(table.getParameters().get(TABLE_COMMENT));

String autoPurgeProperty = table.getParameters().get(AUTO_PURGE_KEY);
if (parseBoolean(autoPurgeProperty)) {
properties.put(AUTO_PURGE, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make the property stored and available to metastore.

What if it's us who delete the data?
if user can set auto_purge, they would expect us to obey it.

we can delete data eg in glue metastore flow, so next question -- does this property make sense when metastore is Glue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi from my understanding of Glue, this property does not make sense when the metastore is Glue.

The DeleteTable request states the data for the table will be deleted asynchronously which means the auto_purge property would have no effect if it was set to false i.e. data will always be purged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flag tells HMS to skip trash when deleting data. First, trash only applies to HDFS, not S3, so it is unlikely to be a concern for Glue. Second, the Glue metastore does its own deletion and never uses trash, so we effectively always purge.

For reference, I dug through the Hive code and found that deletion is done using org.apache.hadoop.hive.metastore.utils.FileUtils#moveToTrash which calls org.apache.hadoop.fs.Trash#moveToAppropriateTrash. In other words, trash is a specific thing you have to implement, and we don't use it in Trino.

@findepi
Copy link
Member

findepi commented Oct 1, 2021

per #955 (comment), we should probably do #954 instead, especially if we don't apply any meaning to the property on the Trino side.

@posulliv
Copy link
Contributor Author

posulliv commented Oct 1, 2021

@findepi per your comment, I made an attempt to add extra_properties to cover this here - #9475

If you feel that is the better approach for this scenario, I will close this PR and focus instead on the extra_properties PR.

@findepi
Copy link
Member

findepi commented Oct 1, 2021

i do think it would be better, but you may want to check with @electrum first.

@electrum electrum merged commit 4949bdc into trinodb:master Oct 13, 2021
@electrum
Copy link
Member

Thanks!

@github-actions github-actions bot added this to the 364 milestone Oct 13, 2021
@posulliv posulliv deleted the hive-auto-purge-tbl-property branch February 9, 2022 01:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Add support for Hive auto.purge property
3 participants