-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add auto_purge to hive table properties #9457
Add auto_purge to hive table properties #9457
Conversation
59cebf5
to
ca17dec
Compare
@@ -606,6 +609,11 @@ private ConnectorTableMetadata doGetTableMetadata(ConnectorSession session, Sche | |||
|
|||
Optional<String> comment = Optional.ofNullable(table.getParameters().get(TABLE_COMMENT)); | |||
|
|||
String autoPurgeProperty = table.getParameters().get(AUTO_PURGE_KEY); | |||
if (parseBoolean(autoPurgeProperty)) { | |||
properties.put(AUTO_PURGE, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will make the property stored and available to metastore.
What if it's us who delete the data?
if user can set auto_purge
, they would expect us to obey it.
we can delete data eg in glue metastore flow, so next question -- does this property make sense when metastore is Glue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@findepi from my understanding of Glue, this property does not make sense when the metastore is Glue.
The DeleteTable request states the data for the table will be deleted asynchronously which means the auto_purge
property would have no effect if it was set to false
i.e. data will always be purged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This flag tells HMS to skip trash when deleting data. First, trash only applies to HDFS, not S3, so it is unlikely to be a concern for Glue. Second, the Glue metastore does its own deletion and never uses trash, so we effectively always purge.
For reference, I dug through the Hive code and found that deletion is done using org.apache.hadoop.hive.metastore.utils.FileUtils#moveToTrash
which calls org.apache.hadoop.fs.Trash#moveToAppropriateTrash
. In other words, trash is a specific thing you have to implement, and we don't use it in Trino.
per #955 (comment), we should probably do #954 instead, especially if we don't apply any meaning to the property on the Trino side. |
i do think it would be better, but you may want to check with @electrum first. |
Thanks! |
Resolves #955