Duplicate file path found in the Iceberg metadata snapshot #295
Comments
Error in full:
Can confirm that the tabular connector is periodically writing out duplicate file paths in the snapshots. I used the current manifest file and found the snapshot ID referenced in the error. This snapshot's "manifest-list" key pointed to an Avro file. I opened that file and found 4 objects, each pointing to a different metadata Avro file. I opened the first one, which had 4 objects: 2 sets of duplicates. One of the pairs pointed to the Parquet file that was referenced in the error, so the tabular connector had written duplicate file paths. With snapshot retention set to a minimum of 1 day, whenever this happens my Iceberg table will not be queryable for 24 hours. This is a problem.
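For anyone repeating the manual inspection above, here is a minimal sketch of the duplicate check in Python. It assumes the manifest entries have already been decoded from the Avro manifest files into dicts (for example with an Avro reader library) and that each entry follows the Iceberg spec layout, with a `status` field and a `data_file` struct containing `file_path`. The function name and the sample paths are hypothetical, not part of the connector.

```python
from collections import Counter
from typing import Any, Dict, Iterable

# Manifest entry status values per the Iceberg spec:
# 0 = existing, 1 = added, 2 = deleted.
DELETED = 2


def find_duplicate_paths(entries: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Return {file_path: count} for live data files listed more than once.

    `entries` is assumed to be the decoded manifest entries from every
    manifest referenced by one snapshot's manifest list, flattened into
    a single iterable.
    """
    counts = Counter(
        entry["data_file"]["file_path"]
        for entry in entries
        if entry.get("status") != DELETED  # deleted entries are not live files
    )
    return {path: n for path, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical entries for illustration only.
    sample = [
        {"status": 1, "data_file": {"file_path": "s3://bucket/tbl/data/a.parquet"}},
        {"status": 1, "data_file": {"file_path": "s3://bucket/tbl/data/a.parquet"}},
        {"status": 1, "data_file": {"file_path": "s3://bucket/tbl/data/b.parquet"}},
        {"status": 2, "data_file": {"file_path": "s3://bucket/tbl/data/b.parquet"}},
    ]
    for path, n in find_duplicate_paths(sample).items():
        print(f"{path} listed {n} times")
```

Entries with status 2 are skipped because a deleted entry alongside a live one for the same path is not necessarily a duplicate reference. Whether Snowflake applies exactly this rule is not confirmed here.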
We've seen this, too. Not on a regular basis, but we've occasionally discovered duplicate data-file references in a number of tables. Relatedly, we've recently discovered missing data files in a number of tables, i.e. data files listed in manifests that have been deleted from S3. One theory is that this is the result of running compaction (with |
Seems related to #212.
We have our connectors running and sinking data to our Iceberg catalog in Glue/S3. However, when trying to surface the data in Snowflake, a few of these Iceberg tables ran into this error from Snowflake:
Duplicate file path found in the Iceberg metadata snapshot. Please check that your Iceberg metadata generation is producing valid manifest files and refresh to a newer snapshot once fixed.
We are still trying to sort out where and why this is happening by combing through the manifest and snapshot files.
But it looks like the tabular connector has created some invalid duplicates within the snapshot files.