Capture all object metadata in tar stream #68

Open
anelson opened this issue Jan 9, 2023 · 2 comments
anelson commented Jan 9, 2023

Currently we capture only the object's key and its data. To be useful as a general-purpose backup tool, we also need to preserve object metadata, including but not limited to:

  • tags
  • access control lists (ACLs)
  • user-defined metadata
  • original creation date
  • version ID

When restoring, the default behavior should be to restore all possible metadata (unfortunately it's not possible to preserve the original create date or the version ID), with an option to restore only specific metadata components instead.
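
One way to picture the "restore everything by default, or only specific components" behavior is a set of flags. This is only a sketch: `MetadataComponents`, `select_metadata`, and the dictionary keys (`tags`, `acl`, `user_metadata`) are hypothetical names, not part of any existing API. Creation date and version ID are deliberately left out because they can't be restored.

```python
from enum import Flag, auto


class MetadataComponents(Flag):
    """Hypothetical set of metadata components that can be restored."""
    NONE = 0
    TAGS = auto()
    ACL = auto()
    USER_METADATA = auto()
    ALL = TAGS | ACL | USER_METADATA


def select_metadata(captured: dict,
                    components: MetadataComponents = MetadataComponents.ALL) -> dict:
    """Return only the captured metadata entries that should be restored.

    Entries that cannot be restored (e.g. creation date, version ID) are
    never selected, even if they were captured in the archive.
    """
    key_for = {
        MetadataComponents.TAGS: "tags",
        MetadataComponents.ACL: "acl",
        MetadataComponents.USER_METADATA: "user_metadata",
    }
    return {
        key: captured[key]
        for flag, key in key_for.items()
        if flag in components and key in captured
    }
```

With no argument every restorable component is selected; passing something like `MetadataComponents.TAGS | MetadataComponents.ACL` narrows it down.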

This should be doable by storing our custom metadata in the tar archive as a separate "file" that appears in the archive immediately before the actual object. We can make this file hidden and append a suffix such as .$$metadata to ensure it's not confused with a real object. The extract stage would need to be modified to handle this, but that complexity would be hidden completely from the public API.

anelson added the labels duplicate, enhancement, good first issue, project/data-plane, critical, project/red-stack, groomed, and needs-clarification, and removed the labels critical, duplicate, good first issue, groomed, needs-clarification, enhancement, and project/red-stack on Jan 9, 2023
anelson commented Jan 10, 2023

@kostiantyn-povnych could this metadata be stored in the ScaleZ index, as an alternative or in addition to storing it here?

kostiantyn-povnych commented Jan 11, 2023

Yes, it could and, in my opinion, should be stored in the ScaleZ index.
Moreover, some of the fields you listed are already supported:

  • version ID: this is the internal key for object versions in the versioned S3 bucket metadata, which is transformed into the FS index.
  • original creation date: captured and stored for all backup types.
  • user-defined metadata: not yet supported, and I don't see how a user could inject custom metadata for individual files in a File backup.
  • tags: not done yet, but can easily be implemented by extending the FS metadata definitions.
