borg2: which commands shall do a complete rebuild of the ChunkIndex? #8476

Open
ThomasWaldmann opened this issue Oct 15, 2024 · 3 comments

ThomasWaldmann commented Oct 15, 2024

For high-latency stores (sftp: and much of what's available via rclone:), a full index rebuild needs to list all objects in the store, and there are potentially a lot of objects.

For sftp, this also requires listing all 65536+256 nesting directories, and sftp is relatively slow.
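
To put a rough number on that, here is a small back-of-the-envelope sketch. It assumes the 65536+256 figure comes from two nesting levels keyed by one byte each, and that every directory listing costs about one network round trip; the 50 ms latency is purely hypothetical and the function names are made up for illustration.

```python
# Assumption: two nesting levels, each keyed by one byte (two hex digits).
LEVEL1_DIRS = 256          # 00 .. ff
LEVEL2_DIRS = 256 * 256    # 00/00 .. ff/ff


def nesting_directories() -> int:
    """Number of nesting directories a full listing has to visit."""
    return LEVEL1_DIRS + LEVEL2_DIRS


def estimated_listing_seconds(round_trip_s: float) -> float:
    """Rough lower bound if each directory listing costs one round trip."""
    return nesting_directories() * round_trip_s


if __name__ == "__main__":
    print(nesting_directories())             # 65792 directories
    print(estimated_listing_seconds(0.05))   # close to an hour at a hypothetical 50 ms each
```

Even before transferring any object names, the directory walk alone dominates on a slow link, which is why avoiding a full rebuild matters.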

IIRC, currently these do a complete rebuild:

  • compact (because it removes unused/unreferenced objects)
  • check (repository part, before doing anything else - just to make sure)
  • first operation after a borg check --repair (because repair might have removed objects and thus invalidated the chunks cache)

Most commands will try the following, in order:

  • use their locally cached chunks index (if its hash is still the same as what's in repo/cache/chunks_hash)
  • otherwise, fetch a fresh index from repo/cache/chunks
  • otherwise, rebuild the chunks index the slow way, by listing all objects (then write it to repo/cache/chunks)
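
The three options above read like a fallback chain, which could be sketched roughly as follows. This is not borg's actual code: the function name, the callables passed in, and the use of SHA-256 for the hash comparison are all assumptions made for illustration.

```python
import hashlib
from typing import Callable, Optional, Tuple


def load_chunk_index(
    local_index: Optional[bytes],
    repo_hash: Optional[str],
    fetch_repo_index: Callable[[], Optional[bytes]],
    rebuild_by_listing: Callable[[], bytes],
    store_repo_index: Callable[[bytes], None],
) -> Tuple[bytes, str]:
    """Return (index, source), trying the cheapest source first.

    All parameters are hypothetical stand-ins:
    - local_index: serialized index from the local cache, if any
    - repo_hash: content of repo/cache/chunks_hash, if any
    - fetch_repo_index: downloads repo/cache/chunks, or returns None
    - rebuild_by_listing: the slow path that lists all repo objects
    - store_repo_index: writes a rebuilt index back to repo/cache/chunks
    """
    # 1. The local cache is usable only if its hash matches the repo's hash.
    #    (SHA-256 is an assumption here, not necessarily what borg uses.)
    if local_index is not None and repo_hash is not None:
        if hashlib.sha256(local_index).hexdigest() == repo_hash:
            return local_index, "local"

    # 2. Otherwise fetch the index cached in the repository.
    fetched = fetch_repo_index()
    if fetched is not None:
        return fetched, "repo"

    # 3. Last resort: list everything, then cache the result in the repo.
    rebuilt = rebuild_by_listing()
    store_repo_index(rebuilt)
    return rebuilt, "rebuilt"
```

The question in this issue is essentially which commands should skip steps 1 and 2 and go straight to step 3.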

So the question is:

When shall we rely on an existing cached ChunkIndex (repo/cache/chunks) being in a good state and when shall we rather go the slow-safe route and build a fresh one?

For borg create:

  • it's not a big problem if the index does not have all objects that exist in the repo. If that happens, borg create will just store something to the repo that's already there. After it is finished creating the archive, it will store an updated index to repo/cache/chunks.
  • it would be a severe problem though if the index falsely said "we have that object" and borg therefore did not store it to the repo. The archive would then reference a non-existing object.
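
The asymmetry between the two cases can be shown with a toy model. This is only an illustration with made-up names (`create_archive`, `missing_objects`), using a plain dict as the "repo" and a set as the "index"; it is not how borg create is implemented.

```python
from typing import Dict, Iterable, List, Set, Tuple


def create_archive(
    file_chunks: Iterable[Tuple[str, bytes]],
    index: Set[str],
    repo: Dict[str, bytes],
) -> List[str]:
    """Toy deduplicating backup: store chunks the index does not know about."""
    referenced = []
    for chunk_id, data in file_chunks:
        if chunk_id not in index:
            # If the object is already in the repo, this is a redundant
            # but harmless write.
            repo[chunk_id] = data
            index.add(chunk_id)
        referenced.append(chunk_id)
    return referenced


def missing_objects(referenced: Iterable[str], repo: Dict[str, bytes]) -> List[str]:
    """Chunk ids the archive references that are not present in the repo."""
    return [c for c in referenced if c not in repo]


chunks = [("a", b"A"), ("b", b"B")]

# Case 1: the index is incomplete (does not know "a" is already stored).
repo1 = {"a": b"A"}
archive1 = create_archive(chunks, set(), repo1)
print(missing_objects(archive1, repo1))   # [] -> archive is intact

# Case 2: the index falsely claims "b" exists.
repo2 = {"a": b"A"}
archive2 = create_archive(chunks, {"a", "b"}, repo2)
print(missing_objects(archive2, repo2))   # ['b'] -> archive is broken
```

An incomplete index only costs extra writes, while an index with false entries silently produces a damaged archive.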

ThomasWaldmann added this to the 2.0.0b13 milestone Oct 15, 2024
ThomasWaldmann changed the title "borg2: which commands shall rebuild a completely new ChunkIndex?" to "borg2: which commands shall do a complete rebuild of the ChunkIndex?" Oct 15, 2024
ThomasWaldmann modified the milestones: 2.0.0b13, 2.0.0b14 Oct 26, 2024

ThomasWaldmann commented

Note:

borg compact emits some stats, and for that it needs the stored sizes of the repository objects. We usually do not have these in the chunks index, so listing all objects and their sizes is required for that.

ThomasWaldmann modified the milestones: 2.0.0b14, 2.0.0b15 Nov 16, 2024

ThomasWaldmann commented

Hmm, borg check could just try to work with the existing chunks index and not rebuild a new one.

If borg check then runs into problems, borg check --repair would rebuild the index.


ThomasWaldmann commented Nov 23, 2024

#8561 - borg compact only needs to rebuild the chunks index IF --stats is given (i.e. if repo space usage stats before/after compaction are a must-have).

Without --stats it might be much faster now, as it will re-use an existing cached chunks index from the repo and thus does not have to slowly list all repo objects.
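
As a minimal sketch of that decision (the function and parameter names are hypothetical, not taken from #8561): a full listing is needed when stats are requested, or, following the fallback described earlier in this issue, when no cached index is available at all.

```python
def compact_needs_full_listing(stats_requested: bool, cached_index_available: bool) -> bool:
    """Decide whether a compact run has to list all repo objects.

    Hypothetical helper: stats need the stored object sizes, which the
    chunks index usually does not contain, so they force a full listing.
    Without a cached index there is nothing to re-use either.
    """
    return stats_requested or not cached_index_available


print(compact_needs_full_listing(stats_requested=False, cached_index_available=True))   # False
print(compact_needs_full_listing(stats_requested=True, cached_index_available=True))    # True
```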
