Enhancement - Backup recovery with missing chunks #157
I was just testing the failure mode of the storage losing a chunk (by manually deleting one) and was surprised to see no files were recoverable after the point that Duplicacy ran into a missing chunk. I expected any files with contents in that chunk would be unavailable (or corrupt) but that other files would still be restorable. https://github.com/gilbertchen/duplicacy/wiki/Missing-Chunks describes dealing with the problem of missing chunks by removing all snapshots that reference them from storage.

But consider the case where you do your initial large backup, and then your deltas over time are relatively small. You might have a year's worth of snapshots, and if I understand things correctly, if you lose a chunk from that first snapshot that's referenced in all subsequent snapshots, you've just lost everything. Have I understood that correctly? If so, does anyone know of any cloud storage providers advertising 100% durability? :)
So without this, we have a serious risk of major data loss. That makes this critical for any serious use.
@jtackaberry That's strange. I was testing this a couple of days ago and got a different result. After I restored that new big file (from which I had deleted chunks), the restore completed without error! But the restored file was smaller and corrupted. That was my conclusion from testing GUI version 2.2.1.
There are two types of chunks: file chunks and metadata chunks. If the missing chunk is a file chunk, then it only prevents the restoration of the affected file(s). If the missing chunk is a metadata chunk, then the entire affected snapshot can't be restored. Duplicacy assumes that the storage is reliable, so it doesn't implement any error correction or backup repairing (which would only add more complexity and potentially lead to more bugs). Instead, it is recommended that you use the copy command to make multiple copies on different storages (a unique feature of Duplicacy).
The current GUI was designed to be a simple wrapper that does backup and prune, so you'll need to run the check command using the CLI version if the storage isn't reliable.
This is a bug in the current GUI where the error from the CLI isn't parsed correctly. The upcoming new web-based GUI should not have this issue.
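For reference, the recommended workflow could look something like this with the CLI (a sketch only; the storage names, snapshot id, and URL below are placeholders, not taken from this thread):

```sh
# Add a second storage named "offsite" that is copy-compatible with "default"
# (names, snapshot id, and URL are examples only).
duplicacy add -copy default offsite my-repo sftp://user@backup-host/duplicacy

# Copy existing snapshots from the default storage to the offsite storage.
duplicacy copy -from default -to offsite

# Verify that every chunk referenced by the snapshots still exists.
duplicacy check -all
```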
@gilbertchen Thanks for the reply. I tested the same scenario with a CLI restore and got an error message that a chunk can't be found - so that's good. EDIT: I found out that the GUI restore will restore the file even with missing chunks - that's great :)
I thought and researched a bit about this. Having a second copy means 100% more data, and when there's an inconsistency, you still don't know which copy is the correct one. Maths can do better via Reed-Solomon codes: you choose how much extra storage to chip in, and in return some percentage of bits can be autocorrected. No wonder cloud storage providers use it. FYI, there is a Go package port of the Backblaze Reed-Solomon implementation: https://github.com/klauspost/reedsolomon I'd really, really love to have the storage layer robustified with this.
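To make the idea concrete, here is a minimal sketch (not Duplicacy code; the 10 data + 3 parity split is an arbitrary example giving ~30% storage overhead while tolerating the loss of any 3 shards) of erasure coding a chunk with the klauspost/reedsolomon package:

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// 10 data shards + 3 parity shards: ~30% extra storage,
	// and any 3 shards may be lost and still be reconstructed.
	enc, err := reedsolomon.New(10, 3)
	if err != nil {
		log.Fatal(err)
	}

	chunk := bytes.Repeat([]byte("duplicacy chunk data "), 1000)

	// Split the chunk into equally sized shards (parity shards allocated too).
	shards, err := enc.Split(chunk)
	if err != nil {
		log.Fatal(err)
	}

	// Compute the 3 parity shards from the 10 data shards.
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Simulate losing three shards (nil marks a shard as missing).
	shards[0], shards[4], shards[12] = nil, nil, nil

	// Reconstruct the missing shards from the survivors.
	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}

	// Reassemble and verify the original chunk.
	var out bytes.Buffer
	if err := enc.Join(&out, shards, len(chunk)); err != nil {
		log.Fatal(err)
	}
	fmt.Println("recovered intact:", bytes.Equal(out.Bytes(), chunk))
}
```

The trade-off is tunable: more parity shards mean more storage overhead but more lost shards can be tolerated.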
As the backup grows huge, the number of chunks is bound to increase and can run into the hundreds of thousands. A few (or even a hundred) missing chunks shouldn't render the backup useless. Please allow recovering from backups with missing chunks.
I believe it should work this way:
A restore should restore all files that it can restore fully, and warn that one or more files couldn't be restored due to missing chunks.
Similarly, missing chunks shouldn't prevent taking additional backups. The backup should warn that there are missing chunks and hence that a full scan should be done (-hash). The chunks holding parts of the corrupted files should be turned into fossils and new chunks should be created.
Hope you give this a thought and implement it in a wonderful way.
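To illustrate the proposed restore behavior, here is a hypothetical sketch (not Duplicacy's actual code; the File type and the chunkExists/restore callbacks are invented for illustration):

```go
package main

import "fmt"

// File is a hypothetical stand-in for a file entry in a snapshot,
// listing the chunk hashes its content spans.
type File struct {
	Path   string
	Chunks []string
}

// restoreAll restores every file whose chunks are all present and
// collects warnings for files that can't be fully restored.
func restoreAll(files []File, chunkExists func(hash string) bool,
	restore func(File) error) (restored int, warnings []string) {
	for _, f := range files {
		skip := false
		for _, h := range f.Chunks {
			if !chunkExists(h) {
				warnings = append(warnings,
					fmt.Sprintf("skipping %s: missing chunk %s", f.Path, h))
				skip = true
				break
			}
		}
		if skip {
			continue
		}
		if err := restore(f); err != nil {
			warnings = append(warnings,
				fmt.Sprintf("failed to restore %s: %v", f.Path, err))
			continue
		}
		restored++
	}
	return restored, warnings
}

func main() {
	files := []File{
		{Path: "a.txt", Chunks: []string{"c1", "c2"}},
		{Path: "b.txt", Chunks: []string{"c3"}},
	}
	present := map[string]bool{"c1": true, "c2": true} // c3 is "missing"
	n, warns := restoreAll(files,
		func(h string) bool { return present[h] },
		func(f File) error { return nil }) // restore is a no-op in this demo
	fmt.Println("restored:", n)
	for _, w := range warns {
		fmt.Println("warning:", w)
	}
}
```

The key point is that a missing chunk downgrades the run from a failure to a per-file warning, so everything else is still recovered.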