
Enhancement - Backup recovery with missing chunks #157

Closed
naveed-patel opened this issue Sep 4, 2017 · 6 comments · Fixed by #595

Comments

@naveed-patel

As the backup grows huge, the number of chunks is bound to increase and can run into the hundreds of thousands. A few (or a few hundred) missing chunks shouldn't render the backup useless. Please allow recovering from backups with missing chunks.

I believe it works this way:

  1. The initial chunk to be read is stored somewhere in the snapshot file
  2. Each chunk has a pointer to the next chunk

A restore should restore all files that it can restore fully, and warn that one or more files couldn't be restored due to missing pieces.

Similarly, missing chunks shouldn't prevent taking additional backups. The backup should warn that there are missing chunks and that a full scan (-hash) is therefore needed. Chunks holding parts of corrupted files should be turned into fossils and new chunks should be created.
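
To make the request concrete, here is a minimal sketch of the skip-and-warn restore loop being asked for. Everything in it (ChunkStore, FileEntry, restoreFile) is an illustrative stand-in, not Duplicacy's actual code:

```go
package main

import (
	"errors"
	"log"
)

// Illustrative stand-ins; not Duplicacy's real types.
type FileEntry struct {
	Path   string
	Chunks []string // chunk IDs holding this file's contents
}

type ChunkStore map[string][]byte // chunk ID -> contents

var errMissingChunk = errors.New("missing chunk")

// restoreFile fails if any chunk of the file is absent from the store.
func restoreFile(store ChunkStore, f FileEntry) error {
	for _, id := range f.Chunks {
		if _, ok := store[id]; !ok {
			return errMissingChunk
		}
	}
	// ...reassemble and write the file from its chunks here...
	return nil
}

// restoreAll restores every fully recoverable file and warns about the
// rest, instead of aborting at the first missing chunk.
func restoreAll(store ChunkStore, files []FileEntry) {
	var failed []string
	for _, f := range files {
		if err := restoreFile(store, f); err != nil {
			failed = append(failed, f.Path)
		}
	}
	for _, p := range failed {
		log.Printf("WARNING: %s was not restored due to missing chunks", p)
	}
}

func main() {
	store := ChunkStore{"a": nil, "c": nil} // chunk "b" is missing from storage
	restoreAll(store, []FileEntry{
		{Path: "intact.txt", Chunks: []string{"a"}},
		{Path: "damaged.txt", Chunks: []string{"a", "b", "c"}},
	})
}
```

The point is simply that a missing chunk fails one file's restore and is reported at the end, rather than aborting the whole run.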

Hope you give this a thought and implement it in a wonderful way.

@jtackaberry
Contributor

I was just testing the failure mode of the storage losing a chunk (by manually deleting one) and was surprised to see that no files were recoverable after the point where Duplicacy ran into the missing chunk. I expected that any files with contents in that chunk would be unavailable (or corrupt), but that other files would still be restorable.

https://github.com/gilbertchen/duplicacy/wiki/Missing-Chunks describes dealing with missing chunks by removing from the storage all snapshots that reference them.

But consider the case where you do your initial large backup, and then your deltas over time are relatively small. You might have a year's worth of snapshots, and if I understand things correctly, losing a chunk from that first snapshot that's referenced in all subsequent snapshots means you've just lost everything.

Have I understood that correctly? If so, does anyone know of any cloud storage providers advertising 100% durability? :)

@geek-merlin

So without this we have a serious risk of major data loss. This makes it critical for any serious use.

@mr-flibble

I was just testing the failure mode of the storage losing a chunk (by manually deleting one) and was surprised to see no files were recoverable after the point that Duplicacy ran into a missing chunk.

@jtackaberry That's strange. I tested this a couple of days ago and got a different result.
I added a new big file to the backup, ran a backup, and then deleted some chunks. After that I ran another backup - and got no warnings.

Then I restored that new big file (the one whose chunks I had deleted) - and the restore completed without error! But the restored file was smaller, and corrupted.

Conclusions from my test with GUI version 2.2.1:

  • There is no warning about missing chunks when backing up - so the user doesn't know that future backups will be useless

  • There is no check of whether a restored file is corrupted or has a different size

@gilbertchen
Owner

There are two types of chunks, file chunks and metadata chunks. If the missing chunk is a file chunk, then it only prevents the restoration of affected file(s). If the missing chunk is a metadata chunk, then the entire affected snapshot can't be restored.
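
To illustrate that distinction (a hedged model only - Snapshot and the chunk lists below are illustrative stand-ins, not Duplicacy's actual data structures):

```go
package main

import "fmt"

// Illustrative model of the two chunk types described above.
type Snapshot struct {
	MetadataChunks []string            // encode the file list itself
	FileChunks     map[string][]string // file path -> content chunk IDs
}

func reportRestorability(s Snapshot, present map[string]bool) {
	// A missing metadata chunk makes the whole snapshot unrestorable:
	// without it the file list cannot even be decoded.
	for _, id := range s.MetadataChunks {
		if !present[id] {
			fmt.Println("snapshot unrestorable: missing metadata chunk", id)
			return
		}
	}
	// A missing file chunk only affects the files that reference it.
	for path, ids := range s.FileChunks {
		for _, id := range ids {
			if !present[id] {
				fmt.Println("file unrestorable:", path)
				break
			}
		}
	}
}

func main() {
	s := Snapshot{
		MetadataChunks: []string{"m1"},
		FileChunks:     map[string][]string{"a.txt": {"c1"}, "b.txt": {"c2"}},
	}
	reportRestorability(s, map[string]bool{"m1": true, "c1": true}) // "c2" missing
}
```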

Duplicacy assumes that the storage is reliable, so it doesn't implement any error correction or backup repairing (which would only add more complexity and could potentially lead to more bugs). Instead, it is recommended that you use the copy command to make multiple copies on different storages (a unique feature of Duplicacy).
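
For example, a second copy-compatible storage can be set up and kept in sync along these lines (the storage name offsite, snapshot ID my-docs, and bucket URL are placeholders):

```sh
# Add a second storage that is copy-compatible with the default one
duplicacy add -copy default offsite my-docs b2://my-offsite-bucket

# Replicate the existing snapshots to it
duplicacy copy -from default -to offsite
```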

There is no warning about missing chunks when backing up - so the user doesn't know that future backups will be useless

The current GUI was designed to be a simple wrapper that does backup and prune, so you'll need to run the check command using the CLI version if the storage isn't reliable.
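
A periodic check can be run against the CLI, for instance (a minimal illustration; it assumes the repository has already been initialized against the storage):

```sh
cd /path/to/repository
# Verify that every chunk referenced by the snapshots still exists in the storage
duplicacy check
```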

There is no check of whether a restored file is corrupted or has a different size

This is a bug in the current GUI where the error from the CLI isn't parsed correctly. The upcoming new web-based GUI should not have this issue.

@mr-flibble

mr-flibble commented Oct 27, 2018

@gilbertchen Thanks for the reply. I tested the same scenario with a CLI restore and got an error message about a chunk that couldn't be found - so that's good.
The copy command sounds like a very good option for Duplicacy in the current reliability situation. I hope the new GUI will support the copy command.

EDIT: I found out that the GUI restore will restore a file even when chunks are missing - that's great :)

@geek-merlin

I thought about and researched this a bit. Having a second copy means 100% more data, and when there's an inconsistency, you still don't know which copy is the correct one. Maths can do better via Reed-Solomon codes: you choose how much extra storage to chip in, and in exchange some percentage of bits can be auto-corrected. No wonder cloud storage providers use it.

FYI, there is a Go port of the Backblaze Reed-Solomon implementation: https://github.com/klauspost/reedsolomon
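
A minimal sketch of what that package does (the 10+3 shard split is an arbitrary example, not a proposal for Duplicacy's actual on-disk format):

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// 10 data shards + 3 parity shards: ~30% storage overhead, and any
	// 3 of the 13 shards can be lost without losing the original data.
	enc, err := reedsolomon.New(10, 3)
	if err != nil {
		log.Fatal(err)
	}

	data := bytes.Repeat([]byte("chunk payload "), 1000)

	// Split pads the input and returns 13 shards (parity not yet filled in).
	shards, err := enc.Split(data)
	if err != nil {
		log.Fatal(err)
	}
	// Encode computes the 3 parity shards from the 10 data shards.
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Simulate losing three shards (e.g. corrupted or missing storage blocks).
	shards[0], shards[4], shards[11] = nil, nil, nil

	// Reconstruct rebuilds the missing shards from the surviving ten.
	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}
	ok, err := enc.Verify(shards)
	fmt.Println("all shards verified after reconstruction:", ok, err)
}
```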

I'd really really love to have the storage layer robustified with this.
