
Enhancement - Backup recovery with missing chunks #157

Closed
naveed-patel opened this issue Sep 4, 2017 · 6 comments · Fixed by #595

Comments

@naveed-patel

As the backup grows huge, the number of chunks is bound to increase and can run into the hundreds of thousands. A few (or a few hundred) missing chunks shouldn't render the backup useless. Please allow recovering from backups with missing chunks.

I believe it works this way:

  1. The initial chunk to be read is stored somewhere in the snapshot file
  2. Each chunk has a pointer to the next chunk

A restore should restore all files that it can restore fully, and warn that one or more files couldn't be restored due to missing pieces.

Similarly, missing chunks shouldn't prevent taking additional backups. The backup should warn that there are missing chunks and that a full scan (-hash) is therefore needed. Chunks holding parts of corrupted files should be turned into fossils and new chunks should be created.
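
To make the request concrete, here is a minimal sketch of the skip-and-warn restore loop being asked for. Everything in it (ChunkStore, FileEntry, restoreFile) is an illustrative stand-in, not Duplicacy's actual code:

```go
package main

import (
	"errors"
	"log"
)

// Illustrative stand-ins; not Duplicacy's real types.
type FileEntry struct {
	Path   string
	Chunks []string // chunk IDs holding this file's contents
}

type ChunkStore map[string][]byte // chunk ID -> contents

var errMissingChunk = errors.New("missing chunk")

// restoreFile fails if any chunk of the file is absent from the store.
func restoreFile(store ChunkStore, f FileEntry) error {
	for _, id := range f.Chunks {
		if _, ok := store[id]; !ok {
			return errMissingChunk
		}
	}
	// ...reassemble and write the file from its chunks here...
	return nil
}

// restoreAll restores every fully recoverable file and warns about the
// rest, instead of aborting at the first missing chunk.
func restoreAll(store ChunkStore, files []FileEntry) {
	var failed []string
	for _, f := range files {
		if err := restoreFile(store, f); err != nil {
			failed = append(failed, f.Path)
		}
	}
	for _, p := range failed {
		log.Printf("WARNING: %s was not restored due to missing chunks", p)
	}
}

func main() {
	store := ChunkStore{"a": nil, "c": nil} // chunk "b" is missing from storage
	restoreAll(store, []FileEntry{
		{Path: "intact.txt", Chunks: []string{"a"}},
		{Path: "damaged.txt", Chunks: []string{"a", "b", "c"}},
	})
}
```

The point is simply that a missing chunk fails one file's restore and is reported at the end, rather than aborting the whole run.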

Hope you give this a thought and implement it in a wonderful way.

@jtackaberry
Contributor

I was just testing the failure mode of the storage losing a chunk (by manually deleting one) and was surprised to see that no files were recoverable after the point where Duplicacy ran into the missing chunk. I expected that any files with contents in that chunk would be unavailable (or corrupt), but that other files would still be restorable.

https://github.com/gilbertchen/duplicacy/wiki/Missing-Chunks describes dealing with missing chunks by removing from the storage all snapshots that reference them.

But consider the case where you do your initial large backup, and then your deltas over time are relatively small. You might have a year's worth of snapshots, and if I understand things correctly, losing a chunk from that first snapshot that's referenced in all subsequent snapshots means you've just lost everything.

Have I understood that correctly? If so, does anyone know of any cloud storage providers advertising 100% durability? :)

@geek-merlin

So without this we have a serious risk of major data loss. This makes it critical for any serious use.

@mr-flibble

I was just testing the failure mode of the storage losing a chunk (by manually deleting one) and was surprised to see no files were recoverable after the point that Duplicacy ran into a missing chunk.

@jtackaberry That's strange. I tested this a couple of days ago and got a different result.
I added a new big file to the backup, ran a backup, and then deleted some chunks. After that I ran another backup - and got no warnings.

Then I restored that new big file (the one whose chunks I had deleted) - and the restore completed without error! But the restored file was smaller, and corrupted.

Conclusions from my test with GUI version 2.2.1:

  • There is no warning about missing chunks when backing up - so the user doesn't know that future backups will be useless

  • There is no check of whether a restored file is corrupted or has a different size

@gilbertchen
Owner

There are two types of chunks, file chunks and metadata chunks. If the missing chunk is a file chunk, then it only prevents the restoration of affected file(s). If the missing chunk is a metadata chunk, then the entire affected snapshot can't be restored.
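
To illustrate that distinction (a hedged model only - Snapshot and the chunk lists below are illustrative stand-ins, not Duplicacy's actual data structures):

```go
package main

import "fmt"

// Illustrative model of the two chunk types described above.
type Snapshot struct {
	MetadataChunks []string            // encode the file list itself
	FileChunks     map[string][]string // file path -> content chunk IDs
}

func reportRestorability(s Snapshot, present map[string]bool) {
	// A missing metadata chunk makes the whole snapshot unrestorable:
	// without it the file list cannot even be decoded.
	for _, id := range s.MetadataChunks {
		if !present[id] {
			fmt.Println("snapshot unrestorable: missing metadata chunk", id)
			return
		}
	}
	// A missing file chunk only affects the files that reference it.
	for path, ids := range s.FileChunks {
		for _, id := range ids {
			if !present[id] {
				fmt.Println("file unrestorable:", path)
				break
			}
		}
	}
}

func main() {
	s := Snapshot{
		MetadataChunks: []string{"m1"},
		FileChunks:     map[string][]string{"a.txt": {"c1"}, "b.txt": {"c2"}},
	}
	reportRestorability(s, map[string]bool{"m1": true, "c1": true}) // "c2" missing
}
```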

Duplicacy assumes that the storage is reliable, so it doesn't implement any error correction or backup repairing (which would only add more complexity and could potentially lead to more bugs). Instead, it is recommended that you use the copy command to make multiple copies on different storages (a unique feature of Duplicacy).
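
For example, a second copy-compatible storage can be set up and kept in sync along these lines (the storage name offsite, snapshot ID my-docs, and bucket URL are placeholders):

```sh
# Add a second storage that is copy-compatible with the default one
duplicacy add -copy default offsite my-docs b2://my-offsite-bucket

# Replicate the existing snapshots to it
duplicacy copy -from default -to offsite
```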

There is no warning about missing chunks when backing up - so the user doesn't know that future backups will be useless

The current GUI was designed to be a simple wrapper that does backup and prune, so you'll need to run the check command using the CLI version if the storage isn't reliable.
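
A periodic check can be run against the CLI, for instance (a minimal illustration; it assumes the repository has already been initialized against the storage):

```sh
cd /path/to/repository
# Verify that every chunk referenced by the snapshots still exists in the storage
duplicacy check
```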

There is no check of whether a restored file is corrupted or has a different size

This is a bug in the current GUI where the error from the CLI isn't parsed correctly. The upcoming new web-based GUI should not have this issue.

@mr-flibble

mr-flibble commented Oct 27, 2018

@gilbertchen Thanks for the reply. I tested the same scenario with a CLI restore and got an error message about a chunk that couldn't be found - so that's good.
The copy command sounds like a very good option for Duplicacy in the current reliability situation. I hope the new GUI will support the copy command.

EDIT: I found out that the GUI restore will restore a file even when chunks are missing - that's great :)

@geek-merlin

I thought about and researched this a bit. Having a second copy means 100% more data, and when there's an inconsistency, you still don't know which copy is the correct one. Maths can do better via Reed-Solomon codes: you choose how much extra storage to chip in, and in exchange some percentage of bits can be auto-corrected. No wonder cloud storage providers use it.

FYI, there is a Go port of the Backblaze Reed-Solomon implementation: https://github.com/klauspost/reedsolomon
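
A minimal sketch of what that package does (the 10+3 shard split is an arbitrary example, not a proposal for Duplicacy's actual on-disk format):

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// 10 data shards + 3 parity shards: ~30% storage overhead, and any
	// 3 of the 13 shards can be lost without losing the original data.
	enc, err := reedsolomon.New(10, 3)
	if err != nil {
		log.Fatal(err)
	}

	data := bytes.Repeat([]byte("chunk payload "), 1000)

	// Split pads the input and returns 13 shards (parity not yet filled in).
	shards, err := enc.Split(data)
	if err != nil {
		log.Fatal(err)
	}
	// Encode computes the 3 parity shards from the 10 data shards.
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Simulate losing three shards (e.g. corrupted or missing storage blocks).
	shards[0], shards[4], shards[11] = nil, nil, nil

	// Reconstruct rebuilds the missing shards from the surviving ten.
	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}
	ok, err := enc.Verify(shards)
	fmt.Println("all shards verified after reconstruction:", ok, err)
}
```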

I'd really really love to have the storage layer robustified with this.
