Backup directories being duplicated #1396
Dear Capybara-overdose, I assume you know, but I have to ask: can you please check and verify whether those files are really full copies or just hardlinks? Please see the FAQ about it. Something like this should work: I can't guarantee it, but I suspect the problem also exists in the current latest release (1.3.3). I assume this has something to do with the "new permission handling" (#988). This is one of the bugs we are still working on with the highest priority.
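The check asked for here comes down to comparing inode numbers: on the same filesystem, paths that share an inode are one file stored once, while different inodes mean separate copies. As a hedged illustration (not BiT's own tooling; the function names are hypothetical), the inode and link-count columns of `ls -li` can be reproduced in Python:

```python
import os
import sys


def inode_report(paths):
    """Return (inode, link_count, path) for each path, like the first
    and third columns of `ls -li`."""
    rows = []
    for path in paths:
        st = os.stat(path)
        rows.append((st.st_ino, st.st_nlink, path))
    return rows


def is_hardlink_of(a, b):
    """True if a and b are the same file on disk (same device and inode)."""
    return os.path.samefile(a, b)


if __name__ == "__main__":
    # Usage: python inode_report.py SOURCE_FILE SNAPSHOT_COPY [MORE_COPIES...]
    for ino, nlink, path in inode_report(sys.argv[1:]):
        print(f"{ino:>12} {nlink:>4} {path}")
```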
Thanks for the quick reply, and in particular for the clear instructions! I may have complicated things slightly: I had another fiddle with deleting the 'Personal Media' folder from certain snapshots to see how it affected things. Unsurprisingly, it seems to just move the duplicate folder to another snapshot. I've attached another screenshot to show the new location of the duplicates. I ran the
The output:
My reading of this is that the first three (the source file location and the two snapshots showing as occupying disk space) are all indeed unique files, whereas the random snapshot is a hardlink back to the older snapshot (20221009-230001). In short, the snapshots in question are indeed redundant duplicates. Am I reading this right?
I see three snapshots in your output. It seems that the 3rd is a duplicate (hardlink) of the 1st, but the 2nd has its own unique inode number. Do not trust your GUI about how much space a file occupies; applications like this often do not take hardlinks into account. You can have ten 500 GB snapshots on a 500 GB hard drive just because of hardlinks, and GUIs like this would show 5,000 GB used on a 500 GB drive.
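The double counting described here is easy to demonstrate. GNU `du`, for example, counts a hardlinked file only once per invocation, whereas a tool that sums every path reports each link at full size. A minimal Python sketch of both totals (`disk_usage` is a hypothetical helper; it uses apparent file sizes, not allocated blocks):

```python
import os


def disk_usage(root):
    """Walk root and return (apparent_bytes, unique_bytes).

    apparent_bytes adds up every path, as a tool that ignores hardlinks would;
    unique_bytes counts each (device, inode) pair only once.
    """
    apparent = 0
    unique = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return apparent, unique
```

With one 1,000-byte file and three extra hardlinks to it, the apparent total is four times the unique total, which is exactly the kind of gap a hardlink-unaware GUI shows.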
Yes, like I said, the first two snapshots are the ones showing a large amount of space occupied by the duplicate files. The third was a much smaller one, to test that what I was seeing in Disk Usage Analyser was correct: the sizes are indeed accurate, and the 3rd snapshot was mostly hardlinks. I read through the other issue report you linked, tried adding the "--no-perms --no-group --no-owner" parameters, and ran a new backup. After some time doing a 'smart remove' it seems to have re-allocated all the files... but AGAIN, there are two directories/snapshots that seem to contain largely duplicates: 02221009-230001-269 (as before) and now the latest snapshot made, 2023083145-932 (see attached). ls -li on the same photo in Personal Media for each of these snapshots gives a unique inode:
So this confirms they are indeed still unique duplicate files, not hardlinks, i.e. the GUI is correct in displaying them as occupying disk space. I then ran the same command on the snapshot previous to each of these: 20220925-230001-791 and 20230107-220001-713 (see attached). In Disk Usage Analyser, the entire Personal Media directory is shown as only 856.1 kB (as opposed to 417 GB), but both still appear to contain the test file IMG_8832.jpg. I ran ls -il on this file in both snapshots, which gives:
These appear to be the same inodes as in the first test, i.e. they are indeed hardlinks to their respective subsequent snapshots, each of which contains full copies of all actual files, and the GUI is showing disk usage correctly in terms of hardlinks vs actual files. The question still remains as to WHY there are TWO snapshots with full duplicates. I decided to run ls -li on test files in different snapshots (because I've stuffed around deleting Personal Media from snapshots trying to see if it would remove the duplicate). I did this on a number of files I know haven't been touched in years, and what I noticed for all of them is that the hardlinks from snapshots from 14 Dec onwards all refer to the most recent 'full duplicate' snapshot (2023083145-932). Conversely, hardlinks from earlier snapshots refer to the older 'full duplicate' snapshot (02221009-230001-269). This is very interesting, as the significance of this date is that it's when my 'smart remove' schedule starts keeping DAILY snapshots (for the last 30 days). I've noticed this on the previous instances where this popped up: one of the duplicates is in the daily range, the other back in the weekly range. It's almost like BiT is keeping two separate full snapshots, one for each date range of snapshots.
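The pattern described here (older snapshots linking to one full copy, newer ones to another) can be tabulated rather than checked file by file. A minimal sketch, assuming the snapshot directories are passed in chronological order and that `rel_path` is the file's path relative to each snapshot root (both hypothetical inputs, `group_by_inode` is a hypothetical name): each key in the result is one physical copy, so an unchanged file that yields more than one key is stored more than once.

```python
import os
from collections import defaultdict


def group_by_inode(snapshot_dirs, rel_path):
    """Map (device, inode) -> names of the snapshots whose copy of
    rel_path is that physical file."""
    groups = defaultdict(list)
    for snap in snapshot_dirs:
        path = os.path.join(snap, rel_path)
        if not os.path.exists(path):
            continue  # the file was deleted from (or never in) this snapshot
        st = os.stat(path)
        groups[(st.st_dev, st.st_ino)].append(os.path.basename(snap))
    return dict(groups)
```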
It really looks like you have two real copies of the file in your backups. AFAIR the 3rd column is the reference counter (how many links to the inode exist) and the source file (first The other files (in the snapshots) have higher reference counts (13 and 28), and this indicates that the hardlinks in the snapshots do work. The backup file of the most recent snapshot "20230102-230001-408" does indeed have another inode number (first column) and is another file IMHO. This may happen if the file or its extended attributes have been changed. Can you please compare the two duplicated files byte-by-byte with My suspicion is that some photo software was scanning your media files and adding or updating some metadata without changing the file size and modification date... Edit: Could you please also show the output of
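The two checks requested here, link counts and a byte-by-byte comparison, can be combined. This Python sketch is an illustration only; `filecmp.cmp(..., shallow=False)` plays the role of `cmp`, and `compare_copies` is a hypothetical name:

```python
import filecmp
import os


def compare_copies(a, b):
    """Summarise whether two backup copies are one file or two separate files."""
    sa, sb = os.stat(a), os.stat(b)
    return {
        "same_inode": (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino),
        "links_a": sa.st_nlink,  # same value as the 3rd column of `ls -li`
        "links_b": sb.st_nlink,
        # shallow=False compares file contents instead of trusting
        # matching os.stat() signatures (type, size, mtime)
        "identical_bytes": filecmp.cmp(a, b, shallow=False),
    }
```

A result with `same_inode` false and `identical_bytes` true is the case reported in this thread: two full copies of the same data.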
So is there any intention to fix this at all? It's still doing it, on multiple machines, and basically makes the whole tool useless because it eats up all the drive space in just a few snapshots. |
Dear @capybara-overdose, I'm not sure how other teammates feel about your case, but for me it is still not clear what the root of the problem is, and whether BIT really is the problem here or some external factor. From my point of view and my knowledge of how BIT and rsync work, your situation shouldn't happen. In short: it is not clear if this is a bug in BIT and if it needs a fix. But we will work on this issue and try to find a reason for the problem and maybe a solution. Keep in mind that we are currently not able to reproduce the problem.
This machine ONLY runs BiT (it's the home backup machine); there is no "photo editing" or "external factors" in play. I don't know why people keep trying to pull this deflection months after the fact, or how anyone can seriously be saying 'it's still not clear' considering the time I've already spent demonstrating the files are 100% without a doubt duplicates. It doesn't matter at all whether you 'think it should happen': it IS happening. I'm frankly not inclined to spend any more of my time on pointless busywork when it feels like an exercise in gaslighting myself into thinking somehow there is no problem, when even the people asking for yet more arbitrary "tests" themselves admit "It really looks like you have two real copies of the file in your backups". As for having it on a 'milestone': from what that page says, your plan is for it to just stay broken until "to the release after the next release."? Seriously?? If you just can't be bothered fixing this, call the project abandoned and be done with it so people aren't being misled into wasting their time.
Right - just so it can't be used as an excuse, here it is ALL OVER AGAIN with the new commands. This time I'm focusing on the Music folder, because all the previous ones have been deleted by now. As you can see, the 20230406 backup totally duplicated the 20210122 backup directory: same size and number of files occupying disk space. No, I do not have any media management software on this machine that edits metadata or anything like that, so don't point fingers there.

Output of ls -li on a sample file in (in order) the source folder, the 20230406 backup, an adjacent previous backup, the 20210122 backup, and another adjacent previous backup:

You can clearly see unique inodes on the working copy, on the 20230406 copy, and then a third on all copies before then, including the actual stored file in the 20210122 backup directory. Three independent files occupying space on disk: the original, the 20230406 backup, and the 20230122 backup (which is what the other backups link to).

Output of various cmp's between the files (all the same):

Output of stat for all those files:

So it looks like BiT has, for whatever reason, changed the permissions (incorrectly) on the 20230406 backup to 755, when the original and previous backups were all 777. Sound familiar? Still "not clear" if this is a bug in BiT?
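The stat output is the important clue. Assuming BiT builds snapshots with rsync's `--link-dest` option, the rsync manual says a file is only hardlinked when it is identical in all preserved attributes (permissions and possibly ownership), so a 755 vs 777 mode difference would be enough to force a fresh full copy even though `cmp` finds the bytes identical. A hypothetical helper to list such differences (extended attributes and ACLs are deliberately not checked):

```python
import os
import stat


def metadata_differences(a, b):
    """Return {attribute: (value_a, value_b)} for attributes that differ.

    Assumption: mode, owner, group, mtime and size stand in for the
    attributes rsync preserves with -p/-o/-g/-t; xattrs and ACLs are ignored.
    """
    sa, sb = os.stat(a), os.stat(b)
    checks = {
        "mode": (oct(stat.S_IMODE(sa.st_mode)), oct(stat.S_IMODE(sb.st_mode))),
        "uid": (sa.st_uid, sb.st_uid),
        "gid": (sa.st_gid, sb.st_gid),
        "mtime": (int(sa.st_mtime), int(sb.st_mtime)),  # whole seconds
        "size": (sa.st_size, sb.st_size),
    }
    return {name: pair for name, pair in checks.items() if pair[0] != pair[1]}
```

Run against the source file and the 20230406 copy, the situation described above would show up as a single `mode` entry.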
The project was abandoned for multiple years before it was picked up by a small group of volunteers a number of months ago. Much has been accomplished, and there is still much to do for the new team, who all have day jobs and/or families to deal with. So thank you for your patience, maintaining a friendly tone, and your appreciation of the volunteer work being done here.
Another month without an effective backup system. Still doing it, repeatedly, no matter what I do to try to trim it down. Is there going to be any attempt to address this, or can we just call the app dead at this point?
Dear @capybara-overdose,
If you went to a doctor (3 times over the course of six months) and the best they could do was snarkily ask "Well, how would you fix this pain in your chest?", would you take them seriously? Would you take responsibility yourself for their inability to perform a job they chose to take on, and go try to make assessments you never pretended to be qualified for? Or would you find a new doctor? For the same reason I pay for my private health insurance instead of relying on "free" community options or, worse, self-diagnosis: you've made it clear this "project" is a failure and the only real option is to pay for a properly managed solution. This app, like most software in the Linux space, obviously has bigger problems than just the code: the people. Do whatever you want. Or rather, don't. Back to Windows for me.
I have BiT using the 'Smart Remove' function (settings attached), maintaining rolling backups of around 700GB of data (over various directories) to a 2TB drive - and it keeps filling the whole drive up.
I went investigating today and found that two snapshot folders seem to contain large amounts of duplicate folders, each with actual full copies of identical unmodified files (picture attached; note the 'Personal Media' folder).
The offending snapshots themselves are from about 3 months ago, and about two weeks ago, neither manually created. There have of course been some changes to some directories/files over that time. But there are also some folders (eg Personal Media) that haven't changed at all, so I don't know why BiT has re-copied actual duplicate files in both snapshot directories.
If I try to delete these snapshots, the files simply move to an adjacent snapshot folder. I was under the impression that it's only supposed to store one actual copy of a given file in the backup directory, and hardlink to it from future snapshots until the file is changed. But these duplicates are definitely unchanged files, and yet they've been fully copied instead of hardlinked to the originals (which are the exact same files).
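What this post expects, one stored copy hardlinked from later snapshots until the file changes, is the general rsync `--link-dest` style of snapshotting. The simplified Python sketch below is not BiT's implementation: `make_snapshot` and `unchanged` are hypothetical names, and treating a file as unchanged when size, whole-second mtime and permission bits match is an assumption loosely modelled on rsync's quick check plus preserved permissions. It also shows how a permission-only change produces a new full copy:

```python
import os
import shutil
import stat


def unchanged(src_path, prev_path):
    """Heuristic quick check: same size, mtime (whole seconds) and mode bits."""
    if not os.path.isfile(prev_path):
        return False
    a, b = os.stat(src_path), os.stat(prev_path)
    return (a.st_size == b.st_size
            and int(a.st_mtime) == int(b.st_mtime)
            and stat.S_IMODE(a.st_mode) == stat.S_IMODE(b.st_mode))


def make_snapshot(src, prev, dest):
    """Copy src into dest, hardlinking files that are unchanged in prev."""
    for dirpath, _dirs, files in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        os.makedirs(os.path.normpath(os.path.join(dest, rel)), exist_ok=True)
        for name in files:
            s = os.path.join(dirpath, name)
            d = os.path.normpath(os.path.join(dest, rel, name))
            p = os.path.normpath(os.path.join(prev, rel, name)) if prev else None
            if p and unchanged(s, p):
                os.link(p, d)       # reuse the stored copy: no extra space
            else:
                shutil.copy2(s, d)  # new or changed: store a full copy
```

For example, `make_snapshot("/data", "/backups/20230405", "/backups/20230406")` (hypothetical paths) links every unchanged file to the previous snapshot, and stores a full copy only for files whose size, mtime or permissions differ.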
Is this intended behaviour? Has anyone else seen it? I'm on 1.2.1, because that's what's in the repos.
Thanks