Backup directories being duplicated #1396

Closed
capybara-overdose opened this issue Jan 10, 2023 · 14 comments

@capybara-overdose

capybara-overdose commented Jan 10, 2023

I have BiT using the 'Smart Remove' function (settings attached), maintaining rolling backups of around 700GB of data (over various directories) to a 2TB drive - and it keeps filling the whole drive up.

I went investigating today and found that two snapshot folders seem to contain large amounts of duplicate folders, each with actual full copies of identical unmodified files (picture attached, note the 'Personal Media' folder )

The offending snapshots themselves are from about 3 months ago, and about two weeks ago, neither manually created. There have of course been some changes to some directories/files over that time. But there are also some folders (eg Personal Media) that haven't changed at all, so I don't know why BiT has re-copied actual duplicate files in both snapshot directories.

If I try to delete these snapshots, the files simply move to an adjacent snapshot folder. I was under the impression that it's only supposed to store one actual copy of a given file in the backup directory, and hardlink to it for other future snapshots, until the file is changed. But these duplicates are definitely unchanged files, and yet they've been fully copied instead of hardlinked to the originals (which are the exact same files).

Is this intended behaviour? Has anyone else seen it? I'm on 1.2.1, because that's what's in the repos.

Thanks

Screenshot from 2023-01-11 03-04-04
Screenshot from 2023-01-11 06-01-27

@capybara-overdose capybara-overdose changed the title Backup directories contain duplicates Backup directories being duplicated Jan 11, 2023
@buhtz
Member

buhtz commented Jan 11, 2023

Dear Capybara-overdose,
thank you for that report.

I assume you know, but I have to ask: can you please check and verify whether those files are really full copies or just hardlinks? Please see the FAQ about it. You can use ls -li to show the inode number of a specific file (pick one from your PersonalMedia folder). Compare that number with the "same" file from another snapshot.

Something like this should work: ls -li backintime/pop-os/blurred/1/*/backup/media/blurred/blurred/PersonalMedia/specific.file (be aware of the wildcard * in that path)
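The inode comparison described above can be wrapped in a small helper. This is only a minimal sketch, assuming a shell whose `test` builtin supports the `-ef` operator (bash and dash do); the function name `same_inode` is made up for illustration and is not part of BiT.

```shell
# same_inode FILE1 FILE2
# Reports whether both paths refer to the same device and inode, i.e. whether
# they are hardlinks of one file. Returns 0 if so, 1 otherwise.
# Note: -ef follows symlinks, so a symlink to FILE1 also counts as "same".
same_inode() {
    if [ "$1" -ef "$2" ]; then
        echo "hardlinked: same inode"
    else
        echo "separate files: different inodes"
        return 1
    fi
}
```

It would be called with the path of one file in two different snapshots, for example `same_inode <snapshot A>/backup/<path>/specific.file <snapshot B>/backup/<path>/specific.file`.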

I can't guarantee but I would suspect that the problem would also be there in the current latest release (1.3.3).

I assume this has something to do with the "new permission handling" (#988). This is one of the bugs we are still working on with highest priority.

@buhtz buhtz added Bug Feedback needs user response, may be closed after timeout without a response labels Jan 11, 2023
@capybara-overdose
Author

Thanks for the quick reply, and in particular the clear instructions!

I may have complicated things slightly - I had another fiddle with deleting the 'Personal Media' folder from certain snapshots to see how it affected things. Unsurprisingly, it seems to just move the duplicate folder to another snapshot. I've attached another screenshot to show the new location of the duplicates.

I ran the ls -li command as instructed on a file in Personal Media on (in order):

  • the actual source file itself on the local disk
  • the oldest BiT backup that was showing as occupying actual disk space.
  • the 'redundant' copy in another snapshot
  • the apparent copy in a random snapshot not showing as occupying (excessive) disk space.

The output:

[blurred]@pop-os:~$ ls -li '/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg' 
7340284 -rwxrwxrwx 1 [blurred] [blurred] 7844502 Jun  7  2016 '/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg'
[blurred]@pop-os:~$ ls -li '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20221009-230001-269/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg' 
11802406 -rwxrwxrwx 13 [blurred] [blurred] 7844502 Jun  7  2016 '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20221009-230001-269/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg'
[blurred]@pop-os:~$ ls -li '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20230102-230001-408/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg' 
73031673 -rwxrwxrwx 28 [blurred] [blurred] 7844502 Jun  7  2016 '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20230102-230001-408/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg'
[blurred]@pop-os:~$ ls -li '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20221106-230001-738/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg' 
11802406 -rwxrwxrwx 13 [blurred] [blurred] 7844502 Jun  7  2016 '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20221106-230001-738/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg'

My reading of this is that the first three (source file location and the two snapshots showing as occupying disk space) are all indeed unique files, whereas the random snapshot is a hardlink, back to the older snapshot (20221009-230001). Or, in short - the snapshots in question are indeed redundant duplicates.

Am I reading this right?

Thanks again
Screenshot from 2023-01-12 07-43-34
Screenshot from 2023-01-12 07-32-44

@buhtz
Member

buhtz commented Jan 12, 2023

I see three snapshots in your output.

It seems that the 3rd is a duplicate (hardlink) of the 1st. But the 2nd has its own unique inode number.
I'm confused about that ordering. I'm also not sure how the last three digits in the folder-names (e.g. 20230102-230001-408) are constructed.

Do not believe your GUI showing you how much space a file does occupy. Applications like this often do not take hardlinks into account. You can have 10 times 500 GB snapshots on a 500 GB hard drive just because of hardlinks. GUIs like this would show you 5 000 GB used on a 500 GB hard drive.
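One way to cross-check a GUI figure is `du`, which counts a hardlinked file only once within a single invocation. The following is a sketch under assumptions, not a BiT feature: it assumes a `du` that accepts `-s`, `-k` and `-c` (GNU and BSD versions do), and `apparent_vs_real` is a hypothetical helper name.

```shell
# apparent_vs_real DIR...
# apparent_kib: du run separately per directory and summed, so a file that is
#               hardlinked into several directories is counted in each of them.
# real_kib:     du run once over all directories, so each inode counts once.
apparent_vs_real() {
    apparent=0
    for dir in "$@"; do
        size=$(du -sk "$dir" | awk '{print $1}')
        apparent=$((apparent + size))
    done
    # With -c the last output line is the grand total.
    real=$(du -sck "$@" | tail -n 1 | awk '{print $1}')
    echo "apparent_kib=$apparent real_kib=$real"
}
```

Passing all snapshot directories at once gives both numbers side by side; a large gap between them means most of the data is shared through hardlinks, while similar numbers mean the snapshots hold separate copies.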

@capybara-overdose
Author

capybara-overdose commented Jan 12, 2023

Yes, like I said, the first two snapshots are the ones showing a large amount of space occupied by the duplicate files. The third was a much smaller one, to test that what I was seeing in Disk Usage Analyser was correct - the sizes are indeed accurate, and the 3rd snapshot was just full of mostly hardlinks.

I read through the other issue report you linked, and tried adding the "--no-perms --no-group --no-owner" parameters, and running a new backup. After some time doing a 'smart remove' it seems to have re-allocated all the files... but AGAIN, there are two directories/snapshots that seem to contain largely duplicates - 20221009-230001-269 (as before) and now the latest snapshot made, 20230112-083145-932 (see attached).

ls -li on the same photo in Personal media for each of these snapshots gives a unique inode:

[blurred]@pop-os:~$ ls -li '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20221009-230001-269/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg' 
11802406 -rwxrwxrwx 13 [blurred] [blurred] 7844502 Jun  7  2016 '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20221009-230001-269/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg'
[blurred]@pop-os:~$ ls -li '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20230112-083145-932/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg' 
73031673 -rwxrwxrwx 8 [blurred] [blurred] 7844502 Jun  7  2016 '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20230112-083145-932/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg'

So this confirms they are indeed still unique duplicate files, not hardlinks - ie the GUI is correct in displaying them as occupying disk space.

I then ran the same command on the snapshot previous to each of these: 20220925-230001-791 and 20230107-220001-713 (see attached). In Disk Usage Analyser, the entire Personal Media directory is shown as only 856.1kB (as opposed to 417GB), but both still appear to contain the test file IMG_8832.jpg. I ran ls -il on this file in both snapshots, which gives:

[blurred]@pop-os:~$ ls -li '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20220929-210001-343/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg' 
11802406 -rwxrwxrwx 13 [blurred] wal[blurred]flower 7844502 Jun  7  2016 '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20220929-210001-343/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg'
[blurred]@pop-os:~$ ls -li '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20230107-220001-713/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg' 
73031673 -rwxrwxrwx 8 [blurred] [blurred] 7844502 Jun  7  2016 '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20230107-220001-713/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg'

These appear to be the same inodes as in the first test - i.e. they are indeed hardlinks to their respective subsequent snapshots, each containing full copies of all actual files, and the GUI is showing disk usage correctly in terms of hard links vs actual files.

Question still remains as to WHY there are TWO snapshots with full duplicates.

I decided to run ls -li on test files in different snapshots (because I've stuffed around deleting Personal Media from snapshots trying to see if it would remove the duplicate).

I did this on a number of files I know haven't been touched in years, and what I noticed for all of them is that the hardlinks from snapshots from 14 Dec onwards all refer to the most recent 'full duplicate' snapshot (20230112-083145-932). Conversely, hard links from snapshots prior refer to the older 'full duplicate' snapshot (20221009-230001-269).
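This clustering of snapshots around two different inodes can be made visible for any one file by listing its inode in every snapshot and counting how often each inode occurs. A minimal sketch, assuming GNU `stat` (for the `-c` option); `inode_groups` is a hypothetical name, and its arguments would be the paths of the same file in each snapshot, for instance with a `*` wildcard in place of the snapshot ID.

```shell
# inode_groups FILE...
# Prints one line per distinct inode in the form "<count> <inode>", where
# count is the number of the given paths that share that inode.
inode_groups() {
    for f in "$@"; do
        stat -c '%i' "$f"
    done | sort -n | uniq -c | awk '{print $1, $2}'
}
```

A single output line would mean every snapshot shares one stored copy of the file; two lines for a file that never changed would match the situation described here.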

This is very interesting, as the significance of this date is that it's when my 'smart remove' schedule starts keeping DAILY snapshots (for the last 30 days). I've noticed this in the previous instances where this popped up - one of the duplicates is in the daily range, the other back in the weekly range. Almost like BiT is keeping two separate full snapshots, one for each date range of snapshots.

Screenshot from 2023-01-12 20-09-12

Screenshot from 2023-01-12 20-38-50

@aryoda
Contributor

aryoda commented Jan 21, 2023

The output:

[blurred]@pop-os:~$ ls -li '/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg' 
7340284 -rwxrwxrwx 1 [blurred] [blurred] 7844502 Jun  7  2016 '/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg'
[blurred]@pop-os:~$ ls -li '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20221009-230001-269/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg' 
11802406 -rwxrwxrwx 13 [blurred] [blurred] 7844502 Jun  7  2016 '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20221009-230001-269/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg'
[blurred]@pop-os:~$ ls -li '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20230102-230001-408/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg' 
73031673 -rwxrwxrwx 28 [blurred] [blurred] 7844502 Jun  7  2016 '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20230102-230001-408/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg'
[blurred]@pop-os:~$ ls -li '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20221106-230001-738/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg' 
11802406 -rwxrwxrwx 13 [blurred] [blurred] 7844502 Jun  7  2016 '/media/[blurred]/[blurred]/backintime/pop-os/[blurred]/1/20221106-230001-738/backup/media/[blurred]/[blurred]/Personal Media/IMG_8832.jpg'

My reading of this is that the first three (source file location and the two snapshots showing as occupying disk space) are all indeed unique files, whereas the random snapshot is a hardlink, back to the older snapshot (20221009-230001). Or, in short - the snapshots in question are indeed redundant duplicates.

It really looks like you have two real copies of the file in your backups.

AFAIR the 3rd column is the reference counter (how many links to the inode exist) and the source file (first ls -li) has "1", which is normal.

The other files (in the snapshots) have higher reference counts (13 and 28), and this indicates that the hard links in the snapshots do work.

The backup file of the most-recent snapshot "20230102-230001-408" has indeed another inode number (first column) and is another file IMHO.

This may happen if the file or extended attributes have been changed.

Can you please compare the two duplicated files byte-by-byte with cmp -b <file1> <file2> to clarify if the files are really the same (and should not be stored twice by rsync)?
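A sketch of that comparison which relies only on the exit status of `cmp` (0 for identical content, 1 for differing content, greater than 1 on error). The function name `same_content` is an assumption for illustration; `-s` suppresses the normal output of `cmp`.

```shell
# same_content FILE1 FILE2
# Compares the two files byte by byte, prints the result and returns the
# exit status of cmp.
same_content() {
    if cmp -s "$1" "$2"; then
        status=0
    else
        status=$?
    fi
    case $status in
        0) echo "identical content" ;;
        1) echo "content differs" ;;
        *) echo "cmp failed (missing or unreadable file?)" ;;
    esac
    return "$status"
}
```

Identical content combined with different inode numbers would indicate that the same data really is stored twice.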

My suspicion is that some photo software was scanning your media files and adding or updating some metadata without changing the file size and change date...

Edit: Could you please also show the output of stat <file> for each of the duplicated files? The owner (UID/GID) may have changed, but that is not visible in your screenshots.
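For a side-by-side view, the relevant fields can be printed on one line per file. This sketch assumes GNU `stat`, whose `-c` format understands `%i` (inode), `%h` (link count), `%a` (octal permissions), `%u` and `%g` (numeric owner and group), `%Y` (modification time in epoch seconds) and `%s` (size); `meta_line` is a hypothetical helper name. Extended attributes are not covered by it.

```shell
# meta_line FILE...
# Prints one line of metadata per file so that differences in permissions,
# ownership or timestamps between two copies stand out.
meta_line() {
    stat -c 'inode=%i links=%h mode=%a uid=%u gid=%g mtime=%Y size=%s' "$@"
}
```

Running it on the two duplicated copies and comparing the lines field by field shows which attribute, if any, differs between them.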

@buhtz buhtz added this to the 1.3.5 or 1.4.0 milestone Mar 7, 2023
@capybara-overdose
Author

capybara-overdose commented Apr 7, 2023

So is there any intention to fix this at all? It's still doing it, on multiple machines, and basically makes the whole tool useless because it eats up all the drive space in just a few snapshots.

@buhtz
Member

buhtz commented Apr 7, 2023

Dear @capybara-overdose ,
please see the "milestone". It means there is an intention to fix it.

I'm not sure how other teammates feel about your case, but for me it is still not clear what the root of the problem is and whether BIT really is the problem here or some external factor. From my point of view and my knowledge about how BIT and rsync work, your situation shouldn't happen.

In short: It is not clear if this is a Bug in BIT and if it needs a fix. But we will work on that Issue and try to find a reason for the problem and maybe a solution.

Keep in mind that we are not able to reproduce that problem currently.
You have to help us to help you. So please read the postings from aryoda carefully, answer his questions and provide the detailed information he asked for.

@capybara-overdose
Author

capybara-overdose commented Apr 11, 2023

This machine ONLY runs BiT (it's the home backup machine) - there is no "photo editing" or "external factors" in play. I don't know why people keep trying to pull this deflection months after the fact, or how anyone can seriously be saying 'it's still not clear' considering the time I've already spent demonstrating the files are 100% without a doubt duplicates. Doesn't matter at all whether you 'think it should happen' - it IS happening.

I'm frankly not inclined to spend any more of my time on pointless busywork when it feels like an exercise in gaslighting myself into thinking somehow there is no problem, when even the people asking for yet more arbitrary "tests" themselves admit "It really looks like you have two real copies of the file in your backups".

As for having it on a 'milestone': From what that page says, your plan is for it to just stay broken until "to the release after the next release." ? Seriously?? If you just can't be bothered fixing this, call the project abandoned and be done with it so people aren't being misled into wasting their time.

@capybara-overdose
Author

capybara-overdose commented Apr 11, 2023

Right - just so it can't be used as an excuse, here it is ALL OVER AGAIN with the new commands.

This time I'm focusing on the Music folder, because all the previous ones have been deleted by now. As you can see, the 20230406 backup totally duplicated the 20210122 backup directory - same size and number of files occupying disk space. No, I do not have any media management software on this that edits metadata or anything like that, so don't point fingers there.

Screenshot from 2023-04-11 14-24-20

Output of ls -li on a sample file in (in order) the source folder, the 20230406 backup, an adjacent previous backup, the 20210122 backup, and another adjacent previous backup. Clearly can see unique inodes on the working copy, the 20230406 copy, and then a third on all copies before then, including the actual stored file in the 20210122 backup directory. Three independent files occupying space on disk: the original, the 20230406 backup, and the 20210122 backup (which is what the other backups link to).

Screenshot from 2023-04-11 14-41-55

Output of various cmp's between the files - all the same

Screenshot from 2023-04-11 14-46-08

Output of stat for all those files.

Screenshot from 2023-04-11 14-43-15

So it looks like BiT has, for whatever reason, changed the permission (incorrectly) on the 20230406 backup to 755, when the original, and previous backups, were all 777. Sounds familiar?

#994 #988

Still "not clear" if this is a bug in BiT?
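The permission difference noted above (755 in one snapshot, 777 in the others) would be enough to prevent hardlinking if the snapshots are created with rsync's `--link-dest`, which only links a file when it is identical in all preserved attributes, permissions included. The sketch below lists such mismatches between two snapshot trees. It assumes GNU `stat`; `mode_mismatches` is a hypothetical helper and not part of BiT or rsync.

```shell
# mode_mismatches OLD_DIR NEW_DIR
# For every regular file under OLD_DIR that also exists at the same relative
# path under NEW_DIR, prints "<old mode> <new mode> <relative path>" when the
# permission bits differ. Paths containing newlines are not supported.
mode_mismatches() {
    old=${1%/}
    new=${2%/}
    find "$old" -type f | while IFS= read -r f; do
        rel=${f#"$old"/}
        [ -f "$new/$rel" ] || continue
        m_old=$(stat -c '%a' "$f")
        m_new=$(stat -c '%a' "$new/$rel")
        if [ "$m_old" != "$m_new" ]; then
            echo "$m_old $m_new $rel"
        fi
    done
}
```

The two arguments would be the `backup` directories of an older and a newer snapshot. Any line of output marks a file whose permissions changed between them and which rsync would therefore copy again rather than link.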

@buhtz buhtz removed the Feedback needs user response, may be closed after timeout without a response label Apr 11, 2023
@emtiu
Member

emtiu commented Apr 11, 2023

As for having it on a 'milestone' ...from what that page says, your plan is for it to just stay broken until "to the release after the next release." ? Seriously???? If you just can't be bothered fixing this, call the project abandoned and be done with it so people aren't being misled into wasting their time.

The project was abandoned for multiple years before it was picked up by a small group of volunteers a number of months ago. Much has been accomplished, and there is still much to do for the new team, who all have day-jobs and/or families to deal with.

So thank you for your patience, maintaining a friendly tone, and your appreciation of the volunteer work being done here.

@capybara-overdose
Author

Another month without an effective backup system. Still doing it, repeatedly, no matter what I do to try to trim it down.

Is there going to be any attempt to address this or can we just call the app dead at this point?

@buhtz
Member

buhtz commented May 12, 2023

Dear @capybara-overdose ,
seriously asked: if you were the maintainer of that project, which steps would you take to analyze the problem and find a solution?

@capybara-overdose
Author

capybara-overdose commented May 15, 2023

@buhtz

If you went to a doctor (3 times over the course of six months) and the best they could do was snarkily ask "Well, how would you fix this pain in your chest?" - would you take them seriously? Would you take responsibility yourself for their inability to perform a job they chose to take on? And go try and make assessments you never pretended to be qualified for?

Or would you find a new doctor?

For the same reason I pay for my private health insurance, instead of relying on "free" community options or worse - self diagnosis, you've made it clear this "project" is a failure and the only real option is to pay for a properly managed solution. This app, like most software in the Linux space, obviously has bigger problems than just the code: the people.

Do whatever you want. Or rather, don't. Back to windows for me.

@buhtz
Member

buhtz commented May 26, 2023

Issue might be a duplicate of #994 and/or #988 .
It is linked to them. The information provided here is linked there.

@buhtz buhtz closed this as completed May 26, 2023
@bit-team bit-team locked and limited conversation to collaborators May 26, 2023