FSlint with database support #152

Open
wants to merge 12 commits into
base: master
from


dipietro-salvatore

FSlint with SQLite support, to avoid hashing the same file twice during the findup process.

@MarcinOrlowski

Do you have any metrics on how much this would speed things up?

@dipietro-salvatore
Author

Well, in my case it made a real difference. I wanted to find all the duplicate files on an entire HDD. This allowed me to:

  • Suspend and restart the scan as many times as I wanted (instead of leaving my PC on for days)
  • Check later, without rescanning everything, whether new files are copies
  • Compare the data with other HDDs as well

I found it very useful and it saved me a lot of time. It doesn't affect the program's performance, but it reduces the time needed to re-scan the same folder/disk in the future.
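The approach described above amounts to a hash cache: before hashing a file, look it up in SQLite and reuse the stored hash when the file's size and modification time are unchanged. The following is a minimal Python 3 sketch under stated assumptions, not the PR's actual code: the `files` schema (path, size, mtime, hash), SHA-1 as the hash, and the helper names `file_hash`, `cached_hash` and `duplicate_groups` are all hypothetical. Committing after every file is what would let an interrupted scan resume where it stopped.

```python
import hashlib
import os
import sqlite3

# Hypothetical schema: one row per file, keyed by absolute path.
# size and mtime are used to decide whether a cached hash is still valid.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path  TEXT PRIMARY KEY,
    size  INTEGER NOT NULL,
    mtime REAL NOT NULL,
    hash  TEXT NOT NULL
)
"""


def file_hash(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large files don't fill memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_hash(conn, path):
    """Return the file's hash, recomputing it only if size or mtime changed."""
    st = os.stat(path)
    row = conn.execute(
        "SELECT size, mtime, hash FROM files WHERE path = ?", (path,)
    ).fetchone()
    if row is not None and row[0] == st.st_size and row[1] == st.st_mtime:
        return row[2]  # cache hit: the file is not read again
    digest = file_hash(path)
    conn.execute(
        "INSERT OR REPLACE INTO files (path, size, mtime, hash) VALUES (?, ?, ?, ?)",
        (path, st.st_size, st.st_mtime, digest),
    )
    conn.commit()  # commit per file so an interrupted scan keeps its progress
    return digest


def duplicate_groups(conn):
    """Group stored paths by hash; only hashes shared by 2+ files are returned."""
    groups = {}
    for digest, path in conn.execute(
        "SELECT hash, path FROM files WHERE hash IN "
        "(SELECT hash FROM files GROUP BY hash HAVING COUNT(*) > 1) "
        "ORDER BY hash, path"
    ):
        groups.setdefault(digest, []).append(path)
    return list(groups.values())
```

Because the hashes persist, comparing against another disk (as in the third bullet) would only require scanning it into the same database file and calling `duplicate_groups` again. Size plus mtime is a heuristic: a file rewritten with the same size and a restored timestamp would keep its stale hash.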

@perfect7gentleman

Traceback (most recent call last):
  File "/usr/lib/python-exec/python2.7/database", line 36, in <module>
    os.makedirs(directory)
  File "/usr/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/usr/share/fslint/fslint/../databases'

Should it only be run as root?
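The traceback shows the script trying to create a `databases` directory next to the installed code under `/usr/share/fslint`, which normally only root can write. The thread doesn't show how the PR fixed this; one common alternative, sketched here in Python 3 as an assumption, is to keep the database in a per-user cache directory following the XDG Base Directory convention. The function name `database_dir` is hypothetical, and `exist_ok` is Python 3 only (the tracebacks show FSlint running on Python 2.7, where `os.makedirs` would need a `try`/`except OSError` instead).

```python
import os


def database_dir(app_name="fslint"):
    """Return a per-user, writable directory for the hash database.

    Uses $XDG_CACHE_HOME if it is set to an absolute path, otherwise
    ~/.cache (the XDG Base Directory default). Nothing under the
    root-owned install prefix is touched, so no root privileges are needed.
    """
    base = os.environ.get("XDG_CACHE_HOME", "")
    if not os.path.isabs(base):  # unset, empty, or relative: ignore it
        base = os.path.join(os.path.expanduser("~"), ".cache")
    path = os.path.join(base, app_name)
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    return path
```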

fix permission issue databases folder
@dipietro-salvatore
Author

Hi @perfect7gentleman,
I just pushed an update to fix the problem. Please re-clone the git repo.
Here are the commands to run in the terminal:

cd /usr/share/
sudo mv fslint fslint-orig
sudo git clone https://github.com/dipietro-salvatore/fslint.git fslint

Let me know if this fixes the issue.
Thanks

@perfect7gentleman

perfect7gentleman commented Nov 13, 2019

The permission issue is fixed, but:

Traceback (most recent call last):
  File "/usr/lib/python-exec/python2.7/database", line 48, in <module>
    cursor.execute('''SELECT * FROM files WHERE path IN ({seq})'''.format(seq=','.join(['?']*len(files_list))), ([f for f in files_list]))
sqlite3.OperationalError: too many SQL variables

Also, it cannot find any dupes at all; in other words, it doesn't work.

@dipietro-salvatore
Author

Can you please provide more information about the system and the error?
I am not able to reproduce it.

@perfect7gentleman
Copy link

System: Gentoo.
The error is gone, but it still doesn't find any dupes.

@perfect7gentleman

Nope. It's back.

Traceback (most recent call last):
  File "/usr/lib/python-exec/python2.7/database", line 48, in <module>
    cursor.execute('''SELECT * FROM files WHERE path IN ({seq})'''.format(seq=','.join(['?']*len(files_list))), ([f for f in files_list]))
sqlite3.OperationalError: too many SQL variables

@perfect7gentleman

What info is needed?

@dipietro-salvatore
Author

I made some changes to the repo.
Can you please clone it and run it again?

@perfect7gentleman
Copy link

The same error

Traceback (most recent call last):
  File "/home/MZ7WD240HAFV/Temporary/fslint/fslint/supprt/database", line 49, in <module>
    cursor.execute('''SELECT * FROM files WHERE path IN ({seq})'''.format(seq=','.join(['?']*len(files_list))), ([f for f in files_list]))
sqlite3.OperationalError: too many SQL variables

@dipietro-salvatore
Author

To help me replicate the error, can you please tell me how many files you have in the scanned folder(s)?

@perfect7gentleman

129.5 MiB (135,783,482 bytes)
8,943 files, 1,870 sub-folders
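These numbers are consistent with the error. The failing query builds one `?` placeholder per entry in `files_list`, so with thousands of files it binds thousands of parameters, while SQLite caps the number of host parameters per statement at `SQLITE_MAX_VARIABLE_NUMBER`, which defaulted to 999 in SQLite releases before 3.32.0. Some distributions build SQLite with a higher limit, which may be why the error didn't reproduce elsewhere. The thread doesn't say how the query was eventually changed; a minimal Python 3 sketch of one fix, batching the `IN (...)` lookup, looks like this. The `files` table and `path` column come from the traceback; `fetch_rows_for_paths` and the batch size of 900 are hypothetical.

```python
import sqlite3

# SQLite's default host-parameter limit was 999 before version 3.32.0
# (32766 since); 900 stays safely under either default.
CHUNK_SIZE = 900


def fetch_rows_for_paths(cursor, files_list, chunk_size=CHUNK_SIZE):
    """Run 'SELECT * FROM files WHERE path IN (...)' in batches.

    Each batch binds at most chunk_size parameters, so the statement never
    exceeds SQLITE_MAX_VARIABLE_NUMBER, however long files_list is.
    """
    rows = []
    for start in range(0, len(files_list), chunk_size):
        chunk = files_list[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))  # e.g. "?,?,?"
        cursor.execute(
            "SELECT * FROM files WHERE path IN ({})".format(placeholders), chunk
        )
        rows.extend(cursor.fetchall())
    return rows
```

An alternative that avoids per-statement limits entirely is to read the whole `files` table once into a dictionary keyed by path, at the cost of holding it in memory.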

@dipietro-salvatore
Author

I changed the way the script retrieves data from the DB. Can you please try now?

@perfect7gentleman

Now it works.

54.1MB wasted in 2547 files (in 993 groups)

@dipietro-salvatore
Author

Happy to hear that! Out of curiosity, do you know how long the first scan takes compared with the second one using the DB (if you haven't deleted the files immediately)?

@perfect7gentleman

Rather fast.
