Fix logic bug in external file clean up #956
49206ef to e0bd234
datajoint/external.py
Outdated
@@ -338,7 +338,7 @@ def delete(self, *, delete_external_files=None, limit=None, display_progress=Tru
     error_list = []
     for uuid, external_path in items:
         try:
-            count = (self & {'hash': uuid}).delete_quick(get_count=True)  # optimize
+            count = len(self & {'hash': uuid})  # optimize
         except Exception:
             pass  # if delete failed, do not remove the external file
The comment is stale. No delete has been attempted.
datajoint/external.py
Outdated
@@ -338,7 +338,7 @@ def delete(self, *, delete_external_files=None, limit=None, display_progress=Tru
     error_list = []
     for uuid, external_path in items:
         try:
-            count = (self & {'hash': uuid}).delete_quick(get_count=True)  # optimize
+            count = len(self & {'hash': uuid})  # optimize
         except Exception:
             pass  # if delete failed, do not remove the external file
What exception is expected from the `len` query? Why is this helpful?
datajoint/external.py
Outdated
@@ -338,7 +338,7 @@ def delete(self, *, delete_external_files=None, limit=None, display_progress=Tru
     error_list = []
     for uuid, external_path in items:
         try:
-            count = (self & {'hash': uuid}).delete_quick(get_count=True)  # optimize
+            count = len(self & {'hash': uuid})  # optimize
Why do we need to get the count? Why do we need an extra query rather than deleting directly?
datajoint/external.py
Outdated
@@ -347,6 +347,8 @@ def delete(self, *, delete_external_files=None, limit=None, display_progress=Tru
 self._remove_external_file(external_path)
Why is it better to remove the external file before deleting from the external table? What if the delete from the external table fails?
@zitrosolrac Thanks for the PR! 🤝
Need a few more refinements to the delete algorithm, test, and a bit more clean up.
Also, make sure to remove the binary files that were accidentally committed.
tests/__init__.py
Outdated
@@ -20,7 +20,7 @@
 __author__ = 'Edgar Walker, Fabian Sinz, Dimitri Yatsenko, Raphael Guzman'

 # turn on verbose logging
-logging.basicConfig(level=logging.DEBUG)
+#logging.basicConfig(level=logging.DEBUG)
Uncomment this. Let's not commit debug changes.
-#logging.basicConfig(level=logging.DEBUG)
+logging.basicConfig(level=logging.DEBUG)
tests/test_external.py
Outdated
from .schema_external import SimpleRemote
from .schema_external import stores_config, SimpleRemote, Simple, schema

import json
Seems like this is no longer necessary.
import json
tests/test_external.py
Outdated
import json
import os
from os import stat
No longer necessary
from os import stat
tests/test_external.py
Outdated
import pwd
from pwd import getpwuid
import pwd
from pwd import getpwuid
tests/test_external.py
Outdated
@@ -21,32 +26,14 @@ def tearDown(self):
     dj.config['stores']['local']['location'] = current_location_local


-def test_external_put():
Why was this deleted?
tests/test_external.py
Outdated
def test_remove_fail():
    #https://github.com/datajoint/datajoint-python/issues/953

    #print(json.dumps(dj.config['stores'], indent=4))
These unnecessary comments should be cleaned up if they don't contain notes.
tests/test_external.py
Outdated
    #print('location')
    # print('\n IN TEST: BEFORE DELETE: list of dir stores, local, location')
    print('stores location -----------\n')
These prints do not look like they are necessary for the test.
tests/test_external.py
Outdated
    path1 = dj.config['stores']['local']['location'] + '/djtest_extern/4/c/'

    argDir = dj.config['stores']['local']['location'] + '/djtest_extern/4/c/'
Could be simplified to one variable.
datajoint/external.py
Outdated
@@ -338,7 +338,7 @@ def delete(self, *, delete_external_files=None, limit=None, display_progress=Tru
     error_list = []
     for uuid, external_path in items:
         try:
Looks like lines 340-345 are unnecessary here.
@zitrosolrac oh forgot one thing. Can you please update the changelog and |
…delete method, and re write test_removal_fail
@zitrosolrac Nice work man! Just some minor things left. 👏
Co-authored-by: Raphael Guzman <[email protected]>
…c/datajoint-python into external_removal_fail
@zitrosolrac 🎉 WooHoo! 🎉 Nice job on a tricky one 👍
datajoint/external.py
Outdated
    self._remove_external_file(external_path)
except Exception as error:
    # adding row back into table after failed delete
    self.insert1(row[0])
Set `skip_duplicates=True` to address potential race conditions.
I think it's okay not to re-insert if the delete fails. The procedure will return the undeleted files to the user for special handling. Without implementing an actual transaction management process, we can allow some orphaned files in the external storage and provide an additional cleanup utility.
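For illustration, the additional cleanup utility mentioned here could look roughly like the sketch below: walk the store and report files that the tracking table no longer references. `find_orphans` and its arguments are hypothetical names, the store is assumed to be a local directory, and the tracked paths are assumed to be available as a set of paths relative to the store root. This is not DataJoint code.

```python
import os
import tempfile


def find_orphans(store_root, tracked_paths):
    """Return store-relative paths of files that are not in tracked_paths.

    Hypothetical helper: assumes a local-directory store and a set of
    tracked paths that use '/' as the separator.
    """
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(store_root):
        for name in filenames:
            relative = os.path.relpath(os.path.join(dirpath, name), store_root)
            relative = relative.replace(os.sep, "/")  # normalize separators
            if relative not in tracked_paths:
                orphans.append(relative)
    return sorted(orphans)


# Demo on a throwaway directory: two files on disk, only one of them tracked.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "4", "c"))
    for name in ("tracked.dat", "orphan.dat"):
        with open(os.path.join(root, "4", "c", name), "wb") as f:
            f.write(b"x")
    orphans = find_orphans(root, tracked_paths={"4/c/tracked.dat"})
```

The scan has to visit every file in the store, so its cost grows with the size of the store rather than with the number of failed deletes.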
@dimitri-yatsenko That is a good suggestion for skipping duplicates. Regarding the cleanup, I feel the re-insert is actually necessary. There are basically two primary concerns:
- The user has only one chance to catch the errors. Once a row has been removed from the external tracking table, running the external delete/cleanup again yields no change, even though the file is still in the actual store. This gives a false sense that the system has been cleaned up (this is the main complaint from the user in issue ExternalTable.delete should not remove row on error #953). We can provide an additional utility to resolve this, but it would be expensive to run because it would have to crawl the entire store to find objects that don't exist within the external tracking table. By simply inserting the row back when there is an exception, we make the error more visible and, most importantly, reproducible.
- `schema.external[store].delete()` is not currently documented as returning a list of the errors. Users are most likely unaware of this feature and therefore aren't using it properly. @zitrosolrac Could you open an issue on this in our datajoint-docs and reference it in ExternalTable.delete should not remove row on error #953? (Filing for now since our new team members haven't been oriented to the docs setup yet.)
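A minimal sketch of the behavior argued for above, using an in-memory dict in place of the external tracking table. `ToyExternalTable`, `flaky_remove`, and the dict layout are hypothetical stand-ins, not DataJoint's implementation; only the control flow (delete the row, try to remove the file, restore the row on failure, return the errors) follows the discussion.

```python
class ToyExternalTable:
    """Hypothetical stand-in for an external tracking table (not DataJoint's API)."""

    def __init__(self, rows, remove_file):
        self.rows = dict(rows)          # hash -> external path
        self.remove_file = remove_file  # callable that may raise on failure

    def insert1(self, key, path, skip_duplicates=False):
        if key in self.rows:
            if skip_duplicates:
                return  # row is already present, e.g. restored by another process
            raise KeyError(f"duplicate entry: {key}")
        self.rows[key] = path

    def delete(self):
        """Delete every row and its file; return [(hash, path, error message), ...]."""
        error_list = []
        for key, path in list(self.rows.items()):
            del self.rows[key]  # remove the tracking row first
            try:
                self.remove_file(path)
            except Exception as error:
                # put the row back so the failure stays visible and a rerun retries it
                self.insert1(key, path, skip_duplicates=True)
                error_list.append((key, path, str(error)))
        return error_list


def flaky_remove(path):
    """Simulated store in which removing 'locked.dat' always fails."""
    if path.endswith("locked.dat"):
        raise PermissionError(f"cannot remove {path}")


table = ToyExternalTable({"a1": "store/ok.dat", "b2": "store/locked.dat"}, flaky_remove)
errors = table.delete()
```

After the call, the row for the file that could not be removed is still tracked, so calling `delete()` again reports the same failure instead of silently returning an empty list.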
-self.insert1(row[0])
+self.insert1(row[0], skip_duplicates=True)
@guzman-raphael Got it! I'll open an issue on this in the datajoint-docs and add the necessary reference.
datajoint/external.py
Outdated
    self._remove_external_file(external_path)
except Exception as error:
    # adding row back into table after failed delete
    self.insert1(row[0])
-self.insert1(row[0])
+self.insert1(row[0], skip_duplicates=True)
Co-authored-by: Raphael Guzman <[email protected]>
fix #953