-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiprocessing file lock hangs on certain filesystems #90
Comments
Yes! Thanks for raising this issue. It makes complete sense. Thoughts:
|
Sounds great! I'll put a PR together that, at a minimum, adds a setting 'use_file_lock' with a default of True, but when False runs the build without the lock. Regarding the comments about automatic detection and exception raising: On our filesystem, the nature of the error is that filelock silently fails to acquire a lock on any process. The failure to acquire a lock is indistinguishable (for all but one process) from the correctly-working case where a single process acquires the lock successfully and the rest are waiting for the build to finish, at least until the timeout period expires. It would be easy to solve this if the processes could communicate and realize that none of them are acquiring the lock, but I'm not sure that is reasonable. As a result, here are the communication-free possibilities that I can think of for automatic detection / exception raising regarding the lock:
I'm leaning towards the second simpler solution that just adds information to the lock timeout error and adds a README section about file locking. It would immediately make the latest version of the package usable (for the correct choice of global settings) on these filesystems. Additional PRs could implement the automatic locking detection on top of this. |
Sounds great!
Seems very reasonable! |
Thanks!! |
I think this is the issue which has now started failing debian CI tests at https://ci.debian.net/packages/c/cppimport/ The PR will be very welcome. |
Apologies for letting this sleep so long - I've opened a PR with an option to disable the file lock and updated the tests. The file lock remains on by default, so the functionality of any working code is unchanged. All tests except for the multiprocessing test now disable the file lock. The multiprocessing test is disabled by default, but can be run with the invocation Feedback very welcome, some of these choices were opinionated. |
The patch looks good to me (but I didn't test it yet patched into my code). Disabling the multiprocessing test by default would keep CI testing stable. One question for the |
@drew-parsons Not sure if there's a universal way to check across systems - but at least on linux, this was helpful using touch testfile
flock ./testfile true && echo ok || echo nok I tested this snippet on a DVS filesystem that does not support locking (result below): vbharadw> flock ./testfile true && echo ok || echo nok
flock: ./testfile: Unknown error 524
nok And on a filesystem that does support locking:
|
Easy enough to use in a CI script, thanks! |
Merged- thanks @tbenthompson for the review (and overall maintenance of this package) and @drew-parsons for the nudge! |
I recently noticed that cppimport version 22.7.12+ hangs during the build process (our computing center NERSC recently reconfigured their filesystem). The culprit turned out to be a file lock in introduced in #67 and merged in #71 , which enables multiple processes to use a cppimport-compiled module without stepping on each other's toes. The lock is on line
https://github.com/tbenthompson/cppimport/blob/main/cppimport/importer.py#L49
.Unfortunately, our new filesystem configuration does not support file locking as it uses a particular data virtualization service (DVS) - see here for documentation of this issue: https://docs.nersc.gov/development/languages/python/parallel-python/#parallel-io-with-h5py. I solved my problem by rolling back to version 22.5.11, right before the file lock was introduced.
Other filesystems may suffer from similar restrictions. Is it worth adding a setting for cppimport that enables / disables the file lock for multiprocessing build support? For what it's worth, I'm using cppimport in code called from multiple processes, and I protect the build process manually outside the package. I really like the fix introduced by the lock and think it would address the need on most filesystems - but it would be nice to turn it off.
If there is interest in adding the setting, I can write a fix and propose a PR. Feedback solicited!
The text was updated successfully, but these errors were encountered: