Release GIL in deltalake.write_deltalake #2234
@mrocklin which engine are you using to write with? With the Rust engine I don't think we are releasing the GIL, but it shouldn't be too tricky to add. It's there for MERGE |
It looks like this may also happen for compact (and presumably other operations) |
Whatever's default. My code looks like this:
deltalake.write_deltalake(
    outfile,
    df,
    mode="append",
    storage_options=STORAGE_OPTIONS,
    partition_by="date",
) |
The default uses the PyArrow engine. You could try compiling a version where the called method in Rust is wrapped inside py.allow_threads; feel free to open a PR :) |
Alas I'm not familiar with the Rust compilation/build process, so it's unlikely that I'll do this work myself (hopefully it's ok to raise an issue without volunteering to do the work myself). |
mhh I tried:
so you suggest to add this line |
@franz101 you can do the same thing as done for merge: https://github.com/delta-io/delta-rs/blob/main/python%2Fsrc%2Flib.rs#L520-L541 |
Ah nice. Is there any benefit to leaving it up to the user whether the function holds the GIL or not? |
Almost certainly not is my guess. If the code genuinely creates/destroys Python objects then it should not release the GIL. If it doesn't screw around with Python objects at all (my hope) then it should definitely release the GIL. FWIW I don't know of any other project that makes GIL-holding optional. It's either a good idea or it isn't, entirely dependent on the wrapped code. |
(also, thank you for looking into this!) |
I opened the PR; tests are passing fine. My next PR, if the demand is there, would be to allow a custom PK/SK for DynamoDB, since DynamoDB latency is lower when using an existing database |
# Description
Release GIL in deltalake.write_deltalake by wrapping it in py.allow_threads
# Related Issue(s)
- closes #2234
# Documentation
Thank you @franz101 and @ion-elgreco !
Ha, yes, the 1TRC is fun. A non-trivial amount of time is spent listing the Parquet files, so adding delta into the mix would probably shave a fair bit off :) Should I raise another issue for the functions other than write_deltalake? |
I'm running deltalake.write_deltalake many times in parallel and noticing that my Python process is freezing up a bit. I suspect that this function has a long period where it doesn't release the GIL. Is this true? If so, I suspect that it's likely accidental, and could be easily changed. I'm not familiar with the rust-Python tooling, but most Python binding systems make this pretty easy.
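One rough way to check a suspicion like this is to run a small "heartbeat" thread alongside the call and look at the longest pause between ticks. The sketch below is only an illustration: `max_heartbeat_gap` is a hypothetical helper (not part of deltalake), and `time.sleep` stands in for the real write, since it is known to release the GIL.

```python
import threading
import time


def max_heartbeat_gap(blocking_call, interval=0.01):
    """Run blocking_call in this thread while a background thread ticks.

    Returns the longest observed gap (in seconds) between two ticks. A gap
    much larger than `interval` suggests the call held the GIL for about
    that long, because the ticking thread could not be scheduled.
    """
    stop = threading.Event()
    gaps = []

    def heartbeat():
        last = time.perf_counter()
        while not stop.is_set():
            time.sleep(interval)
            now = time.perf_counter()
            gaps.append(now - last)
            last = now

    ticker = threading.Thread(target=heartbeat, daemon=True)
    ticker.start()
    try:
        blocking_call()
    finally:
        stop.set()
        ticker.join()
    return max(gaps) if gaps else 0.0


# Stand-in for the real write (assumption): time.sleep releases the GIL,
# so the heartbeat keeps ticking and the reported gap stays near `interval`.
gap = max_heartbeat_gap(lambda: time.sleep(0.2))
print(f"longest heartbeat gap: {gap:.3f}s")
```

Replacing the lambda with the actual `deltalake.write_deltalake(...)` call should show whether the heartbeat stalls for the duration of the write; timings will vary with machine load, so treat the number as indicative only.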