Fix ZODB pickle corruption on python-3.7 #47
Conversation
Do you think it would be possible to write a test case for this?
_pickle_33.c was taken from the Python 3.3 standard library. Was the bug fixed in Python at some later point? Should we report an upstream bug? Or find and backport an upstream patch? Or at least link to the relevant upstream bug/commit?
I'll try to repro it in a test, but it is pretty difficult: the bug was very hard to trigger, and I suspect it happens only on a very specific set of data. I noticed that changing … We tried to port _pickle_33.c from later upstream versions, but it received too many changes for that to be feasible. Unfortunately I lost the original traceback that we were seeing. The Python version of the pickle code (that will be enabled if you delete the …) … Another change that seems to fix the issue (at least for one particular data set I had at hand): …
Another detail: this bug produces an error on reading pickles, i.e. this change doesn't fix corrupted pickles that are already stored in ZODB. I was able to fix one corrupted pickle by carefully inspecting its contents with …
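The inspection tool named in that comment is lost from this excerpt. For what it's worth, the standard library's pickletools module is one way to do such an inspection: it disassembles a pickle opcode by opcode, with byte offsets, so a single wrong byte can be located. A sketch, not necessarily what the commenter used:

# Disassemble a pickle to locate a corrupted opcode byte by its offset.
import io
import pickletools

from zodbpickle import pickle

f = io.BytesIO()
pickle.dump(['a', 'b'], f, protocol=3)
pickletools.dis(f.getvalue())    # prints each opcode with its byte offset,
                                 # e.g. EMPTY_LIST, BINUNICODE, APPENDS, STOP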
This is great! At ZODB/issues/271 I reported a problem that after a … This patch seems to fix that, i.e. I can no longer reproduce that explosion. Also, the error I saw looks quite similar, in that a single byte was written wrongly somewhere in the middle of a pickle (usually an append instruction was changed to a setitem instruction), which I was also able to fix by manually editing the pickle in question. (In my case, though, pickles later in the ZODB had more problems that I wasn't able to work around.) Sadly, the only way I can reliably reproduce it is … Still, I hope this helps support this fix in that it provides at least some evidence that it does something sensible.
If it helps, I can share a pickle object with the corruption privately with people I trust.
If it is too hard to write a test, at least a change log entry would be welcome.
Funny thing: using some brute force, I was able to repro the issue and confirm this PR actually fixes it:

>>> import io
>>> from zodbpickle import pickle
>>> inp = ['a'] * 34991
>>> f = io.BytesIO()
>>> pickle.dump(inp, f)
>>> len(f.getvalue())
70064
>>> f.seek(0)
0
>>> outp = pickle.load(f)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
_pickle.UnpicklingError: invalid load key, 'H'.

The 34991 is a magic number for the (default) protocol 3. Other protocols are affected too, just with different list sizes. The error is not consistent; sometimes it is … Now, the script to find this case runs for several minutes, trying all lists of sizes up to 100K. This doesn't feel like a good fit for the otherwise quite fast test suite, and hardcoding this particular instance in a test doesn't seem to be very useful either. What do you think would be the sanest way to release this?
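For reference, a search along the lines described above might look like the following sketch; the commenter's actual script is not shown in the thread, and the function name is illustrative.

# Round-trip lists of increasing size and report the first one that fails
# to load.  With the buggy _pickle_33.c this finds 34991 for protocol 3;
# with the fix it finds nothing.  As noted above, it runs for minutes.
import io

from zodbpickle import pickle

def find_failing_size(limit=100_000, protocol=3):
    for n in range(1, limit):
        f = io.BytesIO()
        pickle.dump(['a'] * n, f, protocol=protocol)
        f.seek(0)
        try:
            pickle.load(f)
        except Exception as exc:      # the error type varies, per the comment
            return n, exc
    return None

print(find_failing_size())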
I think it's fine -- and much better than not having any test case at all!
This is an attempt to fix a data corruption issue that manifests in a random byte replacing a correct one in a pickle.
I've added the test and a changelog entry.
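The added test itself is not visible in this excerpt; a regression test along these lines would capture the case found above (a sketch; the class and method names are illustrative, and the PR's actual test may differ):

# Illustrative regression test hardcoding the failing size found above.
import io
import unittest

from zodbpickle import pickle

class ListRoundtripTest(unittest.TestCase):
    def test_34991_element_list_roundtrips(self):
        data = ['a'] * 34991              # triggered the bug for protocol 3
        f = io.BytesIO()
        pickle.dump(data, f, protocol=3)
        f.seek(0)
        self.assertEqual(pickle.load(f), data)

if __name__ == '__main__':
    unittest.main()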
LGTM.
Released as 1.0.4 to PyPI, see https://pypi.org/project/zodbpickle/1.0.4/
This is an attempt to fix a data corruption issue that manifests in a random byte replacing a correct one in a pickle, which makes the pickle unreadable. I couldn't figure out the exact reason for it, but max_output_len in the fixed expression might become 0 under extreme conditions (when output_len is 0 and n is 1). The patch fixed the issue and was tested by our team for about a month.
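For illustration, the arithmetic failure described here can be modeled directly in Python. The expression below mirrors the buffer-growth computation in CPython 3.3's _pickle.c, the file _pickle_33.c was taken from; treat this as a sketch of the failure mode, not the exact patched line:

# The output buffer is grown whenever output_len + n exceeds max_output_len.
# Python's // truncates like C integer division on non-negative operands.
output_len, n = 0, 1                        # the "extreme conditions" above
max_output_len = (output_len + n) // 2 * 3  # growth expression from _pickle.c
print(max_output_len)                       # 0 -- the buffer would be resized
                                            # to zero bytes, so the next write
                                            # lands out of bounds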