Fix ZODB pickle corruption on python-3.7 #47
Conversation
Do you think it would be possible to write a test case for this?
_pickle_33.c was taken from the Python 3.3 standard library. Was the bug fixed in Python at some later point? Should we report an upstream bug? Or find and backport an upstream patch? Or at least link to the relevant upstream bug/commit?
I'll try to repro it in a test, but it is pretty difficult: the bug was very hard to trigger, and I suspect it happens only on a very specific set of data. I noticed that changing … We tried to port _pickle_33.c from later upstream versions, but it received too many changes for that to be feasible. Unfortunately I lost the original traceback that we were seeing. The Python version of the pickle code (that will be enabled if you delete the …) … Another change that seems to fix the issue (at least for one particular data set I had at hand): …
Another detail: this bug produces an error on reading pickles, i.e. this change doesn't fix corrupted pickles that are already stored in ZODB. I was able to fix one corrupted pickle by carefully inspecting its contents with …
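The inspection tool named in that comment is lost from this excerpt. For what it's worth, the standard library's pickletools module is one way to do such an inspection: it disassembles a pickle opcode by opcode, with byte offsets, so a single wrong byte can be located. A sketch, not necessarily what the commenter used:

# Disassemble a pickle to locate a corrupted opcode byte by its offset.
import io
import pickletools

from zodbpickle import pickle

f = io.BytesIO()
pickle.dump(['a', 'b'], f, protocol=3)
pickletools.dis(f.getvalue())    # prints each opcode with its byte offset,
                                 # e.g. EMPTY_LIST, BINUNICODE, APPENDS, STOP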
This is great! At ZODB/issues/271 I reported a problem that after a … This patch seems to fix that, i.e. I can no longer reproduce that explosion. Also, the error I saw looks quite similar, in that a single byte was written wrongly somewhere in the middle of a pickle (usually an append instruction was changed to a setitem instruction), which I was also able to fix by manually editing the pickle in question. (In my case, though, pickles later in the ZODB had more problems that I wasn't able to work around.) Sadly, the only way I can reliably reproduce it is … Still, I hope this helps support this fix in that it provides at least some evidence that it does something sensible.
If it helps, I can share a pickle object with the corruption privately with people I trust.
If it is too hard to write a test, at least a change log entry would be welcome.
Funny thing: using some brute force, I was able to repro the issue and confirm this PR actually fixes it:

>>> import io
>>> from zodbpickle import pickle
>>> inp = ['a'] * 34991
>>> f = io.BytesIO()
>>> pickle.dump(inp, f)
>>> len(f.getvalue())
70064
>>> f.seek(0)
0
>>> outp = pickle.load(f)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
_pickle.UnpicklingError: invalid load key, 'H'.

The 34991 is a magic number for the (default) protocol 3. Other protocols are affected too, just with different list sizes. The error is not consistent; sometimes it is … Now, the script to find this case runs for several minutes, trying all lists of sizes up to 100K. This doesn't feel like a good fit for the otherwise quite fast test suite, and hardcoding this particular instance in a test doesn't seem to be very useful either. What do you think would be the sanest way to release this?
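For reference, a search along the lines described above might look like the following sketch; the commenter's actual script is not shown in the thread, and the function name is illustrative.

# Round-trip lists of increasing size and report the first one that fails
# to load.  With the buggy _pickle_33.c this finds 34991 for protocol 3;
# with the fix it finds nothing.  As noted above, it runs for minutes.
import io

from zodbpickle import pickle

def find_failing_size(limit=100_000, protocol=3):
    for n in range(1, limit):
        f = io.BytesIO()
        pickle.dump(['a'] * n, f, protocol=protocol)
        f.seek(0)
        try:
            pickle.load(f)
        except Exception as exc:      # the error type varies, per the comment
            return n, exc
    return None

print(find_failing_size())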
I think it's fine -- and much better than not having any test case at all!
This is an attempt to fix a data corruption issue that manifests in a random byte replacing a correct one in a pickle.
I've added the test and a changelog entry.
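The added test itself is not visible in this excerpt; a regression test along these lines would capture the case found above (a sketch; the class and method names are illustrative, and the PR's actual test may differ):

# Illustrative regression test hardcoding the failing size found above.
import io
import unittest

from zodbpickle import pickle

class ListRoundtripTest(unittest.TestCase):
    def test_34991_element_list_roundtrips(self):
        data = ['a'] * 34991              # triggered the bug for protocol 3
        f = io.BytesIO()
        pickle.dump(data, f, protocol=3)
        f.seek(0)
        self.assertEqual(pickle.load(f), data)

if __name__ == '__main__':
    unittest.main()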
LGTM.
Released as 1.0.4 to PyPI, see https://pypi.org/project/zodbpickle/1.0.4/
This is an attempt to fix a data corruption issue that manifests in a random byte replacing a correct one in a pickle, which makes the pickle unreadable. I couldn't figure out the exact reason for it, but max_output_len in the fixed expression might become 0 under extreme conditions (when output_len is 0 and n is 1). The patch fixed the issue and was tested by our team for about a month.
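For illustration, the arithmetic failure described here can be modeled directly in Python. The expression below mirrors the buffer-growth computation in CPython 3.3's _pickle.c, the file _pickle_33.c was taken from; treat this as a sketch of the failure mode, not the exact patched line:

# The output buffer is grown whenever output_len + n exceeds max_output_len.
# Python's // truncates like C integer division on non-negative operands.
output_len, n = 0, 1                        # the "extreme conditions" above
max_output_len = (output_len + n) // 2 * 3  # growth expression from _pickle.c
print(max_output_len)                       # 0 -- the buffer would be resized
                                            # to zero bytes, so the next write
                                            # lands out of bounds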