Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite Recursion during Unpickling a codecs Object #50644

Closed
ThomasH mannequin opened this issue Jul 1, 2009 · 10 comments
Closed

Infinite Recursion during Unpickling a codecs Object #50644

ThomasH mannequin opened this issue Jul 1, 2009 · 10 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@ThomasH
Copy link
Mannequin

ThomasH mannequin commented Jul 1, 2009

BPO 6395
Nosy @malemburg, @avassalotti, @serhiy-storchaka
Files
  • codecs_stream_delegating.patch
  • codecs_stream_delegating_2.patch
  • codecs_stream_forbid_pickling.patch: Forbid pickling and copying
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = None
    created_at = <Date 2009-07-01.15:46:26.758>
    labels = ['type-bug', 'library']
    title = 'Infinite Recursion during Unpickling a codecs Object'
    updated_at = <Date 2015-12-06.12:10:19.822>
    user = 'https://bugs.python.org/ThomasH'

    bugs.python.org fields:

    activity = <Date 2015-12-06.12:10:19.822>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2009-07-01.15:46:26.758>
    creator = 'ThomasH'
    dependencies = []
    files = ['41184', '41215', '41217']
    hgrepos = []
    issue_num = 6395
    keywords = ['patch']
    message_count = 10.0
    messages = ['89985', '90180', '90258', '255540', '255756', '255761', '255763', '255774', '255785', '256011']
    nosy_count = 4.0
    nosy_names = ['lemburg', 'alexandre.vassalotti', 'ThomasH', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue6395'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5', 'Python 3.6']

    Linked PRs

    @ThomasH
    Copy link
    Mannequin Author

    ThomasH mannequin commented Jul 1, 2009

    I recently ran into an infinite recursion trying to unpickle a
    codecs.StreamWriter object (I presume the issue would be the same for a
    StreamReader).

    Here is the end of the stack trace:

      File "/sw/lib/python2.5/codecs.py", line 330, in __getattr__   
        return getattr(self.stream, name)
      File "/sw/lib/python2.5/codecs.py", line 330, in __getattr__ 
        return getattr(self.stream, name) 
      File "/sw/lib/python2.5/codecs.py", line 330, in __getattr__ 
        return getattr(self.stream, name) 
    RuntimeError: maximum recursion depth exceeded

    The issue is the same under Python2.6 but the error output has changed
    (see http://bugs.python.org/issue5508).

    The problem is that the codecs module tries to delegate member lookup to
    the underlying stream. But after unpickling, "self.stream" is not
    defined, so the reference to self.stream in __getattr__ itself leads to
    an invocation of __getattr__ - hence the recursion loop.

    Using tools from the Pickle protocol, like __getstate__/setstate,
    could help degrade codecs objects gracefully during pickle/unpickle
    cycles. E.g. it might be enough to provide a dummy self.stream through
    __setstate__.

    @ThomasH ThomasH mannequin added type-crash A hard crash of the interpreter, possibly with a core dump stdlib Python modules in the Lib dir labels Jul 1, 2009
    @benjaminp benjaminp added type-feature A feature request or enhancement and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Jul 1, 2009
    @malemburg
    Copy link
    Member

    I don't understand the use case for this.

    If the StreamWriter/Reader cannot pickle the underlying stream (which is
    probably always the case), why should the object itself be pickleable ?

    What we could do is implement .__get/setstate__() and have it raise an
    exception to inform the user of the problem.

    @ThomasH
    Copy link
    Mannequin Author

    ThomasH mannequin commented Jul 8, 2009

    Sounds good to me. The main intention of this bug is not to make codecs
    objects pickleable by all means (I don't know if this would be sensible
    and/or possible), but at least to have them degrade gracefully during
    pickling, particularly without infinite recursion. I've changed the bug
    title to reflect this.

    @ThomasH ThomasH mannequin changed the title Add Pickle Support to the codecs Module Infinite Recursion during Unpickling a codecs Object Jul 8, 2009
    @serhiy-storchaka
    Copy link
    Member

    Implementing only .__get/setstate__() doesn't fix all problem. We have implement also __getnewargs_ex__(). But implemented __getnewargs_ex__() has priority over __getnewargs__() implemented in subclasses.

    And may be there are problems with other optional special methods that are incorrectly delegated to the stream in codecs IO classes.

    I think more reliable way is to disallow delegating all special (or may be even private) methods. Here is a patch.

    @serhiy-storchaka serhiy-storchaka self-assigned this Nov 28, 2015
    @serhiy-storchaka serhiy-storchaka added type-bug An unexpected behavior, bug, or error and removed type-feature A feature request or enhancement labels Nov 28, 2015
    @serhiy-storchaka
    Copy link
    Member

    If the StreamWriter/Reader cannot pickle the underlying stream (which is
    probably always the case), why should the object itself be pickleable ?

    io.BytesIO() and io.StringIO() are pickleable.

    @malemburg
    Copy link
    Member

    On 02.12.2015 20:16, Serhiy Storchaka wrote:

    Serhiy Storchaka added the comment:

    > If the StreamWriter/Reader cannot pickle the underlying stream (which is
    > probably always the case), why should the object itself be pickleable ?

    io.BytesIO() and io.StringIO() are pickleable.

    Ok, but I still don't see the use case :-)

    I think all we need to do is add a .__reduce__()
    method to StreamWriter and StreamReader, which then
    raises a PickleError.

    Example:

    >>> import sys, codecs, pickle
    >>> r = codecs.getreader('latin-1')
    >>> class MyReader(r):
    ...    def __reduce__(self, *args):
    ...       raise pickle.PickleError
    ...
    >>> s = MyReader(sys.stdin)
    >>> pickle.dumps(s)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 3, in __reduce__
    _pickle.PickleError
    > <stdin>(3)__reduce__()

    @serhiy-storchaka
    Copy link
    Member

    Added tests for pickling and deepcopying.

    @serhiy-storchaka
    Copy link
    Member

    I think all we need to do is add a .__reduce__()
    method to StreamWriter and StreamReader, which then
    raises a PickleError.

    Rather TypeError. Yes, it is the least that we should to do in maintained releases. If codecs_stream_delegating_2.patch is considered too drastic for bugfix. But this can be only a part of problem. May be there are issues with other optional special methods. And adding __reduce_ex__ breaks subclass pickleability if it was implemented with __getstate__ and __getnewargs__.

    Here is a patch for this way.

    @malemburg
    Copy link
    Member

    On 02.12.2015 21:29, Serhiy Storchaka wrote:

    Serhiy Storchaka added the comment:

    > I think all we need to do is add a .__reduce__()
    > method to StreamWriter and StreamReader, which then
    > raises a PickleError.

    Rather TypeError. Yes, it is the least that we should to do in maintained releases. If codecs_stream_delegating_2.patch is considered too drastic for bugfix. But this can be only a part of problem. May be there are issues with other optional special methods. And adding __reduce_ex__ breaks subclass pickleability if it was implemented with __getstate__ and __getnewargs__.

    Here is a patch for this way.

    Thanks. I think using __reduce__ instead of __reduce_ex__ is
    safer, since subclasses will more likely override __reduce__
    than __reduce_ex__.

    The other approach will have too many backwards incompatible side
    effects, e.g. the repr() over StreamReader instances would change.

    @serhiy-storchaka
    Copy link
    Member

    There is also a problem with deepcopying:

    >>> class MemReader:
    ...     def __init__(self, data):
    ...         self.buf = memoryview(data).cast('B')
    ...     def read(self, size=-1):
    ...         if size < 0:
    ...             size = len(self.buf)
    ...         res = bytes(self.buf[:size])
    ...         self.buf = self.buf[size:]
    ...         return res
    ...     def close():
    ...         pass
    ...     def __deepcopy__(self, memo):
    ...         return MemReader(self.buf)
    ...     def __reduce__(self):
    ...         return MemReader, (bytes(self.buf),)
    ... 
    >>> import codecs, copy
    >>> s1 = codecs.getreader('utf-8')(MemReader(b'abcd'))
    >>> s2 = copy.deepcopy(s1)
    >>> s1.read()
    'abcd'
    >>> s2.read()
    b'abcd'
    >>> s2
    <__main__.MemReader object at 0xb701988c>

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Sep 9, 2023
    Attempts to pickle or create a shallow or deep copy of codecs streams
    now raise a TypeError.
    
    Previously, stream pickling produced invalid data, which attempts to read
    resulted in a RecursionError, as well as attempts to create a copy of
    the stream.
    serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Sep 9, 2023
    Attempts to pickle or create a shallow or deep copy of codecs streams
    now raise a TypeError.
    
    Previously, stream pickling produced invalid data, which attempts to read
    resulted in a RecursionError, as well as attempts to create a copy of
    the stream.
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 10, 2023
    Attempts to pickle or create a shallow or deep copy of codecs streams
    now raise a TypeError.
    
    Previously, copying failed with a RecursionError, while pickling
    produced wrong results that eventually caused unpickling to fail with
    a RecursionError.
    (cherry picked from commit d6892c2)
    
    Co-authored-by: Serhiy Storchaka <[email protected]>
    serhiy-storchaka added a commit that referenced this issue Sep 10, 2023
    Attempts to pickle or create a shallow or deep copy of codecs streams
    now raise a TypeError.
    
    Previously, copying failed with a RecursionError, while pickling
    produced wrong results that eventually caused unpickling to fail with
    a RecursionError.
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 10, 2023
    Attempts to pickle or create a shallow or deep copy of codecs streams
    now raise a TypeError.
    
    Previously, copying failed with a RecursionError, while pickling
    produced wrong results that eventually caused unpickling to fail with
    a RecursionError.
    (cherry picked from commit d6892c2)
    
    Co-authored-by: Serhiy Storchaka <[email protected]>
    serhiy-storchaka added a commit that referenced this issue Sep 10, 2023
    …9232)
    
    Attempts to pickle or create a shallow or deep copy of codecs streams
    now raise a TypeError.
    
    Previously, copying failed with a RecursionError, while pickling
    produced wrong results that eventually caused unpickling to fail with
    a RecursionError.
    (cherry picked from commit d6892c2)
    
    Co-authored-by: Serhiy Storchaka <[email protected]>
    Yhg1s pushed a commit that referenced this issue Oct 2, 2023
    gh-50644: Forbid pickling of codecs streams (GH-109180)
    
    Attempts to pickle or create a shallow or deep copy of codecs streams
    now raise a TypeError.
    
    Previously, copying failed with a RecursionError, while pickling
    produced wrong results that eventually caused unpickling to fail with
    a RecursionError.
    (cherry picked from commit d6892c2)
    
    Co-authored-by: Serhiy Storchaka <[email protected]>
    @github-project-automation github-project-automation bot moved this from In Progress to Done in Codecs and encodings issues Oct 28, 2023
    @github-project-automation github-project-automation bot moved this from In Progress to Done in Pickle and copy issues 🥒 Oct 28, 2023
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    Status: Done
    Development

    No branches or pull requests

    4 participants