multiprocessing Queue leaks a file descriptor associated with the pipe writer (#33081 still a problem) #86918

Open
crazycasta mannequin opened this issue Dec 26, 2020 · 8 comments
Labels
3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir topic-multiprocessing type-bug An unexpected behavior, bug, or error

Comments

crazycasta mannequin commented Dec 26, 2020

BPO 42752
Nosy @pitrou, @mattip, @graingert, @applio
Files
  • test_queue.py
  • queue_close_write.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-12-26.23:03:41.032>
    labels = ['3.8', '3.9', '3.10', 'performance', '3.7', 'library']
    title = 'multiprocessing Queue leaks a file descriptor associated with the pipe writer (#33081 still a problem)'
    updated_at = <Date 2022-02-21.11:50:18.407>
    user = 'https://bugs.python.org/crazycasta'

    bugs.python.org fields:

    activity = <Date 2022-02-21.11:50:18.407>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2020-12-26.23:03:41.032>
    creator = 'crazycasta'
    dependencies = []
    files = ['49701', '49713']
    hgrepos = []
    issue_num = 42752
    keywords = ['patch']
    message_count = 5.0
    messages = ['383832', '384149', '384151', '384737', '388632']
    nosy_count = 5.0
    nosy_names = ['pitrou', 'mattip', 'graingert', 'crazycasta', 'davin']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue42752'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9', 'Python 3.10']

    crazycasta mannequin commented Dec 26, 2020

    Didn't feel like necroing bpo-33081, but this is basically that problem. The trouble is that the cleanup which appeared to fix bpo-33081 only kicks in once something has been put in the queue. So if, for instance, a Process function puts something in the queue, the parent gets it and then calls q.close(), the writer on the parent side doesn't get closed until the Queue object itself does. This is particularly a problem for PyPy, and it isn't great for any corner case where someone holds onto Queue objects after they're closed for any reason (hoarders!).

    Attached file test_queue.py is an example of how to trigger this problem. Run it without a command line argument "python test_queue.py" and it won't crash (though it will take a very long time to complete). Run with an argument "python test_queue.py fail" and it will fail once you run out of file descriptors (one leaked per queue).
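    For reference, a loop in the same spirit as the attached test (this is a reconstruction, not the attached file itself; the iteration count and the handling of the command-line argument are guesses from the description above):

        import multiprocessing
        import sys

        def worker(q):
            q.put("x")

        def main():
            hold = []  # keeping the closed queues alive is what exposes the leak
            for _ in range(100000):
                q = multiprocessing.Queue()
                p = multiprocessing.Process(target=worker, args=(q,))
                p.start()
                q.get()          # the parent only reads, so it never starts a feeder thread
                p.join()
                q.close()
                q.join_thread()
                if len(sys.argv) > 1:
                    # "fail" mode: hold a reference so the writer fd is never
                    # reclaimed by garbage collection -> one leaked fd per queue
                    hold.append(q)

        if __name__ == "__main__":
            main()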

    My suggestion on how to handle this is to set self._close to something that will close self._writer. Then, when _start_thread is called, instead of passing the self._writer.close method directly, pass a small function that swaps self._close out for the Finalize method used later on and returns self._writer. Finally, inside _feed, use this method to get the _writer object and wrap the outer while 1 loop with contextlib.closing on that object.

    This is a fair bit of stitching things together here and there, so let me know if anyone has any suggestions before I get started.
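    To make the intended shape concrete, here is a small self-contained toy along those lines (not the CPython patch; MiniQueue and everything in it is made up for illustration). close() closes the writer directly as long as no feeder thread exists; once _start_thread has run, the feeder owns the writer and closes it itself via contextlib.closing when it exits:

        import contextlib
        import os
        import queue
        import threading

        class MiniQueue:
            def __init__(self):
                self._reader_fd, self._writer_fd = os.pipe()
                self._buffer = queue.SimpleQueue()
                self._thread = None
                # until the feeder thread exists, closing the queue must also
                # close the write end of the pipe
                self._close = lambda: os.close(self._writer_fd)

            def put(self, data):
                if self._thread is None:
                    self._start_thread()
                self._buffer.put(data)           # data is expected to be bytes

            def _start_thread(self):
                # hand ownership of the writer to the feeder; from now on
                # close() only asks the feeder to finish, and the feeder
                # closes the writer on its way out
                self._close = lambda: self._buffer.put(None)   # None = stop sentinel
                self._thread = threading.Thread(target=self._feed, daemon=True)
                self._thread.start()

            def _feed(self):
                with contextlib.closing(os.fdopen(self._writer_fd, "wb")) as writer:
                    while True:
                        item = self._buffer.get()
                        if item is None:
                            return               # closing() closes the writer here
                        writer.write(item)
                        writer.flush()

            def close(self):
                self._close()

    Either way, the closing side is left without a dangling writer descriptor, which is the behaviour the fix is after.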

    @crazycasta crazycasta mannequin added 3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir performance Performance or resource usage labels Dec 26, 2020

    crazycasta mannequin commented Jan 1, 2021

    Well, having not heard anything, I decided to just make a patch and throw it up. Here it is. It includes a test that fails with the old version and passes once patched, as well as the patch to the queue code itself.

    Worth noting: the CleanExchange class is used because simpler approaches, like using a closure to pass the exchange mechanism, hold a reference to the Queue one way or another that is difficult or impossible to kill. This is because the intermediate thread machinery holds a reference to all of the arguments passed to the run function. CleanExchange lets an indirect reference be passed, so the reference to the Queue can be None'd out.
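    For readers without the patch in front of them, the gist of that indirection is something like the following (a guess at the shape, not the CleanExchange from the attached patch). threading.Thread keeps a reference to its args for as long as the Thread object lives, so the queue is handed over through a small holder whose attribute can be cleared once the feeder thread has taken it:

        class CleanExchange:
            """Hold an object so a thread can take it and drop the reference."""

            def __init__(self, obj):
                self.obj = obj

            def take(self):
                obj, self.obj = self.obj, None   # hand the object over and forget it
                return obj

    The feeder's target then receives the CleanExchange instance rather than the Queue itself and calls take() exactly once, so nothing in the thread machinery keeps the Queue alive.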


    mattip commented Jan 1, 2021

    Just to be clear, here is the code from the test (how do you format a code block here?) that demonstrates the writer is not closed when nothing is put into the queue:

    $ python3
    Python 3.8.6 | packaged by conda-forge | (default, Oct  7 2020, 19:08:05) 
    [GCC 7.5.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import multiprocessing
    >>> q = multiprocessing.Queue()
    >>> q.close()
    >>> q.join_thread()
    >>> q._reader.closed
    True
    >>> q._writer.closed
    False
    >>> 
    
    

    And, for contrast, the behaviour when the queue is actually used: the writer does get closed:

    >>> q = multiprocessing.Queue()
    >>> q.put(1)
    >>> q.get()
    1
    >>> q.close()
    >>> q.join_thread()
    >>> q._reader.closed
    True
    >>> q._writer.closed
    True
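    The first case can also be checked at the file descriptor level (again poking at the private _writer attribute; a small script rather than a session):

        import multiprocessing
        import os

        q = multiprocessing.Queue()
        fd = q._writer.fileno()
        q.close()
        q.join_thread()
        os.fstat(fd)   # succeeds rather than raising OSError: the pipe's write end is still open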
    
    


    mattip commented Jan 9, 2021

    The expert index at https://devguide.python.org/experts/ lists @davin and @pitrou for multiprocessing. Adding them to the nosy list.


    mattip commented Mar 13, 2021

    @CrazyCasta could you turn your patches into a PR? I am not sure how to get some eyes on this, but the test is certainly useful to prove the problem still exists.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel iritkatriel added type-bug An unexpected behavior, bug, or error and removed performance Performance or resource usage labels Aug 24, 2022

    mattip commented Nov 21, 2022

    Ping. Did this ever become a PR?

    serhiy-storchaka commented

    close() can be called before join_thread(). Some threads can still be using the queue and trying to write into it. Closing the writer just before some thread is about to write into it can cause warnings, and in the worst case a write into the wrong file descriptor. See #109370.
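    A bare-os illustration of the fd-reuse hazard being described (nothing multiprocessing-specific; the file name is made up):

        import os
        import tempfile

        r, w = os.pipe()
        os.close(w)                                   # write end closed while someone still holds the fd number
        path = os.path.join(tempfile.gettempdir(), "fd_reuse_demo.txt")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT)  # on POSIX this typically reuses the lowest free fd, i.e. w
        os.write(w, b"payload")                       # a stale "queue" write now lands in the unrelated file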

    LorewalkerZhou commented

    > close() can be called before join_thread(). Some threads can still be using the queue and trying to write into it. Closing the writer just before some thread is about to write into it can cause warnings, and in the worst case a write into the wrong file descriptor. See #109370.

    When a multiprocessing.Queue is closed without put ever having been called, Queue._thread and Queue._jointhread are still None, so nothing happens when join_thread() is called.
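    That is easy to confirm on CPython (private attributes again):

        >>> import multiprocessing
        >>> q = multiprocessing.Queue()    # nothing is ever put
        >>> q._thread is None and q._jointhread is None
        True
        >>> q.close()
        >>> q.join_thread()                # no-op: _jointhread is still None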
