Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Starting new thread during finalization leads to call to PyMem_Free without holding the GIL #109795

Closed
chgnrdv opened this issue Sep 23, 2023 · 5 comments · Fixed by #109808
Closed
Labels
type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@chgnrdv
Copy link
Contributor

chgnrdv commented Sep 23, 2023

Crash report

What happened?

Since #109135, thread_run checks if it is called during finalization, and in this case it frees thread bootstate in thread_bootstate_free and returns. The problem is that PyMem_Free shouldn't be called if GIL is not held.

Repro and error message (just in case):

import os
import random
import signal
import subprocess
import sys
import time

script = """
import threading
while True:
    t = threading.Thread()
    t.start()
    t.join()
"""

while True:
    p = subprocess.Popen([sys.executable, '-c', script], stderr=subprocess.PIPE)
    time.sleep(random.random())
    os.kill(p.pid, signal.SIGINT)
    _, err = p.communicate()
    if p.returncode == -signal.SIGABRT:
        print(err.decode('utf-8'))
        break
Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "/home/radislav/projects/cpython/Lib/threading.py", line 983, in start
    self._started.wait()
  File "/home/radislav/projects/cpython/Lib/threading.py", line 641, in wait
    signaled = self._cond.wait(timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/radislav/projects/cpython/Lib/threading.py", line 341, in wait
    waiter.acquire()
KeyboardInterrupt
Fatal Python error: _PyMem_DebugFree: Python memory allocator called without holding the GIL
Python runtime state: finalizing (tstate=0x0000561b61b67338)

Thread 0x00007f8f890b8280 (most recent call first):
  <no Python frame>

cc @vstinner

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.13.0a0 (heads/main:3e8fcb7df7, Sep 23 2023, 01:43:45) [GCC 10.2.1 20210110]

Linked PRs

@chgnrdv chgnrdv added the type-crash A hard crash of the interpreter, possibly with a core dump label Sep 23, 2023
@vstinner
Copy link
Member

vstinner commented Sep 24, 2023

thread_PyThread_start_new_thread() can easily use PyMem_RawMalloc() instead of PyMem_Malloc(). Do you want to propose a fix?

@chgnrdv
Copy link
Contributor Author

chgnrdv commented Sep 24, 2023

@vstinner, yes, I'll submit a PR shortly :)

@vstinner
Copy link
Member

Ah. After many attempts, I managed to reproduce the bug without the fix:

$ ./python repro.py 
...............failed with exit code -6
Fatal Python error: _PyMem_DebugFree: Python memory allocator called without holding the GIL
Python runtime state: finalizing (tstate=0x0000000000b062f8)

Thread 0x00007fd52873a740 (most recent call first):
  <no Python frame>

I modified the reproducer script:

import os
import random
import signal
import subprocess
import sys
import time

script = """
try:
    import threading
    while True:
        t = threading.Thread()
        t.start()
        t.join()
except KeyboardInterrupt:
    pass
"""

line = 0

while True:
    if line % 40 == 0:
        sys.stdout.write(f"\n{line} ")
    sys.stdout.write(".")
    sys.stdout.flush()
    line += 1

    p = subprocess.Popen([sys.executable, '-c', script], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    time.sleep(0.10 + random.random()*0.5)
    os.kill(p.pid, signal.SIGINT)
    with p:
        try:
            out = p.communicate(timeout=1.0)[0]
        except subprocess.TimeoutExpired:
            sys.stdout.write("*** timeout ***")
            sys.stdout.flush()
            p.kill()
            continue
    if p.returncode:
        print(f"failed with exit code {p.returncode}")
        print(out.decode('utf-8'))
        break

Sometimes, I get other bugs like:

$ ./python repro.py 

0 ...................................failed with exit code 1
Traceback (most recent call last):
  File "/home/vstinner/python/main/Lib/threading.py", line 641, in wait
    signaled = self._cond.wait(timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vstinner/python/main/Lib/threading.py", line 337, in wait
    saved_state = self._release_save()
                  ^^^^^^^^^^^^^^^^^^^^
  File "/home/vstinner/python/main/Lib/threading.py", line 295, in _release_save
    self._lock.release()           # No state to save
    ^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 6, in <module>
  File "/home/vstinner/python/main/Lib/threading.py", line 983, in start
    self._started.wait()
  File "/home/vstinner/python/main/Lib/threading.py", line 638, in wait
    with self._cond:
  File "/home/vstinner/python/main/Lib/threading.py", line 289, in __exit__
    return self._lock.__exit__(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: release unlocked lock

@vstinner
Copy link
Member

Thanks for the reproducer @chgnrdv!

The bug is hard to trigger :-( But apparently, I cannot reproduce the bug anymore. Good :-)

vstinner added a commit that referenced this issue Sep 25, 2023
…te usin… (#109852)

gh-109795: `_thread.start_new_thread`: allocate thread bootstate using raw memory allocator (#109808)

(cherry picked from commit 1b8f236)

Co-authored-by: Radislav Chugunov <[email protected]>
@chgnrdv
Copy link
Contributor Author

chgnrdv commented Sep 25, 2023

@vstinner thank you for review :)

Yep, it takes precise timing for interrupt to make Python finalize before new thread will be started in thread_PyThread_start_new_thread. The repro I attached is fittable for my machine, but may not work on others.

csm10495 pushed a commit to csm10495/cpython that referenced this issue Sep 28, 2023
vstinner pushed a commit to vstinner/cpython that referenced this issue Oct 4, 2023
…e using raw memory allocator (python#109808)

(cherry picked from commit 1b8f236)
vstinner added a commit that referenced this issue Oct 4, 2023
) (#110342)

* gh-108987: Fix _thread.start_new_thread() race condition (#109135)

Fix _thread.start_new_thread() race condition. If a thread is created
during Python finalization, the newly spawned thread now exits
immediately instead of trying to access freed memory and lead to a
crash.

thread_run() calls PyEval_AcquireThread() which checks if the thread
must exit. The problem was that tstate was dereferenced earlier in
_PyThreadState_Bind() which leads to a crash most of the time.

Move _PyThreadState_CheckConsistency() from thread_run() to
_PyThreadState_Bind().

(cherry picked from commit 517cd82)

* gh-109795: `_thread.start_new_thread`: allocate thread bootstate using raw memory allocator (#109808)

(cherry picked from commit 1b8f236)

---------

Co-authored-by: Radislav Chugunov <[email protected]>
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
2 participants