Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_threading fails and crashes when rerun #108987

Closed
serhiy-storchaka opened this issue Sep 6, 2023 · 6 comments
Closed

test_threading fails and crashes when rerun #108987

serhiy-storchaka opened this issue Sep 6, 2023 · 6 comments
Labels
OS-windows type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@serhiy-storchaka
Copy link
Member

serhiy-storchaka commented Sep 6, 2023

Crash report

First, BarrierTests.test_default_timeout in test_threading fails, producing also several unraisable exception messages. They are produced even during running few following tests.

https://github.com/python/cpython/actions/runs/6094370280/job/16535778004#step:5:103

test_default_timeout (test.test_threading.BarrierTests.test_default_timeout)
Test the barrier's default timeout ... Warning -- Unraisable exception
Exception ignored in thread started by: <function Bunch.__init__.<locals>.task at 0x053F5270>
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\lock_tests.py", line 48, in task
    f()
  File "D:\a\cpython\cpython\Lib\test\lock_tests.py", line 1020, in f
    i = barrier.wait()
        ^^^^^^^^^^^^^^
  File "D:\a\cpython\cpython\Lib\threading.py", line 709, in wait
    self._wait(timeout)
  File "D:\a\cpython\cpython\Lib\threading.py", line 749, in _wait
    raise BrokenBarrierError
threading.BrokenBarrierError: 
ERROR
test_repr (test.test_threading.BarrierTests.test_repr) ... Warning -- Unraisable exception
Exception ignored in thread started by: <function Bunch.__init__.<locals>.task at 0x053F5270>
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\lock_tests.py", line 48, in task
    f()
  File "D:\a\cpython\cpython\Lib\test\lock_tests.py", line 1020, in f
    i = barrier.wait()
        ^^^^^^^^^^^^^^
  File "D:\a\cpython\cpython\Lib\threading.py", line 709, in wait
    self._wait(timeout)
  File "D:\a\cpython\cpython\Lib\threading.py", line 749, in _wait
    raise BrokenBarrierError
threading.BrokenBarrierError: 
Warning -- Unraisable exception
Exception ignored in thread started by: <function Bunch.__init__.<locals>.task at 0x053F5270>
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\lock_tests.py", line 48, in task
    f()
  File "D:\a\cpython\cpython\Lib\test\lock_tests.py", line 1020, in f
    i = barrier.wait()
        ^^^^^^^^^^^^^^
  File "D:\a\cpython\cpython\Lib\threading.py", line 709, in wait
    self._wait(timeout)
  File "D:\a\cpython\cpython\Lib\threading.py", line 749, in _wait
    raise BrokenBarrierError
threading.BrokenBarrierError: 
Warning -- Unraisable exception
Exception ignored in thread started by: <function Bunch.__init__.<locals>.task at 0x053F5270>
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\lock_tests.py", line 48, in task
    f()
  File "D:\a\cpython\cpython\Lib\test\lock_tests.py", line 1020, in f
    i = barrier.wait()
        ^^^^^^^^^^^^^^
  File "D:\a\cpython\cpython\Lib\threading.py", line 700, in wait
    self._enter() # Block while the barrier drains.
    ^^^^^^^^^^^^^
  File "D:\a\cpython\cpython\Lib\threading.py", line 724, in _enter
    raise BrokenBarrierError
threading.BrokenBarrierError: 
ok

After rerunning test_threading it crashes.

https://github.com/python/cpython/actions/runs/6094370280/job/16535778004#step:5:1067

0:17:28 Re-running 2 failed tests in verbose mode in subprocesses
0:17:28 Run tests in parallel using 4 child processes (timeout: 20 min, worker timeout: 25 min)
0:17:30 [1/2/1] test_threading process crashed (Exit code 3)
Re-running test_threading in verbose mode (matching: test_default_timeout)
test_default_timeout (test.test_threading.BarrierTests.test_default_timeout)
Test the barrier's default timeout ... Warning -- Unraisable exception
Exception ignored in thread started by: <function Bunch.__init__.<locals>.task at 0x052196F0>
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\lock_tests.py", line 48, in task
    f()
  File "D:\a\cpython\cpython\Lib\test\lock_tests.py", line 1020, in f
    i = barrier.wait()
        ^^^^^^^^^^^^^^
  File "D:\a\cpython\cpython\Lib\threading.py", line 709, in wait
    self._wait(timeout)
  File "D:\a\cpython\cpython\Lib\threading.py", line 749, in _wait
    raise BrokenBarrierError
threading.BrokenBarrierError: 
ERROR

======================================================================
ERROR: test_default_timeout (test.test_threading.BarrierTests.test_default_timeout)
Test the barrier's default timeout
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\lock_tests.py", line 1025, in test_default_timeout
    self.run_threads(f)
  File "D:\a\cpython\cpython\Lib\test\lock_tests.py", line 854, in run_threads
    f()
  File "D:\a\cpython\cpython\Lib\test\lock_tests.py", line 1020, in f
    i = barrier.wait()
        ^^^^^^^^^^^^^^
  File "D:\a\cpython\cpython\Lib\threading.py", line 709, in wait
    self._wait(timeout)
  File "D:\a\cpython\cpython\Lib\threading.py", line 747, in _wait
    raise BrokenBarrierError
threading.BrokenBarrierError

----------------------------------------------------------------------
Ran 1 test in 0.376s

FAILED (errors=1)
test test_threading failed

{"test_name": "test_threading", "state": "FAILED", "duration": 0.6816820999999891, "xml_data": null, "stats": {"tests_run": 1, "failures": 0, "skipped": 0}, "errors": [["test_default_timeout (test.test_threading.BarrierTests.test_default_timeout)", "Traceback (most recent call last):\n  File \"D:\\a\\cpython\\cpython\\Lib\\test\\lock_tests.py\", line 1025, in test_default_timeout\n    self.run_threads(f)\n  File \"D:\\a\\cpython\\cpython\\Lib\\test\\lock_tests.py\", line 854, in run_threads\n    f()\n  File \"D:\\a\\cpython\\cpython\\Lib\\test\\lock_tests.py\", line 1020, in f\n    i = barrier.wait()\n        ^^^^^^^^^^^^^^\n  File \"D:\\a\\cpython\\cpython\\Lib\\threading.py\", line 709, in wait\n    self._wait(timeout)\n  File \"D:\\a\\cpython\\cpython\\Lib\\threading.py\", line 747, in _wait\n    raise BrokenBarrierError\nthreading.BrokenBarrierError\n"]], "failures": [], "__test_result__": "TestResult"}
A�s�s�e�r�t�i�o�n� �f�a�i�l�e�d�:� �t�s�t�a�t�e�_�i�s�_�a�l�i�v�e�(�t�s�t�a�t�e�)� �&�&� �!�t�s�t�a�t�e�-�>�_�s�t�a�t�u�s�.�b�o�u�n�d�,� �f�i�l�e� �D�:�\�a�\�c�p�y�t�h�o�n�\�c�p�y�t�h�o�n�\�P�y�t�h�o�n�\�p�y�s�t�a�t�e�.�c�,� �l�i�n�e� �2�4�5�
�Fatal Python error: Aborted

Thread 0x0000186c (most recent call first):
  <no Python frame>
1 test failed again:
    test_threading

Note also UTF-16 in logs, but this is other issue.

It only happened in Windows (x86) run, the run on Windows (x64) passed successfully, as well as on other platforms. It may be 32-bit Windows specific or random.

Linked PRs

@serhiy-storchaka serhiy-storchaka added OS-windows type-crash A hard crash of the interpreter, possibly with a core dump labels Sep 6, 2023
@terryjreedy
Copy link
Member

Running just python -m test -v test_threading (main) on my Win10 gives different newish and reproducible failures.

======================================================================
FAIL: test_daemon_threads_not_allowed (test.test_threading.SubinterpThreadingTests.test_daemon_threads_not_allowed) [via Thread()]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "f:\dev\3x\Lib\test\test_threading.py", line 1367, in test_daemon_threads_not_allowed
    err = self._check_allowed(
          ^^^^^^^^^^^^^^^^^^^^
  File "f:\dev\3x\Lib\test\test_threading.py", line 1352, in _check_allowed
    _, _, err = assert_python_ok("-c", script)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "f:\dev\3x\Lib\test\support\script_helper.py", line 166, in assert_python_ok
    return _assert_python(True, *args, **env_vars)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "f:\dev\3x\Lib\test\support\script_helper.py", line 151, in _assert_python
    res.fail(cmd_line)
  File "f:\dev\3x\Lib\test\support\script_helper.py", line 76, in fail
    raise AssertionError("Process return code is %d\n"
AssertionError: Process return code is 1
command line: ['f:\\dev\\3x\\PCbuild\\amd64\\python_d.exe', '-X', 'faulthandler', '-c', '\nimport test.support\ntest.support.run_in_subinterp_with_config(\n    "\\nimport test.support\\nimport threading\\ndef func():\\n    print(\'this should not have run!\')\\nt = threading.Thread(target=func, daemon=True)\\n\\nt.start()\\n",\n    use_main_obmalloc=True,\n    allow_fork=True,\n    allow_exec=True,\n    allow_threads=True,\n    allow_daemon_threads=False,\n    check_multi_interp_extensions=False,\n    own_gil=False,\n)\n']

stdout:
---

---

stderr:
---
Traceback (most recent call last):
  File "<string>", line 3, in <module>
  File "f:\dev\3x\Lib\test\support\__init__.py", line 1859, in run_in_subinterp_with_config
    return _testinternalcapi.run_in_subinterp_with_config(code, **config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module '_testinternalcapi' has no attribute 'run_in_subinterp_with_config'
---

======================================================================
FAIL: test_daemon_threads_not_allowed (test.test_threading.SubinterpThreadingTests.test_daemon_threads_not_allowed) [via Thread.daemon setter]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "f:\dev\3x\Lib\test\test_threading.py", line 1375, in test_daemon_threads_not_allowed
    err = self._check_allowed(
          ^^^^^^^^^^^^^^^^^^^^
  File "f:\dev\3x\Lib\test\test_threading.py", line 1352, in _check_allowed
    _, _, err = assert_python_ok("-c", script)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "f:\dev\3x\Lib\test\support\script_helper.py", line 166, in assert_python_ok
    return _assert_python(True, *args, **env_vars)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "f:\dev\3x\Lib\test\support\script_helper.py", line 151, in _assert_python
    res.fail(cmd_line)
  File "f:\dev\3x\Lib\test\support\script_helper.py", line 76, in fail
    raise AssertionError("Process return code is %d\n"
AssertionError: Process return code is 1
command line: ['f:\\dev\\3x\\PCbuild\\amd64\\python_d.exe', '-X', 'faulthandler', '-c', '\nimport test.support\ntest.support.run_in_subinterp_with_config(\n    "\\nimport test.support\\nimport threading\\ndef func():\\n    print(\'this should not have run!\')\\nt = threading.Thread(target=func, daemon=False)\\nt.daemon = True\\nt.start()\\n",\n    use_main_obmalloc=True,\n    allow_fork=True,\n    allow_exec=True,\n    allow_threads=True,\n    allow_daemon_threads=False,\n    check_multi_interp_extensions=False,\n    own_gil=False,\n)\n']

stdout:
---

---

stderr:
---
Traceback (most recent call last):
  File "<string>", line 3, in <module>
  File "f:\dev\3x\Lib\test\support\__init__.py", line 1859, in run_in_subinterp_with_config
    return _testinternalcapi.run_in_subinterp_with_config(code, **config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module '_testinternalcapi' has no attribute 'run_in_subinterp_with_config'
---

======================================================================
FAIL: test_threads_not_allowed (test.test_threading.SubinterpThreadingTests.test_threads_not_allowed)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "f:\dev\3x\Lib\test\test_threading.py", line 1357, in test_threads_not_allowed
    err = self._check_allowed(
          ^^^^^^^^^^^^^^^^^^^^
  File "f:\dev\3x\Lib\test\test_threading.py", line 1352, in _check_allowed
    _, _, err = assert_python_ok("-c", script)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "f:\dev\3x\Lib\test\support\script_helper.py", line 166, in assert_python_ok
    return _assert_python(True, *args, **env_vars)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "f:\dev\3x\Lib\test\support\script_helper.py", line 151, in _assert_python
    res.fail(cmd_line)
  File "f:\dev\3x\Lib\test\support\script_helper.py", line 76, in fail
    raise AssertionError("Process return code is %d\n"
AssertionError: Process return code is 1
command line: ['f:\\dev\\3x\\PCbuild\\amd64\\python_d.exe', '-X', 'faulthandler', '-c', '\nimport test.support\ntest.support.run_in_subinterp_with_config(\n    "\\nimport test.support\\nimport threading\\ndef func():\\n    print(\'this should not have run!\')\\nt = threading.Thread(target=func, daemon=False)\\n\\nt.start()\\n",\n    use_main_obmalloc=True,\n    allow_fork=True,\n    allow_exec=True,\n    allow_threads=False,\n    allow_daemon_threads=False,\n    check_multi_interp_extensions=False,\n    own_gil=False,\n)\n']

stdout:
---

---

stderr:
---
Traceback (most recent call last):
  File "<string>", line 3, in <module>
  File "f:\dev\3x\Lib\test\support\__init__.py", line 1859, in run_in_subinterp_with_config
    return _testinternalcapi.run_in_subinterp_with_config(code, **config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module '_testinternalcapi' has no attribute 'run_in_subinterp_with_config'
---

@vstinner
Copy link
Member

vstinner commented Sep 7, 2023

I'm able to reproduce the issue on Windows 11 with Python built in 32-bit mode. See Assertion failed: tstate_is_alive(tstate) in my logs.

It seems like the assertion error requires precise timing to be triggered.

Output (reformatted for readability):

vstinner@WIN C:\victor\python\main>python -m test -v test_threading --forever -m test_default_timeout -j20 --rerun 
Running Debug|Win32 interpreter...
== CPython 3.13.0a0 (heads/main:a0773b89df, Sep 7 2023, 07:59:33) [MSC v.1936 32 bit (Intel)]
== Windows-11-10.0.22621-SP0 little-endian
== Python build: debug
== cwd: C:\victor\python\main\build\test_python_1116æ
== CPU count: 2
== encodings: locale=cp1252, FS=utf-8


0:00:00 Run tests in parallel using 20 child processes
0:00:05 [  1/1] test_threading process crashed (Exit code 3)
test_default_timeout (test.test_threading.BarrierTests.test_default_timeout)
Test the barrier's default timeout ... ERROR

======================================================================
ERROR: test_default_timeout (test.test_threading.BarrierTests.test_default_timeout)
Test the barrier's default timeout
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\victor\python\main\Lib\test\lock_tests.py", line 1025, in test_default_timeout
    self.run_threads(f)
  File "C:\victor\python\main\Lib\test\lock_tests.py", line 854, in run_threads
    f()
  File "C:\victor\python\main\Lib\test\lock_tests.py", line 1020, in f
    i = barrier.wait()
        ^^^^^^^^^^^^^^
  File "C:\victor\python\main\Lib\threading.py", line 709, in wait
    self._wait(timeout)
  File "C:\victor\python\main\Lib\threading.py", line 747, in _wait
    raise BrokenBarrierError
threading.BrokenBarrierError

----------------------------------------------------------------------
Ran 1 test in 0.466s

FAILED (errors=1)
test test_threading failed

{"test_name": "test_threading", "state": "FAILED", "duration": 1.5271271999999954, "xml_data": null, "stats": {"tests_run": 1, "failures": 0, "skipped": 0}, "errors": [["test_default_timeout (test.test_threading.BarrierTests.test_default_timeout)", "Traceback (most recent call last):\n  File \"C:\\victor\\python\\main\\Lib\\test\\lock_tests.py\", line 1025, in test_default_timeout\n    self.run_threads(f)\n  File \"C:\\victor\\python\\main\\Lib\\test\\lock_tests.py\", line 854, in run_threads\n    f()\n  File \"C:\\victor\\python\\main\\Lib\\test\\lock_tests.py\", line 1020, in f\n    i = barrier.wait()\n        ^^^^^^^^^^^^^^\n  File \"C:\\victor\\python\\main\\Lib\\threading.py\", line 709, in wait\n    self._wait(timeout)\n  File \"C:\\victor\\python\\main\\Lib\\threading.py\", line 747, in _wait\n    raise BrokenBarrierError\nthreading.BrokenBarrierError\n"]], "failures": [], "__test_result__": "TestResult"}

Assertion failed: tstate_is_alive(tstate) && !tstate->_status.bound, file C:\victor\python\main\Python\pystate.c, line 245 
Fatal Python error: Aborted

Thread 0x00000854 (most recent call first):
  <no Python frame>
(...)

@vstinner
Copy link
Member

vstinner commented Sep 8, 2023

The test_default_timeout() failured with the Warning -- Unraisable exception log can be reproduced on Linux by stressing the machine with many processes, way more than the number of CPUs.

Example of my laptop with 12 thread (6 CPU cores):

$ ./python -m test -v test_threading --forever -m test_default_timeout -j200 --rerun 
(...)
0:00:25 load avg: 63.13 [101] test_threading passed
test_default_timeout (test.test_threading.BarrierTests.test_default_timeout)
Test the barrier's default timeout ... ok

----------------------------------------------------------------------
Ran 1 test in 2.763s

OK
0:00:25 load avg: 63.13 [102/1] test_threading failed (1 error)
test_default_timeout (test.test_threading.BarrierTests.test_default_timeout)
Test the barrier's default timeout ... ERROR

======================================================================
ERROR: test_default_timeout (test.test_threading.BarrierTests.test_default_timeout)
Test the barrier's default timeout
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vstinner/python/main/Lib/test/lock_tests.py", line 1025, in test_default_timeout
    self.run_threads(f)
  File "/home/vstinner/python/main/Lib/test/lock_tests.py", line 854, in run_threads
    f()
  File "/home/vstinner/python/main/Lib/test/lock_tests.py", line 1020, in f
    i = barrier.wait()
        ^^^^^^^^^^^^^^
  File "/home/vstinner/python/main/Lib/threading.py", line 709, in wait
    self._wait(timeout)
  File "/home/vstinner/python/main/Lib/threading.py", line 747, in _wait
    raise BrokenBarrierError
threading.BrokenBarrierError

----------------------------------------------------------------------
Ran 1 test in 0.314s

FAILED (errors=1)
Warning -- Unraisable exception
Exception ignored in thread started by: <function Bunch.__init__.<locals>.task at 0x7fb63cbb5f10>
Traceback (most recent call last):
  File "/home/vstinner/python/main/Lib/test/lock_tests.py", line 48, in task
    f()
  File "/home/vstinner/python/main/Lib/test/lock_tests.py", line 1020, in f
    i = barrier.wait()
        ^^^^^^^^^^^^^^
  File "/home/vstinner/python/main/Lib/threading.py", line 709, in wait
    self._wait(timeout)
  File "/home/vstinner/python/main/Lib/threading.py", line 749, in _wait
(...)

vstinner added a commit to vstinner/cpython that referenced this issue Sep 8, 2023
If a thread is created during Python finalization, the newly spawned
thread will exit immediately.

thread_run() calls PyEval_AcquireThread() which checks if the thread
must exit. The problem is that tstate is dereferenced earlier in
_PyThreadState_Bind() which lead to a crash most of the time.
vstinner added a commit to vstinner/cpython that referenced this issue Sep 8, 2023
Fix _thread.start_new_thread() race condition. If a thread is created
during Python finalization, the newly spawned thread now exits
immediately instead of trying to access freed memory and lead to a
crash.

thread_run() calls PyEval_AcquireThread() which checks if the thread
must exit. The problem was that tstate was dereferenced earlier in
_PyThreadState_Bind() which leads to a crash most of the time.
vstinner added a commit to vstinner/cpython that referenced this issue Sep 8, 2023
Fix _thread.start_new_thread() race condition. If a thread is created
during Python finalization, the newly spawned thread now exits
immediately instead of trying to access freed memory and lead to a
crash.

thread_run() calls PyEval_AcquireThread() which checks if the thread
must exit. The problem was that tstate was dereferenced earlier in
_PyThreadState_Bind() which leads to a crash most of the time.
vstinner added a commit to vstinner/cpython that referenced this issue Sep 8, 2023
Fix _thread.start_new_thread() race condition. If a thread is created
during Python finalization, the newly spawned thread now exits
immediately instead of trying to access freed memory and lead to a
crash.

thread_run() calls PyEval_AcquireThread() which checks if the thread
must exit. The problem was that tstate was dereferenced earlier in
_PyThreadState_Bind() which leads to a crash most of the time.
vstinner added a commit to vstinner/cpython that referenced this issue Sep 8, 2023
Fix _thread.start_new_thread() race condition. If a thread is created
during Python finalization, the newly spawned thread now exits
immediately instead of trying to access freed memory and lead to a
crash.

thread_run() calls PyEval_AcquireThread() which checks if the thread
must exit. The problem was that tstate was dereferenced earlier in
_PyThreadState_Bind() which leads to a crash most of the time.

Move _PyThreadState_CheckConsistency() from thread_run() to
_PyThreadState_Bind().
@vstinner
Copy link
Member

Oh, the GitHub Action Windows x64 job is also affected by this bug.

Log reformatted to make them more readable (I removed NUL bytes, add empty lines):

0:17:13 Re-running 2 failed tests in verbose mode in subprocesses
0:17:13 Run tests in parallel using 4 child processes (timeout: 20 min, worker timeout: 25 min)
0:17:14 [1/2/1] test_threading process crashed (Exit code 3)
Re-running test_threading in verbose mode (matching: test_default_timeout)
Kill <WorkerThread #2 running test=test.test_concurrent_futures.test_as_completed pid=1364 time=1.1 sec>
test_default_timeout (test.test_threading.BarrierTests.test_default_timeout)
Test the barrier's default timeout ... ERROR

======================================================================
ERROR: test_default_timeout (test.test_threading.BarrierTests.test_default_timeout)
Test the barrier's default timeout
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\lock_tests.py", line 1025, in test_default_timeout
    self.run_threads(f)
  File "D:\a\cpython\cpython\Lib\test\lock_tests.py", line 854, in run_threads
    f()
  File "D:\a\cpython\cpython\Lib\test\lock_tests.py", line 1020, in f
    i = barrier.wait()
        ^^^^^^^^^^^^^^
  File "D:\a\cpython\cpython\Lib\threading.py", line 709, in wait
    self._wait(timeout)
  File "D:\a\cpython\cpython\Lib\threading.py", line 747, in _wait
    raise BrokenBarrierError
threading.BrokenBarrierError

----------------------------------------------------------------------
Ran 1 test in 0.366s

FAILED (errors=1)
test test_threading failed

{"test_name": "test_threading", "state": "FAILED", "duration": 0.6854470000000674, "xml_data": null, "stats": {"tests_run": 1, "failures": 0, "skipped": 0}, "errors": [["test_default_timeout (test.test_threading.BarrierTests.test_default_timeout)", "Traceback (most recent call last):\n  File \"D:\\a\\cpython\\cpython\\Lib\\test\\lock_tests.py\", line 1025, in test_default_timeout\n    self.run_threads(f)\n  File \"D:\\a\\cpython\\cpython\\Lib\\test\\lock_tests.py\", line 854, in run_threads\n    f()\n  File \"D:\\a\\cpython\\cpython\\Lib\\test\\lock_tests.py\", line 1020, in f\n    i = barrier.wait()\n        ^^^^^^^^^^^^^^\n  File \"D:\\a\\cpython\\cpython\\Lib\\threading.py\", line 709, in wait\n    self._wait(timeout)\n  File \"D:\\a\\cpython\\cpython\\Lib\\threading.py\", line 747, in _wait\n    raise BrokenBarrierError\nthreading.BrokenBarrierError\n"]], "failures": [], "__test_result__": "TestResult"}

Assertion failed: !_PyMem_IsPtrFreed(tstate->interp), file D:\a\cpython\cpython\Python\pystate.c, line 2905
Fatal Python error: Aborted

Thread 0x00001500 (most recent call first):
  <no Python frame>

Extension modules: _wmi (total: 1)
Assertion failed: !_PyMem_IsPtrFreed(tstate->interp), file D:\a\cpython\cpython\Python\pystate.c, line 2905

1 test failed again:
    test_threading

I had to download logs manually and remove null bytes, the two assertion errors are written in UTF-16 in a 8-bit console, and so the log is full of NUL bytes. I removed these NUL bytes.


By the way, the regrtest final report is a little bit surprising.

First, 2 tests failed:

  • test.test_concurrent_futures.test_as_completed
  • test_threading

Then, these 2 tests were run again. But test_threading crashed, regrtest failed to get the JSON from the output which is polluted by assertion errors. In that case, regrtest kills other worker processes, and stops immediately.

The surprised part is that the final report lists test.test_concurrent_futures.test_as_completed in omitted tests:

== Tests result: FAILURE then FAILURE ==

1 test omitted:
    test.test_concurrent_futures.test_as_completed

422 tests OK.

10 slowest tests:
- test_importlib: 3 min 1 sec
- test_regrtest: 2 min 57 sec
- test.test_multiprocessing_spawn.test_processes: 2 min 47 sec
- test.test_multiprocessing_spawn.test_misc: 1 min 49 sec
- test.test_multiprocessing_spawn.test_manager: 1 min 37 sec
- test_socket: 1 min 30 sec
- test_mmap: 1 min 30 sec
- test_tarfile: 1 min 24 sec
- test_zipfile: 1 min 23 sec
- test_compileall: 1 min 19 sec

1 test failed:
    test_threading

37 tests skipped:
    test.test_asyncio.test_unix_events
    test.test_multiprocessing_fork.test_manager
    test.test_multiprocessing_fork.test_misc
    test.test_multiprocessing_fork.test_processes
    test.test_multiprocessing_fork.test_threads
    test.test_multiprocessing_forkserver.test_manager
    test.test_multiprocessing_forkserver.test_misc
    test.test_multiprocessing_forkserver.test_processes
    test.test_multiprocessing_forkserver.test_threads test_curses
    test_dbm_gnu test_dbm_ndbm test_devpoll test_epoll test_fcntl
    test_fork1 test_gdb test_grp test_ioctl test_kqueue test_msvcrt
    test_openpty test_perf_profiler test_perfmaps test_poll test_posix
    test_pty test_pwd test_readline test_resource test_syslog
    test_threadsignals test_wait3 test_wait4 test_xxlimited
    test_xxtestfuzz test_zoneinfo

2 tests skipped (resource denied):
    test_peg_generator test_zipfile64

1 re-run test:
    test_threading

Total duration: 17 min 14 sec
Total tests: run=40,542 failures=1 skipped=2,303
Total test files: run=461/463 failed=1 skipped=37 resource_denied=2 rerun=1
Result: FAILURE then FAILURE

It's not obvious that something was badly wrong in the re-run step.

vstinner added a commit that referenced this issue Sep 11, 2023
Fix _thread.start_new_thread() race condition. If a thread is created
during Python finalization, the newly spawned thread now exits
immediately instead of trying to access freed memory and lead to a
crash.

thread_run() calls PyEval_AcquireThread() which checks if the thread
must exit. The problem was that tstate was dereferenced earlier in
_PyThreadState_Bind() which leads to a crash most of the time.

Move _PyThreadState_CheckConsistency() from thread_run() to
_PyThreadState_Bind().
vstinner added a commit to vstinner/cpython that referenced this issue Sep 11, 2023
…n#109135)

Fix _thread.start_new_thread() race condition. If a thread is created
during Python finalization, the newly spawned thread now exits
immediately instead of trying to access freed memory and lead to a
crash.

thread_run() calls PyEval_AcquireThread() which checks if the thread
must exit. The problem was that tstate was dereferenced earlier in
_PyThreadState_Bind() which leads to a crash most of the time.

Move _PyThreadState_CheckConsistency() from thread_run() to
_PyThreadState_Bind().

(cherry picked from commit 517cd82)
vstinner added a commit that referenced this issue Sep 11, 2023
) (#109272)

gh-108987: Fix _thread.start_new_thread() race condition (#109135)

Fix _thread.start_new_thread() race condition. If a thread is created
during Python finalization, the newly spawned thread now exits
immediately instead of trying to access freed memory and lead to a
crash.

thread_run() calls PyEval_AcquireThread() which checks if the thread
must exit. The problem was that tstate was dereferenced earlier in
_PyThreadState_Bind() which leads to a crash most of the time.

Move _PyThreadState_CheckConsistency() from thread_run() to
_PyThreadState_Bind().

(cherry picked from commit 517cd82)
@vstinner
Copy link
Member

I modified regrtest to fail with exit code 5 when a test fails and then pass when re-run: new --fail-reun option (it's disabled by default). I modified buildbot configuration to use --fail-rerun option on the Python main branch.

For example, a recent Windows10 job is now marked as WARNINGS (orange) because test_threading.test_default_timeout() failed and then passed when re-run in verbose mode in a fresh process.

@vstinner
Copy link
Member

This issue is about 3 things:

I prefer to stick this issue to the crash.

I created a new issue #109401 about the last points, random test failure and "Unraisable exception".


The crash is now fixed in 3.11 and main branhches. I'm waiting until #109133 backport is merged into 3.12 to be able to backport #109135 to 3.12. @Yhg1s decided that this fix has to wait for Python 3.12.1, so once the 3.12 branch will be reopened after Python 3.12.0 final release ("3.12.0 final: Monday, 2023-10-02" says PEP 693).

For now, I prefer to close this issue.

vstinner added a commit to vstinner/cpython that referenced this issue Oct 4, 2023
…n#109135)

Fix _thread.start_new_thread() race condition. If a thread is created
during Python finalization, the newly spawned thread now exits
immediately instead of trying to access freed memory and lead to a
crash.

thread_run() calls PyEval_AcquireThread() which checks if the thread
must exit. The problem was that tstate was dereferenced earlier in
_PyThreadState_Bind() which leads to a crash most of the time.

Move _PyThreadState_CheckConsistency() from thread_run() to
_PyThreadState_Bind().

(cherry picked from commit 517cd82)
vstinner added a commit that referenced this issue Oct 4, 2023
) (#110342)

* gh-108987: Fix _thread.start_new_thread() race condition (#109135)

Fix _thread.start_new_thread() race condition. If a thread is created
during Python finalization, the newly spawned thread now exits
immediately instead of trying to access freed memory and lead to a
crash.

thread_run() calls PyEval_AcquireThread() which checks if the thread
must exit. The problem was that tstate was dereferenced earlier in
_PyThreadState_Bind() which leads to a crash most of the time.

Move _PyThreadState_CheckConsistency() from thread_run() to
_PyThreadState_Bind().

(cherry picked from commit 517cd82)

* gh-109795: `_thread.start_new_thread`: allocate thread bootstate using raw memory allocator (#109808)

(cherry picked from commit 1b8f236)

---------

Co-authored-by: Radislav Chugunov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OS-windows type-crash A hard crash of the interpreter, possibly with a core dump
Projects
Development

No branches or pull requests

3 participants