Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test patches for hang #81

Closed

Conversation

jeremyd2019
Copy link

@jeremyd2019 jeremyd2019 commented Jan 18, 2025

@jeremyd2019 jeremyd2019 marked this pull request as draft January 18, 2025 21:27
@jeremyd2019 jeremyd2019 force-pushed the gfw-takashi-patches branch 3 times, most recently from 2b984dd to 88d32fa Compare January 21, 2025 06:00
tyan0 added 3 commits January 21, 2025 10:21
…ent"

This reverts commit a22a0ad7c4f0 to apply a new patch for the same
purpose.

Signed-off-by: Takashi Yano <[email protected]>
To allow cygwait() to be called in the signal handler, a locally
created timer is used instead of _cygtls::locals.cw_timer if it is
in use.

Co-Authored-By: Corinna Vinschen <[email protected]>
Signed-off-by: Takashi Yano <[email protected]>
(cherry picked from commit ea1914000adcbed5c16fc0ba2676173b2f6016c4)
The commit a22a0ad7c4f0 was not entirely correct. Even with the patch,
some hangs still occur. This patch overrides the previous commit along
with the patch that makes cygwait() reentrant, to fix these hangs.

Addresses: https://cygwin.com/pipermail/cygwin/2024-December/256954.html
Fixes: d243e51ef1d3 ("Cygwin: signal: Fix deadlock between main thread and sig thread")
Reported-by: Daisuke Fujimura <[email protected]>
Reviewed-by: Corinna Vinschen <[email protected]>
Signed-off-by: Takashi Yano <[email protected]>
(cherry picked from commit 83afe3e238cd12fb7d4799ba6b3c77e9e3618d91)
@dscho dscho force-pushed the gfw-takashi-patches branch from c164fd2 to e979cc2 Compare January 21, 2025 20:01
@dscho
Copy link
Member

dscho commented Jan 21, 2025

@jeremyd2019 my apologies; I should not have abused your PR for the test workflow run. It is now running here instead.

@jeremyd2019
Copy link
Author

That's what it's here for 😉

I'm confused though, it looks like you used https://github.com/git-for-windows/msys2-runtime/actions/runs/12502589490, not https://github.com/git-for-windows/msys2-runtime/actions/runs/12894770137 or the just-finished https://github.com/git-for-windows/msys2-runtime/actions/runs/12894852152, is that intended to reproduce the hangs before testing if the patches in this PR fix them?

I'm thinking that they're going to release a 3.5.6 pretty soon once the hangs are fixed, so may not be worth backporting them to 3.5.5 right away

@jeremyd2019
Copy link
Author

jeremyd2019 commented Jan 21, 2025

I did a run with these patches (e979cc2, https://github.com/git-for-windows/msys2-runtime/actions/runs/12894852152/artifacts/2463938990), and there were no hangs, but there was one failure, is that "expected" or possibly a regression from these patches? https://github.com/jeremyd2019/gfw-msys2-runtime/actions/runs/12895570202

@dscho
Copy link
Member

dscho commented Jan 21, 2025

I'm confused though, it looks like you used https://github.com/git-for-windows/msys2-runtime/actions/runs/12502589490, not https://github.com/git-for-windows/msys2-runtime/actions/runs/12894770137 or the just-finished https://github.com/git-for-windows/msys2-runtime/actions/runs/12894852152, is that intended to reproduce the hangs before testing if the patches in this PR fix them?

D'oh. I did not intend this, this run was from an earlier test I ran, and the reason is that I had failed to realize that I still had these uncommitted changes (which I had intended to amend the commit with, before pushing):

diff --git a/.github/workflows/test-msys2-runtime.yml b/.github/workflows/test-msys2-runtime.yml
index 93136ec258..0e585214b4 100644
--- a/.github/workflows/test-msys2-runtime.yml
+++ b/.github/workflows/test-msys2-runtime.yml
@@ -12,7 +12,7 @@ on:
 
 env:
   LC_CTYPE: C.UTF-8
-  MSYS2_RUNTIME_ARTIFACT_URL: ${{ inputs.msys2-runtime-artifacts-url || 'https://github.com/git-for-windows/msys2-runtime/actions/runs/12502589490/artifacts/2363069011' }}
+  MSYS2_RUNTIME_ARTIFACT_URL: ${{ inputs.msys2-runtime-artifacts-url || 'https://github.com/git-for-windows/msys2-runtime/actions/runs/12893351109' }}
   G4W_SDK_REPO: git-for-windows/git-sdk-64
 
 jobs:

At least you got a chance to run the workflow yourself, and now you have it for all similar future incidents.

I did a run with these patches, and there were no hangs, but there was one failure, is that "expected" or possibly a regression from these patches? https://github.com/jeremyd2019/gfw-msys2-runtime/actions/runs/12895570202

Yeah, this failure is unfortunately a flake. Somewhere deep in the reftable code, when doing massively concurrent writes to essentially the same file, it reports an error with the most useless error message "reftable: transaction prepare: I/O error"... Re-running is almost certainly going to "fix" it.

@jeremyd2019
Copy link
Author

yep, re-run succeeded

@dscho
Copy link
Member

dscho commented Jan 21, 2025

yep, re-run succeeded

I also fixed the incorrect URL in my diff and it succeeded, too. So now those hangs are fixed, eh? Do I dare look at the patches or am I too scared that my hope for clear and well-documented changes be shattered?

It seems that current _cygtls::handle_SIGCONT() code sometimes falls
into a deadlock due to frequent TLS lock/unlock operation in the
yield() loop. With this patch, the yield() in the wait loop is placed
outside the TLS lock to avoid frequent TLS lock/unlock.

Fixes: 9ae51bcc51a7 ("Cygwin: signal: Fix another deadlock between main and sig thread")
Reviewed-by: Corinna Vinschen <[email protected]>
Signed-off-by: Takashi Yano <[email protected]>
@jeremyd2019
Copy link
Author

Successful test suite run https://github.com/jeremyd2019/gfw-msys2-runtime/actions/runs/12914563954

Closing this since all the patches are upstream now, probably anything further should start with msys2/msys2-runtime#253, or just wait for 3.5.6

@jeremyd2019 jeremyd2019 deleted the gfw-takashi-patches branch January 22, 2025 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants