[Bug]: Worker exits but jest process never finishes #13183
Comments
@phawxby thoughts? |
Hi everyone, @vilmosioo is my teammate and I wanted to give a bit more info. We spent a few days investigating this before filing the issue. I do believe the workers are being OOMKilled, but we don't have logging to verify since this is running in a CircleCI docker container. I hope the report still contains enough information to be helpful. Please let us know if there are any extra details that would be useful. We're not actively seeing this issue day to day, but thought reporting it would help the Jest team and other users. Thanks for making Jest. |
Exit codes and exit signals were probably the most problematic bit of the fixes that I put in recently because they're extremely inconsistent between platforms. However I thought I had covered the OOM scenario, and I do have a test which forcibly induces an OOM crash here. My questions at this point would be.
|
Thanks for taking a look @phawxby. I think I should have been more specific about the kind of OOM we think we're seeing.
This does happen with Jest I think your fix is strictly an improvement. Thank you. 🙂 We'll work on migrating to Jest
The full repro is platform-specific due to OOMKiller being Linux-specific. I think we can simulate OOMKiller by calling
I'm sure there's a good reason, but I was curious why I see the other branches in this conditional either throw an error or retry. |
@gluxon you make a good point. That wasn't actually code I originally wrote, I just worked around it so didn't really dig into why it's doing what it's doing. But you're right:
We probably shouldn't pay any attention to what the exit code or signal actually is unless there's a known problem, otherwise we just work based on what the intended state is |
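A minimal sketch of that idea, assuming a worker wrapper that tracks whether it asked the child to stop. The names `IntendedState`, `ExitAction` and `decideOnExit` are hypothetical and are not part of jest-worker; the point is only that the decision keys off the intended state, while the exit code and signal are carried along for the error message.

```typescript
// Hypothetical sketch; none of these names exist in jest-worker.
type IntendedState = "running" | "shutting-down";
type ExitAction = "ignore" | "report-crash";

function decideOnExit(
  intended: IntendedState,
  exitCode: number | null,
  signal: string | null,
): { action: ExitAction; detail: string } {
  // The code and signal are kept only for diagnostics.
  const detail = `exitCode=${exitCode}, signal=${signal}`;

  // An exit we asked for is fine, whatever the platform reports.
  if (intended === "shutting-down") {
    return { action: "ignore", detail };
  }

  // Any exit we did not ask for is treated as a crash, including
  // exitCode = null with signal = SIGKILL.
  return { action: "report-crash", detail };
}

console.log(decideOnExit("running", null, "SIGKILL"));
```

With this shape there is no branch where an unexpected exit is silently dropped, which is the case that seems to cause the hang.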
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 30 days. |
Ah that clarification is helpful. Thank you for your work to make this more correct on internal OOMs.
Agree with this. I'm going to try creating a test case that mimics what OOMKiller does and see how the Jest maintainers feel about any changes to make these forms of OOMs more clear. |
Created a minimal repro. This test hangs forever and never hits the 3s timeout.
// src/killed.test.ts
export async function wait(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
test("jest-worker externally killed", async () => {
  await wait(2_000);
}, 3_000);
setTimeout(() => {
  // Self-kill to make repro easier.
  process.kill(process.pid);
}, 1_000);
https://github.com/gluxon/test-jest-worker-killed-repro |
Starting a fix in #13566. Any reviews or feedback welcome! |
Version
28.1.3
Steps to reproduce
Related and recent fix https://github.com/facebook/jest/pull/13054/files#diff-e99351d361fdd6f55d39c7c41af94576baccc2b9dce73a5bfd0b516c6ab648e9
However, the workers may crash with other signals and those scenarios are not covered. In our case, after some debugging, the signal is null. For some reason these workers are crashing in jest-runtime at the compileFunction.call line, which causes a null exit code that gets ignored. jest-runner then waits on a thread pool that will never fulfil the submitted job.
The signal appears to be SIGKILL instead of SIGABRT, and the exitCode appears to be null. Please see screenshots of the debug process.
The above outputs
ChildProcessWorker exit: exitCode = null, pid = 1378, signal = SIGKILL
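For anyone who wants to see the same pair of values outside Jest, here is a small standalone sketch using only Node's `child_process` module. It assumes a POSIX platform, and `spawnAndKill` and `ExitInfo` are names made up for this example. Node's `exit` event reports a null exit code together with the signal name when a child is terminated by a signal.

```typescript
import { spawn } from "node:child_process";

interface ExitInfo {
  exitCode: number | null;
  signal: NodeJS.Signals | null;
}

// Spawn an idle Node child, kill it from the outside, and report
// what the parent sees in the "exit" event.
function spawnAndKill(): Promise<ExitInfo> {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ["-e", "setInterval(() => {}, 1000)"]);
    child.once("error", reject);
    child.once("exit", (exitCode, signal) => resolve({ exitCode, signal }));
    // Stand-in for an external killer such as the Linux OOM killer.
    child.kill("SIGKILL");
  });
}

spawnAndKill().then(({ exitCode, signal }) => {
  console.log(`exit: exitCode = ${exitCode}, signal = ${signal}`);
});
```

Any parent that only inspects the exit code will see null here, so the signal has to be checked as well.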
We apologize for the lack of a minimal reproduction, but hope the thorough investigation given will substitute.
Expected behavior
Jest exits when one of the workers crashes for whatever reason.
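One way to picture the expected behavior is sketched below, under the assumption that each submitted job can be tied to the `exit` event of the worker running it. `settleOnExit` is a hypothetical helper, and a plain `EventEmitter` stands in for the real worker; this is not how jest-worker is implemented.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper: reject the pending job as soon as its worker exits,
// so the caller is never left waiting on a promise that cannot settle.
function settleOnExit<T>(worker: EventEmitter, job: Promise<T>): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const onExit = (exitCode: number | null, signal: string | null) => {
      reject(
        new Error(
          `worker exited before finishing: exitCode=${exitCode}, signal=${signal}`,
        ),
      );
    };
    worker.once("exit", onExit);
    job.then(
      (value) => {
        worker.off("exit", onExit);
        resolve(value);
      },
      (error) => {
        worker.off("exit", onExit);
        reject(error);
      },
    );
  });
}

// Usage with a fake worker and a job that never settles on its own.
const fakeWorker = new EventEmitter();
const neverSettles = new Promise<void>(() => {});
settleOnExit(fakeWorker, neverSettles).catch((error: Error) => {
  console.log(error.message);
});
fakeWorker.emit("exit", null, "SIGKILL");
```

The design choice is that the exit event alone is enough to fail the job; the exit code and signal only shape the message.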
Actual behavior
Jest hangs when workers are unexpectedly SIGKILL'ed.
Additional context
This does not happen with typescript v4.6.4. It only happens after we upgraded to v4.7.4. This may not be relevant; we suspect this is an OOM.
Environment