Windows release machines are all offline #3674

Closed
targos opened this issue Apr 7, 2024 · 9 comments
@targos
Member

targos commented Apr 7, 2024

https://ci-release.nodejs.org/computer/

image

@StefanStojanovic
Contributor

Hey @targos, thanks for letting me know. The exception I see on the machines is the following:

INFO: Could not locate server among [https://ci-release.nodejs.org/]; waiting 10 seconds before retry
java.io.IOException: https://ci-release.nodejs.org/ provided port:11111 is not reachable on host ci-release.nodejs.org
        at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:304)
        at hudson.remoting.Engine.innerRun(Engine.java:809)
        at hudson.remoting.Engine.run(Engine.java:563)

As I recall, when new machines were added to the release CI, firewall rules were added for them. Is it possible that those rules were removed/edited recently? From what I see this started on Friday/Saturday.

P.S. I've replaced the actual port with 11111 to keep the real one secret.

@targos
Member Author

targos commented Apr 8, 2024

According to the command history on ci-release, @richardlau recently touched the iptables rules.

@richardlau
Member

> According to the command history on ci-release, @richardlau recently touched the iptables rules.

That would have been over a week ago, before the collab summit (for #3663).

@richardlau
Member

So it does look like the IP addresses for the Windows Rackspace machines in richard-20240326, the backup I edited for #3663, do not match the inventory. That backup was taken from /etc/iptables/rules.v4 on ci-release. I'm guessing this only manifested over the weekend because the machines self-updated/rebooted (the edit was made over a week ago)?

I'll update the firewall with the IP addresses from the inventory/secrets.
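
The comparison described above (allowed source addresses in the firewall rules versus the machine IPs in the inventory) could be sketched roughly as follows. This assumes the rules are in `iptables-save` style with one `-A ... -s <ip> ... --dport <port> -j ACCEPT` line per machine, which may not match the real /etc/iptables/rules.v4. The function names, the sample rules, and the addresses (taken from documentation ranges) are all hypothetical, and 11111 is the placeholder port used earlier in this thread. Requires Python 3.9 or later.

```python
def _option(tokens, flag):
    """Return the token following flag, or None if flag is absent."""
    if flag in tokens:
        i = tokens.index(flag)
        if i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def allowed_sources(rules_text, dport):
    """Collect source IPs from ACCEPT rules for dport (iptables-save style text)."""
    sources = set()
    for line in rules_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "-A":
            continue  # skip table headers, chain policies, COMMIT, comments
        if _option(tokens, "-j") != "ACCEPT":
            continue
        if _option(tokens, "--dport") != str(dport):
            continue
        src = _option(tokens, "-s")
        if src:
            # iptables-save writes single hosts as a.b.c.d/32
            sources.add(src.removesuffix("/32"))
    return sources


def diff_firewall(rules_text, inventory_ips, dport):
    """Return (missing, stale): inventory IPs not allowed, allowed IPs not in inventory."""
    allowed = allowed_sources(rules_text, dport)
    inventory = set(inventory_ips)
    return inventory - allowed, allowed - inventory


# Hypothetical sample data, not the real rules or addresses.
SAMPLE_RULES = """\
*filter
-A INPUT -s 192.0.2.10/32 -p tcp -m tcp --dport 11111 -j ACCEPT
-A INPUT -s 192.0.2.11/32 -p tcp -m tcp --dport 11111 -j ACCEPT
COMMIT
"""
SAMPLE_INVENTORY = {"192.0.2.10", "198.51.100.7"}

missing, stale = diff_firewall(SAMPLE_RULES, SAMPLE_INVENTORY, 11111)
print("missing from firewall:", sorted(missing))
print("stale in firewall:", sorted(stale))
```

Any address reported as missing would be a machine that cannot reach the agent port, which is the symptom seen here.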

@StefanStojanovic
Contributor

Thanks for the update, @richardlau. I was just about to say that running ping ci-release.nodejs.org works, so the host is reachable and the problem is the port, which points to the firewall. After you fix it, should we let all of the started builds and update jobs finish (everything will be back to normal by tomorrow), or would you prefer to cancel the queued jobs?
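
A successful ping only shows that the host answers ICMP; the agent needs a TCP connection to the agent port. A minimal sketch of checking that directly, assuming Python is available on the machine (`port_reachable` is a hypothetical helper, not part of Jenkins or the build tooling):

```python
import socket


def port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `port_reachable("ci-release.nodejs.org", 11111)` would test the placeholder port from the log above; the real port would need to be substituted. A firewall that silently drops packets typically shows up as a timeout rather than an immediate refusal, and both cases return False here.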

@richardlau
Member

I've updated the firewall and made sure the changes are reflected in /etc/iptables/rules.v4. It looks like the machines are online now in Jenkins and are picking up jobs, so let's let them run and keep an eye out for any issues.

@StefanStojanovic
Contributor

The queue is empty and everything seems to be back to normal. I'll close this issue in 1-2 days if no further incident occurs.

@richardlau
Member

richardlau commented Apr 9, 2024

FWIW I think this is a new/separate problem, but today's nightly build failed on vs2022-arm64: https://ci-release.nodejs.org/job/iojs+release/10098/nodes=vs2022-arm64/consoleFull with

07:02:12 c:\ws\deps\simdutf\simdutf.cpp(16719,7): error C2664: '__n128x4 neon_ld4m_q8(const char *)': cannot convert argument 1 from 'const uint8_t [64]' to 'const char *' [c:\ws\deps\simdutf\simdutf.vcxproj]
07:02:12 c:\ws\deps\simdutf\simdutf.cpp(16719,7): message : Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or parenthesized function-style cast [c:\ws\deps\simdutf\simdutf.vcxproj]
07:02:12 C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\include\arm64_neon.h(6146,10): message : see declaration of 'neon_ld4m_q8' [c:\ws\deps\simdutf\simdutf.vcxproj]
07:02:12 c:\ws\deps\simdutf\simdutf.cpp(16719,7): message : while trying to match the argument list '(const uint8_t [64])' [c:\ws\deps\simdutf\simdutf.vcxproj]
07:02:12 c:\ws\deps\simdutf\simdutf.cpp(16719,73): fatal  error C1903: unable to recover from previous error(s); stopping compilation [c:\ws\deps\simdutf\simdutf.vcxproj]

I don't think this occurred on the test CI, although https://ci.nodejs.org/job/node-compile-windows/55318/nodes=win-vs2022-arm64/consoleFull failed with

08:58:52 C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\tuple(47,90): fatal  error C1060: compiler is out of heap space [C:\workspace\node-compile-windows\node\tools\v8_gypfiles\v8_initializers.vcxproj]

simdutf was updated in nodejs/node#52381, but the test CI runs for that passed.

@targos
Member Author

targos commented Apr 11, 2024

I opened simdutf/simdutf#407 for the simdutf error. I think we can close this issue as it's resolved.
