Tracking issue: Networking Issues #3190

Open
adamfarley opened this issue Sep 19, 2023 · 29 comments

Comments

@adamfarley
Contributor

Summary

This issue is for storing details of networking issues seen during triage or otherwise.

Details

Each entry should include the date, host, error message, and a URL.

The problems listed here may also have their own issues elsewhere, but this issue is primarily for unpredictable, temporary problems.

Examples:

  • Lost connection between Jenkins and host.
  • Lost connection between host and github.com.
  • Corrupted downloads from a server somewhere (such as Maven) to the host.
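As a rough illustration of the entry format described above (date, host, error message, URL), here is a minimal Python sketch. The `NetworkIssueEntry` class and its `render` method are hypothetical helpers invented for this example, not part of any Adoptium tooling, and the placeholder values are not real data.

```python
from dataclasses import dataclass


@dataclass
class NetworkIssueEntry:
    """One entry for this tracking issue (hypothetical helper, not project tooling)."""

    date: str   # when the failure was seen
    host: str   # Jenkins agent name
    error: str  # error line(s) copied from the console log
    url: str    # link to the failing job

    def render(self) -> str:
        # Keep the error on its own lines so long log output stays readable.
        return (
            f"Date: {self.date}\n"
            f"Host: {self.host}\n"
            f"Error:\n\n{self.error}\n\n"
            f"URL: {self.url}"
        )


# Placeholder values only; fill these in from the failing job.
entry = NetworkIssueEntry(
    date="<date>",
    host="<host>",
    error="<error message>",
    url="<job URL>",
)
print(entry.render())
```

Keeping all four fields in every entry should make it easier to spot repeats of the same host or error later on.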
@sxa sxa pinned this issue Sep 20, 2023
@sxa sxa unpinned this issue Sep 20, 2023
@sxa sxa pinned this issue Oct 12, 2023
@sxa
Member

sxa commented Oct 16, 2023

GitHub access issue on the mac installer creation job: https://ci.adoptium.net/job/build-scripts/job/release/job/create_installer_mac/10953/console (No machine selected at this point)

ERROR: Error cloning remote repo 'origin'
hudson.plugins.git.GitException: Command "git fetch --tags --force --progress -- https://github.com/adoptium/installer.git +refs/heads/*:refs/remotes/origin/*" returned status code 128:
stdout: 
stderr: fatal: unable to access 'https://github.com/adoptium/installer.git/': Failed to connect to github.com port 443: Operation timed out

@adamfarley
Contributor Author

https://ci.adoptium.net/job/Test_openjdk17_hs_sanity.openjdk_aarch64_linux/373/consoleFull - test-docker-centos8-armv8-1
https://ci.adoptium.net/job/Test_openjdk17_hs_extended.openjdk_aarch64_linux_testList_1/42/console - test-docker-ubuntu1804-armv8l-4

Exception: org.jenkinsci.plugins.workflow.support.steps.AgentOfflineException: Unable to create live FilePath for test-docker-ubuntu1804-armv8l-4; test-docker-ubuntu1804-armv8l-4 was marked offline: Connection was broken

The failures happened about an hour apart, but the wording is about the same.
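One way to check that two failures really share the same wording is to strip the host name out of each message before comparing. The sketch below assumes only the two message shapes quoted in this thread ("Unable to create live FilePath for <host>;" and "Cannot contact <host>:"). The function names are hypothetical.

```python
import re

# Patterns for the two message shapes quoted in this thread; the host name is
# captured so it can be replaced with a placeholder.
HOST_PATTERNS = [
    re.compile(r"Unable to create live FilePath for (\S+);"),
    re.compile(r"Cannot contact (\S+):"),
]


def extract_host(message):
    """Return the agent name found in the message, or None if no pattern matches."""
    for pattern in HOST_PATTERNS:
        match = pattern.search(message)
        if match:
            return match.group(1)
    return None


def normalize(message):
    """Replace every occurrence of the host name with a placeholder."""
    host = extract_host(message)
    if host is None:
        return message
    return message.replace(host, "<host>")


def same_wording(first, second):
    """True if the two messages are identical once host names are removed."""
    return normalize(first) == normalize(second)
```

Timestamps and channel IDs are not stripped here, so this only works on the message text itself.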

@adamfarley
Contributor Author

adamfarley commented Dec 5, 2023

Two tests failed because they couldn't download renaissance.jar for performance runs, as part of (or just after) the liberty setup.

URL seems fine on my machine, so assuming it's a temporary upstream networking/server issue until future failures prove otherwise.

https://ci.adoptium.net/job/Test_openjdk11_hs_sanity.perf_x86-64_linux/884/console
https://ci.adoptium.net/job/Test_openjdk11_hs_extended.perf_x86-64_linux/161/console

Both ran on test-equinix_esxi-ubuntu2204-x64-1

@adamfarley
Contributor Author

https://ci.adoptium.net/job/Test_openjdk21_hs_extended.openjdk_x86-64_mac_testList_1/31/console

Cannot contact test-orka-macos14-x64-96vzx: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel

@smlambert
Contributor

#3190 (comment) - should be resolved by adoptium/aqa-tests#4903 once merged.

@sxa
Member

sxa commented Dec 5, 2023

> Two tests failed because they couldn't download renaissance.jar for performance runs, as part of (or just after) the liberty setup.
> URL seems fine on my machine, so assuming it's a temporary upstream networking/server issue until future failures prove otherwise.

Those are both disk space issues, not networking ones.

18:03:21  tee: /home/jenkins/workspace/Test_openjdk11_hs_sanity.perf_x86-64_linux/aqa-tests/TKG/../TKG/output_compilation/compilation.log: No space left on device
18:07:00      [retry] Attempt [2]: error occurred; retrying...tee: /home/jenkins/workspace/Test_openjdk11_hs_extended.perf_x86-64_linux/aqa-tests/TKG/../TKG/output_compilation/compilation.log: No space left on device
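A quick check along these lines could help separate disk-space failures from network ones before an entry is logged here. This is only a sketch: the substrings come from log lines quoted earlier in this thread, while the category names and the `classify` function are made up for illustration.

```python
# Substrings taken from log lines quoted in this thread; the category names
# are invented for this sketch.
SIGNATURES = [
    ("No space left on device", "disk-space"),
    ("Connection was broken", "agent-connection"),
    ("Cannot contact", "agent-connection"),
    ("Failed to connect to github.com", "github-access"),
]


def classify(log_line):
    """Return the first matching category for a console log line, else 'unknown'."""
    for needle, category in SIGNATURES:
        if needle in log_line:
            return category
    return "unknown"


def classify_log(lines):
    """Collect the distinct categories seen across a whole console log."""
    return sorted({classify(line) for line in lines} - {"unknown"})
```

A log that yields `disk-space` probably belongs in a separate issue rather than in this one.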

@adamfarley
Contributor Author

As noted here, JDK8u is missing a large number of published binaries from the temurin8-binaries repo.

I think that is because the jdk8 build pipeline on the 11th timed out while uploading the binaries. I'm not sure whether this was caused by a hang or just slow uploads, as the upload job seems to lack regular timestamps.

Currently my plan is to ignore this unless it becomes a pattern.

@adamfarley
Contributor Author

Date: 7 Jun 2024
Host: build-siteox-solaris10u11-sparcv9-1
Error:

11:43:15  Exception: org.jenkinsci.plugins.workflow.support.steps.AgentOfflineException: Unable to create live FilePath for build-siteox-solaris10u11-sparcv9-1; build-siteox-solaris10u11-sparcv9-1 was marked offline: Connection was broken

URL: https://ci.adoptium.net/job/Test_openjdk8_hs_sanity.functional_sparcv9_solaris_testList_0/2/console

@adamfarley
Contributor Author

Four issues that look the same:

Date: 26 Jun 2024
Hosts: test-docker-ubi9-armv8l-1, test-docker-ubuntu2204-armv8-2, test-docker-ubuntu2310-armv8l-1, test-docker-ubuntu2204-armv8l-2
Error:

01:57:59  STF 00:57:58.849 - +------ Step 3 - Wait for processes to complete
01:57:59  STF 00:57:58.849 - | Wait for processes to meet expectations
01:57:59  STF 00:57:58.849 - |   Processes: [LT1, CL1]
01:57:59  STF 00:57:58.849 - |
01:57:59  STF 00:57:58.849 - Monitoring processes: CL1 LT1
01:58:02  CL1 j> 2024/06/27 00:58:00.679 ServerURL=service:jmx:rmi:///jndi/rmi://localhost:1234/jmxrmi
01:58:02  CL1 j> 2024/06/27 00:58:01.673 Attempting to connect
01:58:03  CL1 j> 2024/06/27 00:58:03.213 Monitored VM not ready at Jun 27, 2024, 12:58:03 AM (attempt 1, elapsed 1269ms).
01:58:03  CL1 j> 2024/06/27 00:58:03.215 Waiting 5 secs and trying again...
01:58:09  CL1 j> 2024/06/27 00:58:08.215 Attempting to connect
01:58:10  CL1 j> 2024/06/27 00:58:09.412 Connection established!
01:58:11  CL1 j> 2024/06/27 00:58:11.088 Starting to write data
02:00:18  Cannot contact test-docker-ubi9-armv8l-1: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@1ee8196a:test-docker-ubi9-armv8l-1": Remote call on test-docker-ubi9-armv8l-1 failed. The channel is closing down or has closed down

URLs:

@adamfarley
Contributor Author

Date: 26 Jun 2024
Hosts: test-docker-ubuntu2004-armv7l-5
Error:

02:00:16  Cannot contact test-docker-ubuntu2004-armv7l-5: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@50e98578:test-docker-ubuntu2004-armv7l-5": Remote call on test-docker-ubuntu2004-armv7l-5 failed. The channel is closing down or has closed down

URL: https://ci.adoptium.net/job/Test_openjdk11_hs_sanity.openjdk_arm_linux_testList_0/4/console

@adamfarley
Contributor Author

Date: 26 Jun 2024
Host: test-orka-macos14-x64-z7lvx
Error:

21:40:32  Exception: org.jenkinsci.plugins.workflow.support.steps.AgentOfflineException: Unable to create live FilePath for test-orka-macos14-x64-z7lvx; test-orka-macos14-x64-z7lvx was marked offline: Connection was broken

URL: https://ci.adoptium.net/job/Test_openjdk21_hs_extended.openjdk_x86-64_mac_testList_1/13/console

@adamfarley
Contributor Author

Date: 26 Jun 2024
Hosts: test-docker-ubuntu2004-armv8l-1, test-docker-sles15-armv8l-1, test-docker-ubuntu2404-armv8-1, test-docker-debian12-armv8l-1, test-docker-fedora39-armv8l-1.
Error:

02:00:24  Cannot contact test-docker-ubuntu2004-armv8l-1: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@601a06d8:test-docker-ubuntu2004-armv8l-1": Remote call on test-docker-ubuntu2004-armv8l-1 failed. The channel is closing down or has closed down

URLs:

@adamfarley
Contributor Author

Date: 25 Jun 2024
Host: test-docker-debian12-armv7l-1
Error:

02:00:25  Cannot contact test-docker-debian12-armv7l-1: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@353dc87:test-docker-debian12-armv7l-1": Remote call on test-docker-debian12-armv7l-1 failed. The channel is closing down or has closed down

URL: https://ci.adoptium.net/job/Test_openjdk17_hs_extended.openjdk_arm_linux_testList_0/14/console

@sxa sxa unpinned this issue Aug 1, 2024
@adamfarley
Contributor Author

adamfarley commented Aug 6, 2024

Date: 2024/08/02
Host: test-orka-macos14-x64-4ncp7
Error: 04:37:43 Cannot contact test-orka-macos14-x64-4ncp7: java.lang.InterruptedException
URL: https://ci.adoptium.net/job/Test_openjdk23_hs_extended.openjdk_x86-64_mac/18/

Similar failures:
Date: 2024/08/01
Host: test-orka-macos14-x64-97f7w
Error: 22:18:18 Cannot contact test-orka-macos14-x64-97f7w: java.lang.InterruptedException
URL: https://ci.adoptium.net/job/Test_openjdk24_hs_extended.openjdk_x86-64_mac_testList_3/3/

@adamfarley
Contributor Author

adamfarley commented Aug 13, 2024

Date: 2024/08/08
Host: build-marist-rhel8-s390x-1
Error:

21:44:52  Downloading GA release of boot JDK version 23 failed.
21:44:52  Attempting to download EA release of boot JDK version 23 from https://api.adoptium.net/v3/binary/latest/23/ea/linux/s390x/jdk/hotspot/normal/adoptium
21:44:52    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
21:44:52                                   Dload  Upload   Total   Spent    Left  Speed
21:44:52  
21:44:52    0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
21:44:52    0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
21:44:52  
21:44:52    0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
21:44:52    0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
21:44:52  
21:44:52    5  192M    5 10.0M    0     0  13.9M      0  0:00:13 --:--:--  0:00:13 13.9M
21:44:52  curl: (18) transfer closed with 191282016 bytes remaining to read

URL: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk/job/jdk-linux-s390x-temurin/290/

Date: 15 Aug 2024
Host: dockerhost-skytap-ubuntu2004-ppc64le-1
Error:

20:01:25  Attempting to download EA release of boot JDK version 23 from https://api.adoptium.net/v3/binary/latest/23/ea/linux/ppc64le/jdk/hotspot/normal/adoptium
20:01:25    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
...
20:01:28    9  204M    9 20.0M    0     0  9106k      0  0:00:22  0:00:02  0:00:20 13.9M
20:01:28  curl: (18) transfer closed with 192960112 bytes remaining to read

URL: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk/job/jdk-linux-ppc64le-temurin/300/console

@adamfarley
Contributor Author

Date: 8 Aug 2024
Host: dockerhost-skytap-ubuntu2204-x64-1
URLs:

20:01:26  + docker pull adoptopenjdk/alpine3_build_image
20:01:26  Using default tag: latest
20:01:26  Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on 127.0.0.53:53: server misbehaving

The jobs before and after these show no sign of this issue. I will ignore it for now, and raise a new issue if it occurs again.

@adamfarley
Contributor Author

Date: 28 Aug 2024
Host: test-docker-ubuntu2404-armv7-1
URL: https://ci.adoptium.net/job/Test_openjdk17_hs_extended.system_arm_linux/388/
Error:

23:23:58  Cannot contact test-docker-ubuntu2404-armv7-1: java.lang.InterruptedException
23:39:36  wrapper script does not seem to be touching the log file in /home/jenkins/workspace/Test_openjdk17_hs_extended.system_arm_linux@tmp/durable-4bf95685
23:39:36  (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)

@adamfarley
Contributor Author

Date: 28 Aug 2024
Host: test-orka-macos14-x64-s6st9
URL: https://ci.adoptium.net/job/Test_openjdk17_hs_extended.openjdk_x86-64_mac_testList_5/4/
Error:

22:52:29  Cannot contact test-orka-macos14-x64-s6st9: java.lang.InterruptedException

@adamfarley
Contributor Author

Date: 2024/10/02
Host: build-marist-rhel8-s390x-1
Error message:

21:07:54  Compiling 30 files for java.security.sasl
21:07:59  Connection attempt failed: Connection refused

URL: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk17u/job/jdk17u-linux-s390x-temurin/475/

@sxa
Member

sxa commented Oct 9, 2024

> 21:07:54 Compiling 30 files for java.security.sasl
> 21:07:59 Connection attempt failed: Connection refused

@adamfarley Do you know what connection that might be? It shouldn't be accessing the network during the build compile step like that 🤔

@adamfarley
Contributor Author

adamfarley commented Oct 9, 2024

> 21:07:54 Compiling 30 files for java.security.sasl
> 21:07:59 Connection attempt failed: Connection refused
>
> @adamfarley Do you know what connection that might be? It shouldn't be accessing the network during the build compile step like that 🤔

Not off the top of my head. Odd.

Also, even though java.security.sasl is the last module we try to build prior to failure, I think the fault lies with java.prefs, due to the line below:

21:08:18  make/Main.gmk:193: recipe for target 'java.prefs-java' failed

It also looks like an upstream Mac-aarch64 build failed with the same retry and timeout values:

https://mail.openjdk.org/pipermail/jdk-updates-dev/2024-April/031687.html

No further information available, though it gives us a place to start if we decide to investigate this. Maybe Goetz Lindenmaier has seen this before, and that's why he's comfortable dismissing the issue.

@adamfarley
Contributor Author

Date: 2024/09/26
Host: test-siteox-solaris10u11-sparcv9-1
Error message:

03:42:09  Cannot contact test-siteox-solaris10u11-sparcv9-1: java.lang.InterruptedException

URL: https://ci.adoptium.net/job/Test_openjdk8_hs_extended.openjdk_sparcv9_solaris/155/console

@adamfarley
Contributor Author

Date: 2024/10/03
Host: test-macincloud-macos1201-x64-1
Error message:

Cannot contact test-macincloud-macos1201-x64-1: java.lang.InterruptedException

URL: https://ci.adoptium.net/job/Test_openjdk11_hs_sanity.openjdk_x86-64_mac_testList_0/11/console

@adamfarley
Contributor Author

Date: 2024/10/05
Host: test-orka-macos14-x64-692n8
Error message:

Cannot contact test-orka-macos14-x64-692n8: java.lang.InterruptedException

URL: https://ci.adoptium.net/job/Test_openjdk23_hs_extended.openjdk_x86-64_mac_testList_4/5/console

@adamfarley
Contributor Author

Date: 3 Oct 2024, 09:38:43
Host: test-azure-win2019-x64-1
Error:

11:12:49  Exception: org.jenkinsci.plugins.workflow.support.steps.AgentOfflineException: Unable to create live FilePath for test-azure-win2019-x64-1; test-azure-win2019-x64-1 was marked offline: Connection was broken

URL: https://ci.adoptium.net/job/Test_openjdk11_hs_sanity.openjdk_x86-32_windows_rerun/4/

@adamfarley
Contributor Author

Date: 3 Oct 2024, 03:03:51
Host: test-macincloud-macos1201-x64-1
Error:

01:43:50  Cannot contact test-macincloud-macos1201-x64-1: java.lang.InterruptedException

URL: https://ci.adoptium.net/job/Test_openjdk11_hs_sanity.openjdk_x86-64_mac_testList_0/11/

@adamfarley
Contributor Author

adamfarley commented Dec 2, 2024

Date: 28 Nov 2024
Host: build-docker-win2022-x64-3-intel
Error:

03:37:31  Cannot contact build-docker-win2022-x64-3-intel: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@4be79c2:JNLP4-connect connection from 74.235.196.238/74.235.196.238:50012": Remote call on JNLP4-connect connection from 74.235.196.238/74.235.196.238:50012 failed. The channel is closing down or has closed down

URL: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x64-temurin/532/

Edit: See comment below.

@sxa
Member

sxa commented Dec 2, 2024

> 03:37:31 Cannot contact build-docker-win2022-x64-3-intel:

This is a new machine which is undergoing testing at the moment and so this failure is not part of the general connectivity error situations.

@adamfarley
Contributor Author

> 03:37:31 Cannot contact build-docker-win2022-x64-3-intel:
>
> This is a new machine which is undergoing testing at the moment and so this failure is not part of the general connectivity error situations.

Ah yes, you did warn me about this last week. https://adoptium.slack.com/archives/C09NW3L2J/p1732885038402249
