Agent doesn't return anything on Jetson Orin Nano (ARM CPU) #196

Open
bogdanr opened this issue Oct 2, 2024 · 20 comments
Labels: in progress (We've started work on this), troubleshooting (Maybe bug, maybe not)

Comments

bogdanr commented Oct 2, 2024

My agent is stuck on pending on the Jetson Orin Nano.
The issue is not networking: it passes the telnet test and SSH gets past authentication, but the agent doesn't return any statistics.

How I tested: I am running several agents on x86 machines and this one on ARM.
I connect over SSH with the key that was generated by Beszel, like this:

ssh -i /tmp/id_ed25519 -p 45876 -o "StrictHostKeyChecking=no" 10.10.0.71 -v

On the machine that works I can see a message like this and it exits immediately:

....
Authenticated to 10.10.0.71 ([10.10.0.71]:45876) using "publickey".
debug1: channel 0: new session [client-session] (inactive timeout: 0)
debug1: Entering interactive session.
debug1: pledge: network
debug1: Sending environment.
debug1: channel 0: setting env LANG = "C"
debug1: pledge: fork
PTY allocation request failed on channel 0
{"stats":{"cpu":7.6,"m":31.31,"mu":7.07,"mp":22.57,"mb":3.27,"s":2,"su":0.31,"d":38.29,"du":19.8,"dp":54.54,"dr":0.13,"dw":0.24,"ns":0.11,"nr":0.12,"t":{"coretemp_core_12":55,"coretemp_core_2":56,"coretemp_core_6":55,"coretemp_core_8":53,"coretemp_package_id_0":57,"i350bb_loc1":61}},"info":{"h":"proxmox1","k":"6.8.12-1-pve","c":4,"t":4,"m":"Intel(R) Atom(TM) CPU C3558R @ 2.40GHz","u":813898,"cpu":7.6,"mp":22.57,"dp":54.54,"v":"0.5.0"},"container":[{"n":"beszel-agent","c":0.01,"m":4.21,"ns":0,"nr":0}]}
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: channel 0: free: client-session, nchannels 1
Connection to 192.168.0.71 closed.
Transferred: sent 2844, received 2472 bytes, in 0.0 seconds
Bytes per second: sent 339111.3, received 294755.0
debug1: Exit status 0

On the machine that doesn't work, it looks like this and it never exits:

....
Authenticated to 10.10.0.18 ([10.10.0.18]:45876) using "publickey".
debug1: channel 0: new session [client-session] (inactive timeout: 0)
debug1: Entering interactive session.
debug1: pledge: network
debug1: Sending environment.
debug1: channel 0: setting env LANG = "C"
debug1: pledge: fork
PTY allocation request failed on channel 0
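
The comparison above (a working agent prints one JSON line and exits, the failing one never returns) could be scripted so that a hang is detected automatically. This is a minimal sketch in Python, assuming the same SSH invocation as above. The `probe` helper and the 10-second default timeout are illustrative choices, not part of Beszel:

```python
import json
import subprocess
import sys


def probe(cmd, timeout=10.0):
    """Run cmd and classify the outcome as 'stats', 'no-stats', or 'hang'.

    'hang' means the command was still running when the timeout expired,
    which is what the failing agent looks like from the client side.
    """
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "hang", None

    # A healthy agent prints a single JSON object that has a "stats" key.
    for line in result.stdout.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict) and "stats" in payload:
            return "stats", payload
    return "no-stats", None
```

Called with the SSH command from this report, for example `probe(["ssh", "-i", "/tmp/id_ed25519", "-p", "45876", "-o", "StrictHostKeyChecking=no", "10.10.0.18"])`, it should report `"hang"` for a host whose session stays open past the timeout and `"stats"` for a host that behaves like the x86 machines.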

The agent that doesn't work produces this log when I start it:

2024/10/02 16:25:22 DEBUG Disk partitions="[{\"device\":\"/dev/nvme0n1p1\",\"mountpoint\":\"/\",\"fstype\":\"ext4\",\"opts\":[\"rw\",\"relatime\"]} {\"device\":\"/dev/nvme0n1p10\",\"mountpoint\":\"/boot/efi\",\"fstype\":\"vfat\",\"opts\":[\"rw\",\"relatime\"]}]"
2024/10/02 16:25:22 DEBUG Disk I/O diskstats="map[loop0:{\"readCount\":176,\"mergedReadCount\":0,\"writeCount\":48,\"mergedWriteCount\":0,\"readBytes\":687616,\"writeBytes\":25088,\"readTime\":22,\"writeTime\":26,\"iopsInProgress\":0,\"ioTime\":32,\"weightedIO\":49,\"name\":\"loop0\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1:{\"readCount\":11242,\"mergedReadCount\":3762,\"writeCount\":26392,\"mergedWriteCount\":61634,\"readBytes\":642634752,\"writeBytes\":2385867776,\"readTime\":4561,\"writeTime\":1095044,\"iopsInProgress\":0,\"ioTime\":59336,\"weightedIO\":1100523,\"name\":\"nvme0n1\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p1:{\"readCount\":10385,\"mergedReadCount\":3755,\"writeCount\":26390,\"mergedWriteCount\":61634,\"readBytes\":619877376,\"writeBytes\":2385866752,\"readTime\":4418,\"writeTime\":1095006,\"iopsInProgress\":0,\"ioTime\":59144,\"weightedIO\":1099425,\"name\":\"nvme0n1p1\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p10:{\"readCount\":158,\"mergedReadCount\":7,\"writeCount\":2,\"mergedWriteCount\":0,\"readBytes\":5562368,\"writeBytes\":1024,\"readTime\":59,\"writeTime\":37,\"iopsInProgress\":0,\"ioTime\":120,\"weightedIO\":97,\"name\":\"nvme0n1p10\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p11:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":7,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":7,\"name\":\"nvme0n1p11\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p12:{\"readCount\":22,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":90112,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":1,\"name\":\"nvme0n1p12\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} 
nvme0n1p13:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":8,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":8,\"name\":\"nvme0n1p13\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p14:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":8,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":16,\"weightedIO\":8,\"name\":\"nvme0n1p14\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p15:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":6,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":16,\"weightedIO\":6,\"name\":\"nvme0n1p15\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p2:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":7,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":7,\"name\":\"nvme0n1p2\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p3:{\"readCount\":24,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":98304,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":16,\"weightedIO\":1,\"name\":\"nvme0n1p3\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p4:{\"readCount\":48,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":598016,\"writeBytes\":0,\"readTime\":3,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":3,\"name\":\"nvme0n1p4\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} 
nvme0n1p5:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":8,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":24,\"weightedIO\":8,\"name\":\"nvme0n1p5\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p6:{\"readCount\":24,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":98304,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":8,\"weightedIO\":1,\"name\":\"nvme0n1p6\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p7:{\"readCount\":48,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":598016,\"writeBytes\":0,\"readTime\":4,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":4,\"name\":\"nvme0n1p7\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p8:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":7,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":7,\"name\":\"nvme0n1p8\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p9:{\"readCount\":22,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":90112,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":1,\"name\":\"nvme0n1p9\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} zram0:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":0,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":0,\"name\":\"zram0\",\"serialNumber\":\"\",\"label\":\"\"} 
zram1:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":0,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":8,\"weightedIO\":0,\"name\":\"zram1\",\"serialNumber\":\"\",\"label\":\"\"} zram2:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":0,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":8,\"weightedIO\":0,\"name\":\"zram2\",\"serialNumber\":\"\",\"label\":\"\"} zram3:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":0,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":0,\"name\":\"zram3\",\"serialNumber\":\"\",\"label\":\"\"} zram4:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":0,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":8,\"weightedIO\":0,\"name\":\"zram4\",\"serialNumber\":\"\",\"label\":\"\"} zram5:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":0,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":0,\"name\":\"zram5\",\"serialNumber\":\"\",\"label\":\"\"}]"
2024/10/02 16:25:22 INFO Detected root device name=nvme0n1p1
2024/10/02 16:25:22 INFO Detected network interface name=enP8p1s0 sent=42496954 recv=1174551053
2024/10/02 16:25:22 INFO Starting SSH server address=:45876

henrygd (Owner) commented Oct 2, 2024

Interesting, thanks for providing some details.

Does the agent log anything further in debug mode when you attempt to connect to it?

Jetson Nano is 64-bit, right? Are you running the agent with the binary or with Docker? If using Docker, run docker image inspect henrygd/beszel-agent to make sure that it pulled the arm64 image and not arm7.
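
One way to check this from a script is to read the `Architecture` field (and `Variant`, where present) from the JSON that `docker image inspect` prints. This is a small Python sketch. The helper name `image_platform` is made up for illustration, and the sample JSON is a trimmed, hypothetical inspect result rather than output from the machine in this issue:

```python
import json


def image_platform(inspect_json):
    """Return 'os/arch[/variant]' for the first image in `docker image inspect` output."""
    images = json.loads(inspect_json)
    if not images:
        raise ValueError("no image found in inspect output")
    image = images[0]
    parts = [image.get("Os", "unknown"), image.get("Architecture", "unknown")]
    variant = image.get("Variant")
    if variant:
        parts.append(variant)
    return "/".join(parts)


# Trimmed, hypothetical sample of what `docker image inspect henrygd/beszel-agent` might print.
sample = '[{"Os": "linux", "Architecture": "arm64", "Variant": "v8"}]'
print(image_platform(sample))  # prints linux/arm64/v8
```

On a real host the input could come from `subprocess.run(["docker", "image", "inspect", "henrygd/beszel-agent"], capture_output=True, text=True).stdout`. A 32-bit ARM image would be expected to report `arm` as the architecture instead of `arm64`.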

I have an agent running on an ARM Neoverse machine, so it's not a universal ARM issue.

henrygd added the troubleshooting label Oct 2, 2024

henrygd (Owner) commented Oct 3, 2024

The v0.5.1 agent now logs stats on startup if using LOG_LEVEL=debug, so you should see more information.

bogdanr (Author) commented Oct 3, 2024

I tried with both the binary and the container. I confirm the container is arm64. Running 0.5.1 produces this output:

[+] Running 1/1
 ✔ Container beszel-agent  Recreated                                                                 0.1s 
Attaching to beszel-agent
beszel-agent  | 2024/10/03 09:26:46 DEBUG Disk partitions="[{\"device\":\"/dev/nvme0n1p1\",\"mountpoint\":\"/etc/resolv.conf\",\"fstype\":\"ext4\",\"opts\":[\"rw\",\"relatime\",\"bind\"]} {\"device\":\"/dev/nvme0n1p1\",\"mountpoint\":\"/etc/hostname\",\"fstype\":\"ext4\",\"opts\":[\"rw\",\"relatime\",\"bind\"]} {\"device\":\"/dev/nvme0n1p1\",\"mountpoint\":\"/etc/hosts\",\"fstype\":\"ext4\",\"opts\":[\"rw\",\"relatime\",\"bind\"]}]"
beszel-agent  | 2024/10/03 09:26:46 DEBUG Disk I/O diskstats="map[loop0:{\"readCount\":176,\"mergedReadCount\":0,\"writeCount\":48,\"mergedWriteCount\":0,\"readBytes\":687616,\"writeBytes\":25088,\"readTime\":22,\"writeTime\":26,\"iopsInProgress\":0,\"ioTime\":32,\"weightedIO\":49,\"name\":\"loop0\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1:{\"readCount\":11968,\"mergedReadCount\":3848,\"writeCount\":148619,\"mergedWriteCount\":301917,\"readBytes\":663602176,\"writeBytes\":6527423488,\"readTime\":8200,\"writeTime\":4497973,\"iopsInProgress\":0,\"ioTime\":952680,\"weightedIO\":4514212,\"name\":\"nvme0n1\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p1:{\"readCount\":11111,\"mergedReadCount\":3841,\"writeCount\":148617,\"mergedWriteCount\":301917,\"readBytes\":640844800,\"writeBytes\":6527422464,\"readTime\":8058,\"writeTime\":4497936,\"iopsInProgress\":0,\"ioTime\":952488,\"weightedIO\":4505994,\"name\":\"nvme0n1p1\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p10:{\"readCount\":158,\"mergedReadCount\":7,\"writeCount\":2,\"mergedWriteCount\":0,\"readBytes\":5562368,\"writeBytes\":1024,\"readTime\":59,\"writeTime\":37,\"iopsInProgress\":0,\"ioTime\":120,\"weightedIO\":97,\"name\":\"nvme0n1p10\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p11:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":7,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":7,\"name\":\"nvme0n1p11\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p12:{\"readCount\":22,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":90112,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":1,\"name\":\"nvme0n1p12\",\"serialNumber\":\"\",\"label\":\"\"} 
nvme0n1p13:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":8,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":8,\"name\":\"nvme0n1p13\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p14:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":8,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":16,\"weightedIO\":8,\"name\":\"nvme0n1p14\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p15:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":6,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":16,\"weightedIO\":6,\"name\":\"nvme0n1p15\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p2:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":7,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":7,\"name\":\"nvme0n1p2\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p3:{\"readCount\":24,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":98304,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":16,\"weightedIO\":1,\"name\":\"nvme0n1p3\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p4:{\"readCount\":48,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":598016,\"writeBytes\":0,\"readTime\":3,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":3,\"name\":\"nvme0n1p4\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p5:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":8,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":24,\"weightedIO\":8,\"name\":\"nvme0n1p5\",\"serialNumber\":\"\",\"label\":\"\"} 
nvme0n1p6:{\"readCount\":24,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":98304,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":8,\"weightedIO\":1,\"name\":\"nvme0n1p6\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p7:{\"readCount\":48,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":598016,\"writeBytes\":0,\"readTime\":4,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":4,\"name\":\"nvme0n1p7\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p8:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":7,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":7,\"name\":\"nvme0n1p8\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p9:{\"readCount\":22,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":90112,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":1,\"name\":\"nvme0n1p9\",\"serialNumber\":\"\",\"label\":\"\"} zram0:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":0,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":0,\"name\":\"zram0\",\"serialNumber\":\"\",\"label\":\"\"} zram1:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":0,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":8,\"weightedIO\":0,\"name\":\"zram1\",\"serialNumber\":\"\",\"label\":\"\"} zram2:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":0,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":8,\"weightedIO\":0,\"name\":\"zram2\",\"serialNumber\":\"\",\"label\":\"\"} 
zram3:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":0,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":0,\"name\":\"zram3\",\"serialNumber\":\"\",\"label\":\"\"} zram4:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":0,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":8,\"weightedIO\":0,\"name\":\"zram4\",\"serialNumber\":\"\",\"label\":\"\"} zram5:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":0,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":0,\"name\":\"zram5\",\"serialNumber\":\"\",\"label\":\"\"}]"
beszel-agent  | 2024/10/03 09:26:46 INFO Detected root device name=nvme0n1p1
beszel-agent  | 2024/10/03 09:26:46 INFO Detected network interface name=enP8p1s0 sent=1088397427 recv=11679403798
beszel-agent  | 2024/10/03 09:26:46 DEBUG Docker version=27.3.1 concurrency=5

henrygd (Owner) commented Oct 3, 2024

Thanks, it seems to be failing silently while trying to gather the metrics. I'll look further into how that could happen and make sure something is logged.
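
For illustration, one general way to keep a metrics collector from failing silently is to run each collection step with a deadline and log the name of any step that does not come back. The sketch below is in Python and is not Beszel's code. `run_collector`, the logger name, and the 5-second default are all assumptions made for the example:

```python
import logging
import threading

log = logging.getLogger("agent-sketch")


def run_collector(name, fn, timeout=5.0):
    """Run fn() in a daemon thread and return (ok, value).

    If fn() raises or does not finish within `timeout` seconds, the step
    name is logged so a stuck collector shows up in the logs.
    """
    box = {}

    def target():
        try:
            box["value"] = fn()
        except Exception as exc:  # record the failure instead of losing it
            box["error"] = exc

    log.debug("Getting %s", name)
    worker = threading.Thread(target=target, name=name, daemon=True)
    worker.start()
    worker.join(timeout)

    if worker.is_alive():
        # The thread is left running; being a daemon, it will not block exit.
        log.error("collector %s did not return within %.1fs", name, timeout)
        return False, None
    if "error" in box:
        log.error("collector %s failed: %s", name, box["error"])
        return False, None
    return True, box["value"]
```

With a wrapper like this, a blocked step produces an error line naming the collector, and the remaining steps can still run and be reported.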

henrygd (Owner) commented Oct 10, 2024

I added more debug logs in 0.5.3, so if you run this again we should be able to see more info about where it's breaking down.

bogdanr (Author) commented Oct 11, 2024

Latest logs are here:

beszel-agent  | 2024/10/11 10:07:33 DEBUG Not monitoring ZFS ARC err="open /proc/spl/kstat/zfs/arcstats: no such file or directory"
beszel-agent  | 2024/10/11 10:07:33 DEBUG Disk partitions="[{\"device\":\"/dev/nvme0n1p1\",\"mountpoint\":\"/etc/resolv.conf\",\"fstype\":\"ext4\",\"opts\":[\"rw\",\"relatime\",\"bind\"]} {\"device\":\"/dev/nvme0n1p1\",\"mountpoint\":\"/etc/hostname\",\"fstype\":\"ext4\",\"opts\":[\"rw\",\"relatime\",\"bind\"]} {\"device\":\"/dev/nvme0n1p1\",\"mountpoint\":\"/etc/hosts\",\"fstype\":\"ext4\",\"opts\":[\"rw\",\"relatime\",\"bind\"]}]"
beszel-agent  | 2024/10/11 10:07:33 DEBUG Disk I/O diskstats="map[loop0:{\"readCount\":115,\"mergedReadCount\":0,\"writeCount\":55,\"mergedWriteCount\":0,\"readBytes\":688128,\"writeBytes\":28672,\"readTime\":4,\"writeTime\":7,\"iopsInProgress\":0,\"ioTime\":24,\"weightedIO\":12,\"name\":\"loop0\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1:{\"readCount\":409661,\"mergedReadCount\":13793,\"writeCount\":2340362,\"mergedWriteCount\":3618817,\"readBytes\":37213258752,\"writeBytes\":155948647424,\"readTime\":258805,\"writeTime\":72283839,\"iopsInProgress\":4,\"ioTime\":9555808,\"weightedIO\":72672491,\"name\":\"nvme0n1\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p1:{\"readCount\":408804,\"mergedReadCount\":13786,\"writeCount\":2340359,\"mergedWriteCount\":3618817,\"readBytes\":37190501376,\"writeBytes\":155948642304,\"readTime\":258685,\"writeTime\":72283801,\"iopsInProgress\":4,\"ioTime\":9555684,\"weightedIO\":72593855,\"name\":\"nvme0n1p1\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p10:{\"readCount\":158,\"mergedReadCount\":7,\"writeCount\":3,\"mergedWriteCount\":0,\"readBytes\":5562368,\"writeBytes\":5120,\"readTime\":23,\"writeTime\":37,\"iopsInProgress\":0,\"ioTime\":120,\"weightedIO\":67,\"name\":\"nvme0n1p10\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p11:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":9,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":24,\"weightedIO\":9,\"name\":\"nvme0n1p11\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p12:{\"readCount\":22,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":90112,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":1,\"name\":\"nvme0n1p12\",\"serialNumber\":\"\",\"label\":\"\"} 
nvme0n1p13:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":15,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":15,\"name\":\"nvme0n1p13\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p14:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":8,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":8,\"name\":\"nvme0n1p14\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p15:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":7,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":7,\"name\":\"nvme0n1p15\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p2:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":7,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":16,\"weightedIO\":7,\"name\":\"nvme0n1p2\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p3:{\"readCount\":24,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":98304,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":1,\"name\":\"nvme0n1p3\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p4:{\"readCount\":48,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":598016,\"writeBytes\":0,\"readTime\":4,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":4,\"name\":\"nvme0n1p4\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p5:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":8,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":24,\"weightedIO\":8,\"name\":\"nvme0n1p5\",\"serialNumber\":\"\",\"label\":\"\"} 
nvme0n1p6:{\"readCount\":24,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":98304,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":1,\"name\":\"nvme0n1p6\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p7:{\"readCount\":48,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":598016,\"writeBytes\":0,\"readTime\":5,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":5,\"name\":\"nvme0n1p7\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p8:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":8,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":8,\"name\":\"nvme0n1p8\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p9:{\"readCount\":22,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":90112,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":8,\"weightedIO\":1,\"name\":\"nvme0n1p9\",\"serialNumber\":\"\",\"label\":\"\"} zram0:{\"readCount\":742,\"mergedReadCount\":0,\"writeCount\":11297,\"mergedWriteCount\":0,\"readBytes\":3039232,\"writeBytes\":46272512,\"readTime\":4,\"writeTime\":176,\"iopsInProgress\":0,\"ioTime\":1908,\"weightedIO\":180,\"name\":\"zram0\",\"serialNumber\":\"\",\"label\":\"\"} zram1:{\"readCount\":626,\"mergedReadCount\":0,\"writeCount\":11319,\"mergedWriteCount\":0,\"readBytes\":2564096,\"writeBytes\":46362624,\"readTime\":12,\"writeTime\":180,\"iopsInProgress\":0,\"ioTime\":1852,\"weightedIO\":192,\"name\":\"zram1\",\"serialNumber\":\"\",\"label\":\"\"} zram2:{\"readCount\":643,\"mergedReadCount\":0,\"writeCount\":11265,\"mergedWriteCount\":0,\"readBytes\":2633728,\"writeBytes\":46141440,\"readTime\":4,\"writeTime\":200,\"iopsInProgress\":0,\"ioTime\":1876,\"weightedIO\":204,\"name\":\"zram2\",\"serialNumber\":\"\",\"label\":\"\"} 
zram3:{\"readCount\":652,\"mergedReadCount\":0,\"writeCount\":11238,\"mergedWriteCount\":0,\"readBytes\":2670592,\"writeBytes\":46030848,\"readTime\":0,\"writeTime\":156,\"iopsInProgress\":0,\"ioTime\":1960,\"weightedIO\":156,\"name\":\"zram3\",\"serialNumber\":\"\",\"label\":\"\"} zram4:{\"readCount\":650,\"mergedReadCount\":0,\"writeCount\":11195,\"mergedWriteCount\":0,\"readBytes\":2662400,\"writeBytes\":45854720,\"readTime\":0,\"writeTime\":224,\"iopsInProgress\":0,\"ioTime\":2100,\"weightedIO\":224,\"name\":\"zram4\",\"serialNumber\":\"\",\"label\":\"\"} zram5:{\"readCount\":731,\"mergedReadCount\":0,\"writeCount\":11243,\"mergedWriteCount\":0,\"readBytes\":2994176,\"writeBytes\":46051328,\"readTime\":8,\"writeTime\":212,\"iopsInProgress\":0,\"ioTime\":1908,\"weightedIO\":220,\"name\":\"zram5\",\"serialNumber\":\"\",\"label\":\"\"}]"
beszel-agent  | 2024/10/11 10:07:33 INFO Detected root device name=nvme0n1p1
beszel-agent  | 2024/10/11 10:07:33 INFO Detected network interface name=enP8p1s0 sent=5684325125 recv=129954163148
beszel-agent  | 2024/10/11 10:07:33 DEBUG Docker version=27.3.1 concurrency=5
beszel-agent  | 2024/10/11 10:07:33 DEBUG Getting stats
beszel-agent  | 2024/10/11 10:07:33 DEBUG Getting cpu percent
beszel-agent  | 2024/10/11 10:07:33 DEBUG Getting memory stats
beszel-agent  | 2024/10/11 10:07:33 DEBUG Getting disk stats
beszel-agent  | 2024/10/11 10:07:33 DEBUG Getting disk I/O stats
beszel-agent  | 2024/10/11 10:07:33 DEBUG Getting network stats
beszel-agent  | 2024/10/11 10:07:33 DEBUG Getting temperatures

henrygd (Owner) commented Oct 16, 2024

I wasn't able to figure out what's going wrong, so I changed the debug logs around in 0.6.0.

It probably has something to do with Docker. You can try not mounting the socket, or passing a dummy value like tcp://localhost:81 in as DOCKER_HOST to see if it returns the system stats successfully.
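
To test the Docker hypothesis independently of the agent, one option is to check whether the Docker daemon answers on its socket within a short deadline. This is a hedged Python sketch that sends a plain `GET /_ping` request (an endpoint of the Docker Engine API) over the Unix socket. The function name `docker_ping`, the default socket path, and the 2-second timeout are assumptions for the example:

```python
import socket


def docker_ping(sock_path="/var/run/docker.sock", timeout=2.0):
    """Return True if something answers GET /_ping with HTTP 200 on the Unix socket in time."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(sock_path)
            s.sendall(b"GET /_ping HTTP/1.0\r\nHost: docker\r\n\r\n")
            reply = s.recv(4096)
    except OSError:
        # Covers a missing socket, refused connections, permission errors, and timeouts.
        return False
    status_line = reply.split(b"\r\n", 1)[0]
    return status_line.startswith(b"HTTP/1.") and b" 200 " in status_line + b" "
```

If this returns `False` on the Jetson while the socket is mounted, that would point toward the Docker connection. If it returns `True`, the stall is more likely elsewhere in the stats collection.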

bogdanr (Author) commented Oct 17, 2024

I'd say the behavior is different now. Instead of staying pending, the node now appears to be down, and nothing is listening on port 45876. The logs follow:

[+] Running 1/0
 ✔ Container beszel-agent  Recreated                                                                                                                                                                            0.1s 
Attaching to beszel-agent
beszel-agent  | 2024/10/17 08:03:18 DEBUG Not monitoring ZFS ARC err="open /proc/spl/kstat/zfs/arcstats: no such file or directory"
beszel-agent  | 2024/10/17 08:03:18 DEBUG Disk partitions="[{\"device\":\"/dev/nvme0n1p1\",\"mountpoint\":\"/etc/resolv.conf\",\"fstype\":\"ext4\",\"opts\":[\"rw\",\"relatime\",\"bind\"]} {\"device\":\"/dev/nvme0n1p1\",\"mountpoint\":\"/etc/hostname\",\"fstype\":\"ext4\",\"opts\":[\"rw\",\"relatime\",\"bind\"]} {\"device\":\"/dev/nvme0n1p1\",\"mountpoint\":\"/etc/hosts\",\"fstype\":\"ext4\",\"opts\":[\"rw\",\"relatime\",\"bind\"]}]"
beszel-agent  | 2024/10/17 08:03:18 DEBUG Disk I/O diskstats="map[loop0:{\"readCount\":115,\"mergedReadCount\":0,\"writeCount\":55,\"mergedWriteCount\":0,\"readBytes\":688128,\"writeBytes\":28672,\"readTime\":4,\"writeTime\":7,\"iopsInProgress\":0,\"ioTime\":24,\"weightedIO\":12,\"name\":\"loop0\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1:{\"readCount\":743525,\"mergedReadCount\":36403,\"writeCount\":3995702,\"mergedWriteCount\":6544305,\"readBytes\":54794056704,\"writeBytes\":223101748224,\"readTime\":592128,\"writeTime\":119683114,\"iopsInProgress\":0,\"ioTime\":18654268,\"weightedIO\":120578652,\"name\":\"nvme0n1\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p1:{\"readCount\":742402,\"mergedReadCount\":35590,\"writeCount\":3995699,\"mergedWriteCount\":6544305,\"readBytes\":54770091008,\"writeBytes\":223101743104,\"readTime\":591919,\"writeTime\":119683076,\"iopsInProgress\":0,\"ioTime\":18654132,\"weightedIO\":120387105,\"name\":\"nvme0n1p1\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p10:{\"readCount\":352,\"mergedReadCount\":813,\"writeCount\":3,\"mergedWriteCount\":0,\"readBytes\":6074368,\"writeBytes\":5120,\"readTime\":97,\"writeTime\":37,\"iopsInProgress\":0,\"ioTime\":132,\"weightedIO\":149,\"name\":\"nvme0n1p10\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p11:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":9,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":24,\"weightedIO\":9,\"name\":\"nvme0n1p11\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p12:{\"readCount\":22,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":90112,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":1,\"name\":\"nvme0n1p12\",\"serialNumber\":\"\",\"label\":\"\"} 
nvme0n1p13:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":15,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":15,\"name\":\"nvme0n1p13\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p14:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":8,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":8,\"name\":\"nvme0n1p14\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p15:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":7,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":7,\"name\":\"nvme0n1p15\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p2:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":7,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":16,\"weightedIO\":7,\"name\":\"nvme0n1p2\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p3:{\"readCount\":24,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":98304,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":1,\"name\":\"nvme0n1p3\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p4:{\"readCount\":48,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":598016,\"writeBytes\":0,\"readTime\":4,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":4,\"name\":\"nvme0n1p4\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p5:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":8,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":24,\"weightedIO\":8,\"name\":\"nvme0n1p5\",\"serialNumber\":\"\",\"label\":\"\"} 
nvme0n1p6:{\"readCount\":24,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":98304,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":1,\"name\":\"nvme0n1p6\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p7:{\"readCount\":48,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":598016,\"writeBytes\":0,\"readTime\":5,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":5,\"name\":\"nvme0n1p7\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p8:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":8,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":8,\"name\":\"nvme0n1p8\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1p9:{\"readCount\":22,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":90112,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":8,\"weightedIO\":1,\"name\":\"nvme0n1p9\",\"serialNumber\":\"\",\"label\":\"\"} zram0:{\"readCount\":13635,\"mergedReadCount\":0,\"writeCount\":67294,\"mergedWriteCount\":0,\"readBytes\":55848960,\"writeBytes\":275636224,\"readTime\":88,\"writeTime\":828,\"iopsInProgress\":0,\"ioTime\":12492,\"weightedIO\":916,\"name\":\"zram0\",\"serialNumber\":\"\",\"label\":\"\"} zram1:{\"readCount\":13155,\"mergedReadCount\":0,\"writeCount\":67225,\"mergedWriteCount\":0,\"readBytes\":53882880,\"writeBytes\":275353600,\"readTime\":72,\"writeTime\":864,\"iopsInProgress\":0,\"ioTime\":12636,\"weightedIO\":936,\"name\":\"zram1\",\"serialNumber\":\"\",\"label\":\"\"} zram2:{\"readCount\":13497,\"mergedReadCount\":0,\"writeCount\":67141,\"mergedWriteCount\":0,\"readBytes\":55283712,\"writeBytes\":275009536,\"readTime\":84,\"writeTime\":840,\"iopsInProgress\":0,\"ioTime\":12248,\"weightedIO\":924,\"name\":\"zram2\",\"serialNumber\":\"\",\"label\":\"\"} 
zram3:{\"readCount\":13653,\"mergedReadCount\":0,\"writeCount\":67215,\"mergedWriteCount\":0,\"readBytes\":55922688,\"writeBytes\":275312640,\"readTime\":44,\"writeTime\":836,\"iopsInProgress\":0,\"ioTime\":13080,\"weightedIO\":880,\"name\":\"zram3\",\"serialNumber\":\"\",\"label\":\"\"} zram4:{\"readCount\":13483,\"mergedReadCount\":0,\"writeCount\":67130,\"mergedWriteCount\":0,\"readBytes\":55226368,\"writeBytes\":274964480,\"readTime\":92,\"writeTime\":980,\"iopsInProgress\":0,\"ioTime\":13080,\"weightedIO\":1072,\"name\":\"zram4\",\"serialNumber\":\"\",\"label\":\"\"} zram5:{\"readCount\":13124,\"mergedReadCount\":0,\"writeCount\":67186,\"mergedWriteCount\":0,\"readBytes\":53755904,\"writeBytes\":275193856,\"readTime\":92,\"writeTime\":908,\"iopsInProgress\":0,\"ioTime\":11908,\"weightedIO\":1000,\"name\":\"zram5\",\"serialNumber\":\"\",\"label\":\"\"}]"
beszel-agent  | 2024/10/17 08:03:18 INFO Detected root device name=nvme0n1p1
beszel-agent  | 2024/10/17 08:03:18 INFO Detected network interface name=enP8p1s0 sent=11065393723 recv=217956126037
beszel-agent  | 2024/10/17 08:03:18 INFO DOCKER_HOST host=tcp://localhost:81
beszel-agent  | 2024/10/17 08:03:18 DEBUG Getting stats

And here is the same log when not running via Docker:

root@jetson:~# PORT=45876 KEY="ssh-ed25519 AAAA....dcDF" LOG_LEVEL=debug ./beszel-agent
2024/10/17 11:13:16 DEBUG Not monitoring ZFS ARC err="open /proc/spl/kstat/zfs/arcstats: no such file or directory"
2024/10/17 11:13:16 DEBUG Disk partitions="[{\"device\":\"/dev/nvme0n1p1\",\"mountpoint\":\"/\",\"fstype\":\"ext4\",\"opts\":[\"rw\",\"relatime\"]} {\"device\":\"/dev/nvme0n1p10\",\"mountpoint\":\"/boot/efi\",\"fstype\":\"vfat\",\"opts\":[\"rw\",\"relatime\"]}]"
2024/10/17 11:13:16 DEBUG Disk I/O diskstats="map[loop0:{\"readCount\":115,\"mergedReadCount\":0,\"writeCount\":55,\"mergedWriteCount\":0,\"readBytes\":688128,\"writeBytes\":28672,\"readTime\":4,\"writeTime\":7,\"iopsInProgress\":0,\"ioTime\":24,\"weightedIO\":12,\"name\":\"loop0\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1:{\"readCount\":743672,\"mergedReadCount\":36447,\"writeCount\":3998926,\"mergedWriteCount\":6549215,\"readBytes\":54798357504,\"writeBytes\":223196771328,\"readTime\":593116,\"writeTime\":119748327,\"iopsInProgress\":0,\"ioTime\":18665260,\"weightedIO\":120645326,\"name\":\"nvme0n1\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p1:{\"readCount\":742549,\"mergedReadCount\":35634,\"writeCount\":3998923,\"mergedWriteCount\":6549215,\"readBytes\":54774391808,\"writeBytes\":223196766208,\"readTime\":592907,\"writeTime\":119748289,\"iopsInProgress\":0,\"ioTime\":18665124,\"weightedIO\":120453307,\"name\":\"nvme0n1p1\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p10:{\"readCount\":352,\"mergedReadCount\":813,\"writeCount\":3,\"mergedWriteCount\":0,\"readBytes\":6074368,\"writeBytes\":5120,\"readTime\":97,\"writeTime\":37,\"iopsInProgress\":0,\"ioTime\":132,\"weightedIO\":149,\"name\":\"nvme0n1p10\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p11:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":9,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":24,\"weightedIO\":9,\"name\":\"nvme0n1p11\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p12:{\"readCount\":22,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":90112,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":1,\"name\":\"nvme0n1p12\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} 
nvme0n1p13:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":15,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":15,\"name\":\"nvme0n1p13\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p14:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":8,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":8,\"name\":\"nvme0n1p14\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p15:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":7,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":7,\"name\":\"nvme0n1p15\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p2:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":7,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":16,\"weightedIO\":7,\"name\":\"nvme0n1p2\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p3:{\"readCount\":24,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":98304,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":1,\"name\":\"nvme0n1p3\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p4:{\"readCount\":48,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":598016,\"writeBytes\":0,\"readTime\":4,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":4,\"name\":\"nvme0n1p4\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} 
nvme0n1p5:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":8,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":24,\"weightedIO\":8,\"name\":\"nvme0n1p5\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p6:{\"readCount\":24,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":98304,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":1,\"name\":\"nvme0n1p6\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p7:{\"readCount\":48,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":598016,\"writeBytes\":0,\"readTime\":5,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":5,\"name\":\"nvme0n1p7\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p8:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":8,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":8,\"name\":\"nvme0n1p8\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p9:{\"readCount\":22,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":90112,\"writeBytes\":0,\"readTime\":1,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":8,\"weightedIO\":1,\"name\":\"nvme0n1p9\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} zram0:{\"readCount\":13637,\"mergedReadCount\":0,\"writeCount\":67294,\"mergedWriteCount\":0,\"readBytes\":55857152,\"writeBytes\":275636224,\"readTime\":88,\"writeTime\":828,\"iopsInProgress\":0,\"ioTime\":12500,\"weightedIO\":916,\"name\":\"zram0\",\"serialNumber\":\"\",\"label\":\"\"} 
zram1:{\"readCount\":13155,\"mergedReadCount\":0,\"writeCount\":67225,\"mergedWriteCount\":0,\"readBytes\":53882880,\"writeBytes\":275353600,\"readTime\":72,\"writeTime\":864,\"iopsInProgress\":0,\"ioTime\":12636,\"weightedIO\":936,\"name\":\"zram1\",\"serialNumber\":\"\",\"label\":\"\"} zram2:{\"readCount\":13497,\"mergedReadCount\":0,\"writeCount\":67141,\"mergedWriteCount\":0,\"readBytes\":55283712,\"writeBytes\":275009536,\"readTime\":84,\"writeTime\":840,\"iopsInProgress\":0,\"ioTime\":12248,\"weightedIO\":924,\"name\":\"zram2\",\"serialNumber\":\"\",\"label\":\"\"} zram3:{\"readCount\":13654,\"mergedReadCount\":0,\"writeCount\":67215,\"mergedWriteCount\":0,\"readBytes\":55926784,\"writeBytes\":275312640,\"readTime\":44,\"writeTime\":836,\"iopsInProgress\":0,\"ioTime\":13084,\"weightedIO\":880,\"name\":\"zram3\",\"serialNumber\":\"\",\"label\":\"\"} zram4:{\"readCount\":13483,\"mergedReadCount\":0,\"writeCount\":67130,\"mergedWriteCount\":0,\"readBytes\":55226368,\"writeBytes\":274964480,\"readTime\":92,\"writeTime\":980,\"iopsInProgress\":0,\"ioTime\":13080,\"weightedIO\":1072,\"name\":\"zram4\",\"serialNumber\":\"\",\"label\":\"\"} zram5:{\"readCount\":13125,\"mergedReadCount\":0,\"writeCount\":67186,\"mergedWriteCount\":0,\"readBytes\":53760000,\"writeBytes\":275193856,\"readTime\":92,\"writeTime\":908,\"iopsInProgress\":0,\"ioTime\":11912,\"weightedIO\":1000,\"name\":\"zram5\",\"serialNumber\":\"\",\"label\":\"\"}]"
2024/10/17 11:13:16 INFO Detected root device name=nvme0n1p1
2024/10/17 11:13:16 INFO Detected network interface name=enP8p1s0 sent=11071452712 recv=218105508615
2024/10/17 11:13:16 DEBUG Docker version=27.3.1 concurrency=5
2024/10/17 11:13:16 DEBUG Getting stats

And here is an strace snippet:

2024/10/17 11:15:53 INFO Detected root device name=nvme0n1p1
2024/10/17 11:15:53 INFO Detected network interface name=enP8p1s0 sent=11073208683 recv=218147848021
) = 0
futex(0x6446e0, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x6446e0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
epoll_pwait(4, 2024/10/17 11:15:53 DEBUG Docker version=27.3.1 concurrency=5
2024/10/17 11:15:53 DEBUG Getting stats
[{events=EPOLLIN|EPOLLOUT, data={u32=1029177345, u64=18446557505654358017}}], 128, 7987, NULL, 0) = 1
epoll_pwait(4, [], 128, 0, NULL, 0)     = 0
epoll_pwait(4, [{events=EPOLLIN|EPOLLOUT, data={u32=1029177346, u64=18446557505654358018}}], 128, 7987, NULL, 0) = 1
epoll_pwait(4, [], 128, 0, NULL, 0)     = 0
epoll_pwait(4, [{events=EPOLLIN|EPOLLOUT, data={u32=1029177347, u64=18446557505654358019}}], 128, 7987, NULL, 0) = 1
epoll_pwait(4, [], 128, 0, NULL, 0)     = 0
futex(0x6446e0, FUTEX_WAIT_PRIVATE, 0, NULL

@henrygd
Owner

henrygd commented Oct 20, 2024

Very strange issue. Doesn't appear to be related to Docker -- it's deadlocking before even getting to that.

I can't see how it's happening, but I made a small change and moved the debug logs around again in 0.6.1 to try to narrow down the possibilities.

@bogdanr
Author

bogdanr commented Oct 21, 2024

Latest logs here from running this binary:

2024/10/21 11:11:29 DEBUG 0.6.1
2024/10/21 11:11:29 DEBUG Not monitoring ZFS ARC err="open /proc/spl/kstat/zfs/arcstats: no such file or directory"
2024/10/21 11:11:29 DEBUG Disk partitions="[{\"device\":\"/dev/nvme0n1p1\",\"mountpoint\":\"/\",\"fstype\":\"ext4\",\"opts\":[\"rw\",\"relatime\"]} {\"device\":\"/dev/nvme0n1p10\",\"mountpoint\":\"/boot/efi\",\"fstype\":\"vfat\",\"opts\":[\"rw\",\"relatime\"]}]"
2024/10/21 11:11:29 DEBUG Disk I/O diskstats="map[loop0:{\"readCount\":176,\"mergedReadCount\":0,\"writeCount\":55,\"mergedWriteCount\":0,\"readBytes\":687616,\"writeBytes\":28672,\"readTime\":76,\"writeTime\":18,\"iopsInProgress\":0,\"ioTime\":96,\"weightedIO\":95,\"name\":\"loop0\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1:{\"readCount\":94325,\"mergedReadCount\":16861,\"writeCount\":581395,\"mergedWriteCount\":1083173,\"readBytes\":2401747968,\"writeBytes\":14503425024,\"readTime\":74517,\"writeTime\":15703135,\"iopsInProgress\":0,\"ioTime\":4091100,\"weightedIO\":15886572,\"name\":\"nvme0n1\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p1:{\"readCount\":93468,\"mergedReadCount\":16854,\"writeCount\":581392,\"mergedWriteCount\":1083173,\"readBytes\":2378990592,\"writeBytes\":14503419904,\"readTime\":73161,\"writeTime\":15703098,\"iopsInProgress\":0,\"ioTime\":4090256,\"weightedIO\":15852011,\"name\":\"nvme0n1p1\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p10:{\"readCount\":158,\"mergedReadCount\":7,\"writeCount\":3,\"mergedWriteCount\":0,\"readBytes\":5562368,\"writeBytes\":5120,\"readTime\":255,\"writeTime\":37,\"iopsInProgress\":0,\"ioTime\":304,\"weightedIO\":304,\"name\":\"nvme0n1p10\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p11:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":43,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":40,\"weightedIO\":43,\"name\":\"nvme0n1p11\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p12:{\"readCount\":22,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":90112,\"writeBytes\":0,\"readTime\":28,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":36,\"weightedIO\":28,\"name\":\"nvme0n1p12\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} 
nvme0n1p13:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":163,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":144,\"weightedIO\":163,\"name\":\"nvme0n1p13\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p14:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":91,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":92,\"weightedIO\":91,\"name\":\"nvme0n1p14\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p15:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":108,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":108,\"weightedIO\":108,\"name\":\"nvme0n1p15\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p2:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":188,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":172,\"weightedIO\":188,\"name\":\"nvme0n1p2\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p3:{\"readCount\":24,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":98304,\"writeBytes\":0,\"readTime\":7,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":7,\"name\":\"nvme0n1p3\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p4:{\"readCount\":48,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":598016,\"writeBytes\":0,\"readTime\":33,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":56,\"weightedIO\":33,\"name\":\"nvme0n1p4\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} 
nvme0n1p5:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":77,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":88,\"weightedIO\":77,\"name\":\"nvme0n1p5\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p6:{\"readCount\":24,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":98304,\"writeBytes\":0,\"readTime\":60,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":76,\"weightedIO\":60,\"name\":\"nvme0n1p6\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p7:{\"readCount\":48,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":598016,\"writeBytes\":0,\"readTime\":34,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":44,\"weightedIO\":34,\"name\":\"nvme0n1p7\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p8:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":186,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":180,\"weightedIO\":186,\"name\":\"nvme0n1p8\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p9:{\"readCount\":22,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":90112,\"writeBytes\":0,\"readTime\":3,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":3,\"name\":\"nvme0n1p9\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} zram0:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":4,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":4,\"name\":\"zram0\",\"serialNumber\":\"\",\"label\":\"\"} 
zram1:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":0,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":0,\"name\":\"zram1\",\"serialNumber\":\"\",\"label\":\"\"} zram2:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":0,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":8,\"weightedIO\":0,\"name\":\"zram2\",\"serialNumber\":\"\",\"label\":\"\"} zram3:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":4,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":4,\"name\":\"zram3\",\"serialNumber\":\"\",\"label\":\"\"} zram4:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":0,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":8,\"weightedIO\":0,\"name\":\"zram4\",\"serialNumber\":\"\",\"label\":\"\"} zram5:{\"readCount\":301,\"mergedReadCount\":0,\"writeCount\":1,\"mergedWriteCount\":0,\"readBytes\":1232896,\"writeBytes\":4096,\"readTime\":0,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":8,\"weightedIO\":0,\"name\":\"zram5\",\"serialNumber\":\"\",\"label\":\"\"}]"
2024/10/21 11:11:29 INFO Detected root device name=nvme0n1p1
2024/10/21 11:11:29 INFO Detected network interface name=enP8p1s0 sent=1761129829 recv=32461695461
2024/10/21 11:11:29 DEBUG Docker version=27.3.1 concurrency=5
2024/10/21 11:11:29 DEBUG Getting stats

@henrygd
Owner

henrygd commented Oct 21, 2024

I think it's somehow deadlocking in gopsutil's sensors.TemperaturesWithContext.

Can you please run this Go code to see if it locks up as well? If you don't want to build it yourself, you can download a linux arm64 binary here: https://assets.henrygd.me/beszel/bin/temps-linux-arm64

wget https://assets.henrygd.me/beszel/bin/temps-linux-arm64 && chmod +x ./temps-linux-arm64 && ./temps-linux-arm64

package main

import (
	"context"
	"fmt"

	"github.com/shirou/gopsutil/v4/sensors"
)

func main() {
	fmt.Println("Before reading temperatures")
	temperatures, err := sensors.TemperaturesWithContext(context.Background())
	fmt.Println("After reading temperatures")
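	if err != nil {
		// Only *sensors.Warnings has a Verbose field, so check the type
		// before touching it; an unchecked assertion would panic on any other error.
		if warns, ok := err.(*sensors.Warnings); ok {
			warns.Verbose = true
		}
		panic(err)
	}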
	for _, temp := range temperatures {
		fmt.Printf("%s: %.1f\n", temp.SensorKey, temp.Temperature)
	}
}

@bogdanr
Author

bogdanr commented Oct 21, 2024

Sure. It looks like it does lock up:

root@jetson:~# ./temps-linux-arm64 
Before reading temperatures

Here is the last part of the strace output:

...
getdents64(3, 0x40000c0000 /* 24 entries */, 8192) = 928
getdents64(3, 0x40000c0000 /* 0 entries */, 8192) = 0
close(3)                                = 0
openat(AT_FDCWD, "/sys/class/thermal/thermal_zone0/type", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFL)                       = 0x20000 (flags O_RDONLY|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=237502480, u64=18446578331159101456}}) = 0
fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
read(3, "cpu-thermal\n", 4097)          = 12
read(3, "", 4085)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 3, 0x40000a29c0) = 0
close(3)                                = 0
openat(AT_FDCWD, "/sys/class/thermal/thermal_zone0/temp", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFL)                       = 0x20000 (flags O_RDONLY|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=237502481, u64=18446578331159101457}}) = 0
fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
read(3, "51500\n", 4097)                = 6
read(3, "", 4091)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 3, 0x40000a29c0) = 0
close(3)                                = 0
openat(AT_FDCWD, "/sys/class/thermal/thermal_zone1/type", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFL)                       = 0x20000 (flags O_RDONLY|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=237502482, u64=18446578331159101458}}) = 0
fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
read(3, "gpu-thermal\n", 4097)          = 12
read(3, "", 4085)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 3, 0x40000a29c0) = 0
close(3)                                = 0
openat(AT_FDCWD, "/sys/class/thermal/thermal_zone1/temp", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFL)                       = 0x20000 (flags O_RDONLY|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=237502483, u64=18446578331159101459}}) = 0
fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
read(3, "49906\n", 4097)                = 6
read(3, "", 4091)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 3, 0x40000a29c0) = 0
close(3)                                = 0
openat(AT_FDCWD, "/sys/class/thermal/thermal_zone2/type", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFL)                       = 0x20000 (flags O_RDONLY|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=237502484, u64=18446578331159101460}}) = 0
fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
read(3, "cv0-thermal\n", 4097)          = 12
read(3, "", 4085)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 3, 0x40000a29c0) = 0
close(3)                                = 0
openat(AT_FDCWD, "/sys/class/thermal/thermal_zone2/temp", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFL)                       = 0x20000 (flags O_RDONLY|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=237502485, u64=18446578331159101461}}) = 0
fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
read(3, 0x40000d2000, 4097)             = -1 EAGAIN (Resource temporarily unavailable)
epoll_pwait(4, [{events=EPOLLIN|EPOLLOUT, data={u32=237502485, u64=18446578331159101461}}], 128, 0, NULL, 0) = 1
read(3, 0x40000d2000, 4097)             = -1 EAGAIN (Resource temporarily unavailable)
epoll_pwait(4, [], 128, 0, NULL, 0)     = 0
epoll_pwait(4, 0xffffe6705f40, 128, -1, NULL, 0) = -1 EINTR (Interrupted system call)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
epoll_pwait(4, 

@henrygd
Owner

henrygd commented Oct 21, 2024

Do those sensors have valid data?

cat /sys/class/thermal/thermal_zone2/type
cat /sys/class/thermal/thermal_zone2/temp

Try checking your open files and process limits:

# check the current nproc limit
ulimit -u
# 124668

# check the current open files limit
ulimit -n
# 2048

If they're very low you can try raising them temporarily:

ulimit -u 4096
ulimit -n 2048

Check /proc/sys/kernel/threads-max:

cat /proc/sys/kernel/threads-max
# 249336

Maybe also try restricting the program to one core:

taskset -c 0 ./temps-linux-arm64

@bogdanr
Author

bogdanr commented Oct 21, 2024

thermal_zone2/temp seems to have some problem.

root@jetson:~# cat /sys/class/thermal/thermal_zone2/type
cv0-thermal
root@jetson:~# cat /sys/class/thermal/thermal_zone2/temp
cat: /sys/class/thermal/thermal_zone2/temp: Resource temporarily unavailable

@henrygd
Owner

henrygd commented Oct 21, 2024

Try running sudo sensors-detect, maybe that will fix the problem.

If you don't have it, install the lm-sensors package.

@bogdanr
Author

bogdanr commented Oct 22, 2024 via email

@henrygd
Owner

henrygd commented Oct 22, 2024

It looks like this is an issue specific to Jetson Orin. I found this in their developer guide:

The GPU, CV power rails might be turned off at idle by run time power management. The temperature cannot be read from GPU, CV thermal sensors when the power is off. Attempts to read a sensor with the power off will return error code -EAGAIN (Resource temporarily unavailable).

I think that because EAGAIN indicates a temporary condition, Go's os.ReadFile waits for the file to become readable, which it never does. NVIDIA probably could've chosen a better error code for this.

I can't figure out a way to replicate the situation on my system. If I could, I would try to debug it and submit a PR to gopsutil. You might want to open an issue over there as they are better devs than I am.

Alternatively, I can add an env var SKIP_TEMPS to the agent that disables reading sensors and skips over that code.

@henrygd
Owner

henrygd commented Oct 23, 2024

As of 0.6.2, setting SENSORS to an empty string should bypass the code causing the problem.

You won't get any sensor data, but it should return successfully.

@bogdanr
Author

bogdanr commented Oct 24, 2024

It doesn't look like anything changed:

PORT=45876 KEY="ssh-ed25519 _redacted_" LOG_LEVEL=debug SENSORS="" ./beszel-agent
2024/10/24 16:57:58 DEBUG 0.6.2
2024/10/24 16:57:58 DEBUG Not monitoring ZFS ARC err="open /proc/spl/kstat/zfs/arcstats: no such file or directory"
2024/10/24 16:57:58 DEBUG Disk partitions="[{\"device\":\"/dev/nvme0n1p1\",\"mountpoint\":\"/\",\"fstype\":\"ext4\",\"opts\":[\"rw\",\"relatime\"]} {\"device\":\"/dev/nvme0n1p10\",\"mountpoint\":\"/boot/efi\",\"fstype\":\"vfat\",\"opts\":[\"rw\",\"relatime\"]}]"
2024/10/24 16:57:58 DEBUG Disk I/O diskstats="map[loop0:{\"readCount\":176,\"mergedReadCount\":0,\"writeCount\":55,\"mergedWriteCount\":0,\"readBytes\":687616,\"writeBytes\":28672,\"readTime\":76,\"writeTime\":18,\"iopsInProgress\":0,\"ioTime\":96,\"weightedIO\":95,\"name\":\"loop0\",\"serialNumber\":\"\",\"label\":\"\"} nvme0n1:{\"readCount\":108676,\"mergedReadCount\":18078,\"writeCount\":1302525,\"mergedWriteCount\":2510543,\"readBytes\":2844427264,\"writeBytes\":33669846016,\"readTime\":102395,\"writeTime\":37168284,\"iopsInProgress\":0,\"ioTime\":9011676,\"weightedIO\":37426953,\"name\":\"nvme0n1\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p1:{\"readCount\":107816,\"mergedReadCount\":18071,\"writeCount\":1302522,\"mergedWriteCount\":2510543,\"readBytes\":2821657600,\"writeBytes\":33669840896,\"readTime\":101036,\"writeTime\":37168246,\"iopsInProgress\":0,\"ioTime\":9010828,\"weightedIO\":37345035,\"name\":\"nvme0n1p1\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p10:{\"readCount\":158,\"mergedReadCount\":7,\"writeCount\":3,\"mergedWriteCount\":0,\"readBytes\":5562368,\"writeBytes\":5120,\"readTime\":255,\"writeTime\":37,\"iopsInProgress\":0,\"ioTime\":304,\"weightedIO\":304,\"name\":\"nvme0n1p10\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p11:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":43,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":40,\"weightedIO\":43,\"name\":\"nvme0n1p11\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p12:{\"readCount\":22,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":90112,\"writeBytes\":0,\"readTime\":28,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":36,\"weightedIO\":28,\"name\":\"nvme0n1p12\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} 
nvme0n1p13:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":163,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":144,\"weightedIO\":163,\"name\":\"nvme0n1p13\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p14:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":91,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":92,\"weightedIO\":91,\"name\":\"nvme0n1p14\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p15:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":108,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":108,\"weightedIO\":108,\"name\":\"nvme0n1p15\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p2:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":188,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":172,\"weightedIO\":188,\"name\":\"nvme0n1p2\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p3:{\"readCount\":24,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":98304,\"writeBytes\":0,\"readTime\":7,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":20,\"weightedIO\":7,\"name\":\"nvme0n1p3\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p4:{\"readCount\":48,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":598016,\"writeBytes\":0,\"readTime\":33,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":56,\"weightedIO\":33,\"name\":\"nvme0n1p4\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} 
nvme0n1p5:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":77,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":88,\"weightedIO\":77,\"name\":\"nvme0n1p5\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p6:{\"readCount\":24,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":98304,\"writeBytes\":0,\"readTime\":60,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":76,\"weightedIO\":60,\"name\":\"nvme0n1p6\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p7:{\"readCount\":48,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":598016,\"writeBytes\":0,\"readTime\":34,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":44,\"weightedIO\":34,\"name\":\"nvme0n1p7\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p8:{\"readCount\":62,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":2129920,\"writeBytes\":0,\"readTime\":186,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":180,\"weightedIO\":186,\"name\":\"nvme0n1p8\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} nvme0n1p9:{\"readCount\":22,\"mergedReadCount\":0,\"writeCount\":0,\"mergedWriteCount\":0,\"readBytes\":90112,\"writeBytes\":0,\"readTime\":3,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":12,\"weightedIO\":3,\"name\":\"nvme0n1p9\",\"serialNumber\":\"ADATA_LEGEND_710_2O052917E6D2_1\",\"label\":\"\"} zram0:{\"readCount\":386,\"mergedReadCount\":0,\"writeCount\":436,\"mergedWriteCount\":0,\"readBytes\":1581056,\"writeBytes\":1785856,\"readTime\":8,\"writeTime\":4,\"iopsInProgress\":0,\"ioTime\":136,\"weightedIO\":12,\"name\":\"zram0\",\"serialNumber\":\"\",\"label\":\"\"} 
zram1:{\"readCount\":369,\"mergedReadCount\":0,\"writeCount\":513,\"mergedWriteCount\":0,\"readBytes\":1511424,\"writeBytes\":2101248,\"readTime\":4,\"writeTime\":8,\"iopsInProgress\":0,\"ioTime\":136,\"weightedIO\":12,\"name\":\"zram1\",\"serialNumber\":\"\",\"label\":\"\"} zram2:{\"readCount\":417,\"mergedReadCount\":0,\"writeCount\":513,\"mergedWriteCount\":0,\"readBytes\":1708032,\"writeBytes\":2101248,\"readTime\":4,\"writeTime\":4,\"iopsInProgress\":0,\"ioTime\":96,\"weightedIO\":8,\"name\":\"zram2\",\"serialNumber\":\"\",\"label\":\"\"} zram3:{\"readCount\":449,\"mergedReadCount\":0,\"writeCount\":451,\"mergedWriteCount\":0,\"readBytes\":1839104,\"writeBytes\":1847296,\"readTime\":8,\"writeTime\":0,\"iopsInProgress\":0,\"ioTime\":124,\"weightedIO\":8,\"name\":\"zram3\",\"serialNumber\":\"\",\"label\":\"\"} zram4:{\"readCount\":366,\"mergedReadCount\":0,\"writeCount\":310,\"mergedWriteCount\":0,\"readBytes\":1499136,\"writeBytes\":1269760,\"readTime\":0,\"writeTime\":8,\"iopsInProgress\":0,\"ioTime\":84,\"weightedIO\":8,\"name\":\"zram4\",\"serialNumber\":\"\",\"label\":\"\"} zram5:{\"readCount\":387,\"mergedReadCount\":0,\"writeCount\":449,\"mergedWriteCount\":0,\"readBytes\":1585152,\"writeBytes\":1839104,\"readTime\":0,\"writeTime\":12,\"iopsInProgress\":0,\"ioTime\":116,\"weightedIO\":12,\"name\":\"zram5\",\"serialNumber\":\"\",\"label\":\"\"}]"
2024/10/24 16:57:58 INFO Detected root device name=nvme0n1p1
2024/10/24 16:57:58 INFO Detected network interface name=enP8p1s0 sent=4004873005 recv=73461361728
2024/10/24 16:57:58 DEBUG Docker version=27.3.1 concurrency=5
2024/10/24 16:57:58 DEBUG Getting stats

@henrygd
Owner

henrygd commented Oct 24, 2024

You are correct, sorry about that. Fixed for the next release.

@henrygd added the "in progress" label on Oct 30, 2024