Agent doesn't return anything on Jetson Orin Nano (ARM CPU) #196
Comments
Interesting, thanks for providing some details. Does the agent log anything further in debug mode when you attempt to connect to it? Jetson Nano is 64-bit, right? Are you running the agent with the binary or with Docker? If using Docker, run I have an agent running on an ARM Neoverse machine, so it's not a universal ARM issue. |
The v0.5.1 agent now logs stats on startup if using |
I tried with both the binary and the container. I confirm the container is arm64. Running 0.5.1 produces this output:
|
Thanks, it seems to be failing silently while trying to gather the metrics. I'll look further into how that could happen and make sure something is logged. |
I added more debug logs in 0.5.3, so if you run this again we should be able to see more info about where it's breaking down. |
Latest logs are here:
|
I wasn't able to figure out what's going wrong, so I changed the debug logs around in 0.6.0. It probably has something to do with Docker. You can try not mounting the socket, or passing a dummy value like |
I'd say the behavior is different now. Instead of pending, the node appears to be down and nothing is listening on port 45876. The logs follow:
And here is the same log when not running via Docker:
And here is an strace snippet:
|
Very strange issue. Doesn't appear to be related to Docker -- it's deadlocking before even getting to that. I can't see how it's happening, but I made a small change and moved the debug logs around again in 0.6.1 to try to narrow down the possibilities. |
Latest logs here from running this binary:
|
I think it's somehow deadlocking in gopsutil's Can you please run this Go code to see if it locks up as well? If you don't want to build it yourself, you can download a linux arm64 binary here: https://assets.henrygd.me/beszel/bin/temps-linux-arm64
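package main

import (
	"context"
	"errors"
	"fmt"

	"github.com/shirou/gopsutil/v4/sensors"
)

func main() {
	fmt.Println("Before reading temperatures")
	temperatures, err := sensors.TemperaturesWithContext(context.Background())
	fmt.Println("After reading temperatures")
	if err != nil {
		// Make per-sensor warnings verbose, but only if err really is a *sensors.Warnings,
		// so an unexpected error type does not trigger a second panic here.
		var warnings *sensors.Warnings
		if errors.As(err, &warnings) {
			warnings.Verbose = true
		}
		panic(err)
	}
	// Print each sensor's key and temperature.
	for _, temp := range temperatures {
		fmt.Printf("%s: %.1f\n", temp.SensorKey, temp.Temperature)
	}
}
|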
Sure. It looks like it does:
Here is the last part of the strace output:
|
Do those sensors have valid data?
cat /sys/class/thermal/thermal_zone2/type
cat /sys/class/thermal/thermal_zone2/temp
Try checking your open files and process limits:
# check the current nproc limit
ulimit -u
# 124668
# check the current open files limit
ulimit -n
# 2048
If they're very low you can try raising them temporarily:
ulimit -u 4096
ulimit -n 2048
Check the kernel's maximum thread count:
cat /proc/sys/kernel/threads-max
# 249336
Maybe also try restricting the program to one core:
taskset -c 0 ./temps-linux-arm64 |
|
Try running sudo sensors-detect, maybe that will fix the problem. If you don't have it, install the lm-sensors package. |
It doesn't fix it. And I don't think it's important to fix it but more
important would be to have beszel handle this.
I think it would be nice if beszel would skip the data that it can't
acquire in less than a second or so and would print a message letting us
know about that.
…On Mon, Oct 21, 2024 at 9:37 PM hank ***@***.***> wrote:
Try running sudo sensors-detect, maybe that will fix the problem.
If you don't have it, install the lm-sensors package.
|
It looks like this is an issue specific to Jetson Orin. I found this in their developer guide:
I think because EAGAIN indicates a temporary status, Go's I can't figure out a way to replicate the situation on my system. If I could, I would try to debug it and submit a PR to gopsutil. You might want to open an issue over there as they are better devs than I am. Alternatively, I can add an env var |
As of 0.6.2, setting You won't get any sensor data, but it should return successfully. |
It doesn't look like anything changed:
|
You are correct, sorry about that. Fixed for the next release. |
My agent is stuck on pending on the Jetson Orin Nano.
The issue is not networking. It passes the telnet test and ssh goes past authentication but it doesn't return any statistics.
How I tested. I am running several agents on x86 machines and this one on arm.
I am connecting over SSH with the key that was generated by Beszel, like this:
On the machine that works I can see a message like this and it exits immediately:
On the machine that doesn't work, it looks like this and it never exits:
The agent that doesn't work produces this log when I start it: