Flatcar 3975.2.1 Bonding Config Bug #1580
Comments
We were asked for the following: […]

We were asked to try the following but are still seeing issues: […]

Can you try the alpha releases between […]?
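For anyone reproducing this, a minimal sketch of stepping through specific alpha releases, assuming the `flatcar-update` helper that recent Flatcar releases ship (the exact release range to bisect was not preserved in this thread):

```sh
# Hedged sketch: try one alpha release at a time with the flatcar-update
# helper (assumed present on the node; substitute the release under test).
ALPHA_VERSION="${ALPHA_VERSION:?set to the alpha release under test}"
sudo flatcar-update --to-version "$ALPHA_VERSION"
sudo systemctl reboot
```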
Please see below the upgrade process and results: […]

Please note the following is the first presence of […]
Hello, this looks to be a concurrency issue between the unit that enforces/creates the bond and the unit that enforces the […]. Also, is it possible to try a version of Flatcar with a different kernel / systemd to see if the issue still happens? You can find a Flatcar image artifact with kernel 6.11 here: https://github.com/flatcar/scripts/actions/runs/11594744048, and one with systemd 256 here: https://github.com/flatcar/scripts/actions/runs/11557455799. My bet would be on a different systemd version. Thanks.
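As an illustration of the kind of serialization this suggests, here is a minimal sketch of a systemd drop-in that orders one unit strictly after the other. Both unit names are hypothetical placeholders, since the real unit names on the affected nodes are not shown in this thread:

```sh
# Hypothetical sketch: make the enforcing unit start only after the unit
# that creates the bond. setup-bond.service and enforce-mac.service are
# placeholders; substitute the actual units from `systemctl list-units`.
sudo mkdir -p /etc/systemd/system/enforce-mac.service.d
sudo tee /etc/systemd/system/enforce-mac.service.d/10-after-bond.conf <<'EOF' >/dev/null
[Unit]
# Order and pull in the bond-creating unit before this one runs.
After=setup-bond.service
Wants=setup-bond.service
EOF
sudo systemctl daemon-reload
```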
Can we also compare […]?
Please see the information you requested below:

3975.2.1: […]

3760.2.0: […]

First Question: […]

Second Question: […]
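For completeness, a sketch of commands that could capture comparable state on each of the two releases; exactly which outputs were requested above was not preserved in this thread:

```sh
# Sketch: collect version and bond state on each release for comparison.
systemctl --version | head -n1   # systemd version in use
uname -r                         # kernel version in use
networkctl status bond0          # systemd-networkd's view of the bond
cat /proc/net/bonding/bond0      # kernel bonding/LACP state
```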
Description
We've encountered a problem with bonding configuration after our most recent Flatcar upgrade from v3760.2.0 to v3975.2.1. The behavior is inconsistent: the bond0 interface actor churn does not always begin after the initial upgrade reboot. Instead, it most frequently appears after a subsequent reboot.
Impact
Nodes rebooted again after the initial upgrade reboot go into churn on the secondary interface of bond0 and are subsequently unable to communicate with other nodes in the cluster.
Environment and steps to reproduce
1. Set-up: Baremetal Flatcar OS 3760.2.0, upgraded via Nebraska to Flatcar OS 3975.2.1.
2. Task: After the node is upgraded and rebooted, reboot the node a second time; churn appears, causing lag during node login and while running commands.
3. Action(s):
   a. Rebooted the node after the initial upgrade reboot.
   b. Node login and commands began to hang, taking many seconds to minutes to complete.
   c. /proc/net/bonding/bond0 showed churn on the secondary interface, with no system MAC address present (see the inspection sketch after this list).
4. Error: Other nodes were unable to communicate with the affected node.
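A small sketch of how the churn state mentioned in step 3c can be inspected, assuming the bond runs in LACP (802.3ad) mode, where the kernel exposes churn state and the system MAC address in procfs:

```sh
# Sketch: watch the LACP churn and system MAC fields on bond0
# (fields assume mode 802.3ad; adjust the interface name as needed).
watch -n1 "grep -iE 'churn|system mac address' /proc/net/bonding/bond0"
```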
Expected behavior
Nodes are expected to communicate with other nodes in the cluster.