NetworkManager-wait-online can fail on slower machines #32
The current setting can be retrieved via:
Assuming the default is as expected, the following will change that specific setting to the proposed 45 seconds:
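As a rough illustration of the retrieve-and-change steps, the Python sketch below reads and rewrites a sysconfig-style `KEY="value"` assignment, the shell-variable format used in `/etc/sysconfig/network/config`. It assumes the setting is a single uncommented line. The key name is supplied by the caller because the exact parameter is not named here; `HYPOTHETICAL_TIMEOUT` is a placeholder, and `get_setting`, `set_setting` and `update_file` are illustrative helpers, not part of Rockstor or openSUSE tooling.

```python
import re
from pathlib import Path
from typing import Optional

# Template for one sysconfig-style assignment line, e.g. SOME_KEY="30".
# Only spaces/tabs are allowed around key and value, so a match never spans lines;
# commented-out lines such as '# SOME_KEY="30"' are not matched.
_ASSIGN = r'^[ \t]*{key}[ \t]*=[ \t]*"?([^"\n]*)"?[ \t]*$'


def _pattern(key: str):
    return re.compile(_ASSIGN.format(key=re.escape(key)), re.MULTILINE)


def get_setting(text: str, key: str) -> Optional[str]:
    """Return the value assigned to `key` in sysconfig-style text, or None if absent."""
    m = _pattern(key).search(text)
    return m.group(1) if m else None


def set_setting(text: str, key: str, value: str) -> str:
    """Return `text` with the first assignment to `key` rewritten as KEY="value".

    Raises KeyError if `key` is not present, so a mistyped key name does not
    silently leave the real setting unchanged.
    """
    new_text, count = _pattern(key).subn(lambda _m: f'{key}="{value}"', text, count=1)
    if count == 0:
        raise KeyError(key)
    return new_text


def update_file(path: str, key: str, value: str) -> None:
    """Rewrite `key` in the file at `path` (root is needed for files under /etc)."""
    p = Path(path)
    p.write_text(set_setting(p.read_text(), key, value))
```

For example, `update_file("/etc/sysconfig/network/config", "HYPOTHETICAL_TIMEOUT", "45")` would apply the proposed 45 seconds, with the placeholder swapped for whichever parameter the service actually reads. Refusing to append a missing key is a deliberate choice: a new, unused variable would look like a fix while changing nothing.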
YaST can also configure this setting via:
However, as we are akin to a JeOS install, a regular Rockstor system will not have YaST configured and is not, as yet, YaST compatible.
An indication of the failed state of NetworkManager-wait-online can be assessed via:
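One way to check for this failed state from a script is `systemctl show`, which prints unit properties as `KEY=VALUE` lines. The sketch below is an illustration under that assumption, not necessarily the command intended above: it queries `ActiveState` and `Result` for `NetworkManager-wait-online.service` and reports whether `ActiveState` is `failed`. The helper names are hypothetical, and the parsing is kept separate from the `subprocess` call so it can be exercised without systemd.

```python
import subprocess
from typing import Dict

UNIT = "NetworkManager-wait-online.service"


def parse_show_output(output: str) -> Dict[str, str]:
    """Parse the KEY=VALUE lines printed by `systemctl show` into a dict."""
    props: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip any line without an '='
            props[key] = value
    return props


def wait_online_failed(output: str) -> bool:
    """True if the parsed `systemctl show` output reports ActiveState=failed."""
    return parse_show_output(output).get("ActiveState") == "failed"


def query_unit(unit: str = UNIT) -> str:
    """Run `systemctl show` for ActiveState and Result (requires a systemd host)."""
    return subprocess.run(
        ["systemctl", "show", unit, "-p", "ActiveState", "-p", "Result"],
        capture_output=True, text=True, check=True,
    ).stdout
```

On a live system, `wait_online_failed(query_unit())` would give a yes/no answer. The `Result` value is kept for inspection only; which value a timed-out wait produces is not assumed here.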
I am undecided on the route to take here. Adding many tens of seconds to boot times for what looks to be a non-critical service may not be the way to go, especially given that no Rockstor native service seems to be affected. Also note that on the Ten64, for example, if one starts this service post boot, it takes around 46 seconds to start successfully. During boot, however, the delay required to avoid a time out with the typical Samba service enabled is 185 seconds. The above increase to 185 seconds (from the default of 30) affects the boot times thus:
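To frame the boot-time trade-off, here is a deliberately simplified model, an assumption rather than a description of the real service: the wait-online step blocks boot until the network is reported online or the timeout expires, whichever comes first. Under that model, raising the timeout only costs extra boot time when online state arrives late or never. The inputs below (30, 45, 185 and 46 seconds) are the figures quoted in this thread, used purely as illustrative values; the function names are hypothetical.

```python
from typing import Optional


def wait_duration(time_to_online: Optional[float], timeout: float) -> float:
    """Seconds the wait-online step blocks boot under the simplified model.

    It returns as soon as the network is reported online, or at the timeout.
    time_to_online=None models a network that never reports online.
    """
    if time_to_online is None:
        return timeout
    return min(time_to_online, timeout)


def extra_boot_delay(time_to_online: Optional[float], old: float, new: float) -> float:
    """Additional blocking time caused by raising the timeout from `old` to `new`."""
    return wait_duration(time_to_online, new) - wait_duration(time_to_online, old)
```

Under this model, `extra_boot_delay(None, 30, 185)` gives 155 (the worst case, when online state never arrives), `extra_boot_delay(46, 30, 185)` gives 16, and a machine that comes online within 30 seconds pays nothing extra.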
Holding off on this change for the time being, as this may all be a red herring of sorts.
@phillxnet, the same thing happened to me a little while back on some of my Rockstor KVMs, but I never could pinpoint the source and it was clearly due to my situation at the time... I remember looking around a bit and seeing some people report such a timeout at boot when having multiple NICs; this was my best guess at the time, as I was erroneously binding a few interfaces to my KVM. I haven't tried those VMs in a while (not sure I still have them), but could the number of interfaces be relevant here? It seems fitting given the high number of interfaces on the Ten64, for instance.
That's interesting: the Ten64 does have 10 NICs, so it's possible, but I've also seen it on a Haswell NUC with a single NIC and on an i5 Ivy Bridge desktop with a single NIC. In the latter two cases, though, both machines were fairly heavily loaded, starting multiple KVMs. This was with generic Leap 15.0/15.1. It's really perplexing; it also doesn't look like anything is hanging, just waiting around. I'm inclined to disable it, actually, but I'm not sure of the consequences. In the Vault instance I think I removed the dependency on this service at one point, as Vault then worked fine anyway in my context here. I think testing on the Pi4 may help shed light, as it seems to affect slow / loaded / CPU-bound machines. But it may just be quirky re hardware, as on KVMs here it seems to work immediately. Early timeout settings were 0, I think, i.e. wait forever. This was changed to 1 at some point to stop infinite hangs on the service in some settings. I've moved to 40 - 60 in some settings to make stuff work, and finally got to do some testing here in the Rockstor realm.
I still have my VM that shows that... and it only has one NIC, so the number of NICs seems irrelevant, actually... I'm currently leaning towards an IPv6 issue, as we still have a lot of log messages with IPv6-related operations failing (understandably so). Maybe we should make sure we're not missing something IPv6-related somewhere.
On some low-power devices, e.g. the Pi4 / Ten64, and on slower/older x86_64 machines, the default NetworkManager-wait-online service leaves insufficient time before 'declaring' to its dependants that no online state is available. This false negative on online status can lead to dependants, e.g. KVM installs or HashiCorp Vault instances, failing to start, as their dependency on online state was not satisfied.
The proposed fix is to increase the default wait setting for the NetworkManager-wait-online service.
The service derives its timeout setting from the following parameter:
in /etc/sysconfig/network/config
Some experimentation has indicated that a setting of 45 seconds looks to resolve the observed failures.
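The false negative described in this issue is the generic "poll until ready or give up at a deadline" pattern. The sketch below models it in Python under that assumption; it is not NetworkManager's actual implementation. `wait_online` and its parameters are hypothetical, and the clock and sleep functions are injectable so the behaviour can be checked without real delays. A network that comes up at 40 seconds is reported as not online with a 30 second timeout, but as online with 45 seconds.

```python
import time
from typing import Callable


def wait_online(is_online: Callable[[], bool], timeout: float, interval: float = 1.0,
                clock: Callable[[], float] = time.monotonic,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `is_online` until it returns True or `timeout` seconds elapse.

    Returning False is the false negative: dependants are told 'not online'
    even if the network would have come up shortly after the deadline.
    """
    deadline = clock() + timeout
    while True:
        if is_online():
            return True
        if clock() >= deadline:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, max(0.0, deadline - clock())))
```

Raising the timeout only moves the deadline: a machine that comes online early still returns immediately, which is why the cost of a larger value falls mainly on systems that never reach online state.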