Reset network_cli command_timeout (alarm) on data recv'd #35817
@bdowling Is this causing you issues at the moment, i.e. do you have playbooks that fail with the default timeouts?
Yes, a very simple play using ios_config triggers this in part because it calls
Result:
With some additional (local) debugging turned on, you can see that the show running all takes a good amount of time, and Ansible cuts the connection 10 seconds in, partway through receiving the response:
@bdowling your proposed change will break if the prompt is not properly identified and the session will be left in a hung state. It is precisely that situation that the current code path protects.
@privateip previously this may have been the case. But now, with the great work from @ganeshrn and in particular the merged fix in #35439, network_cli no longer sends any commands to try to wake up the prompt (it just uses the last one seen). So in this PR, even in cases where the prompt is not found, the timeout will still occur because the device stops sending any output. I hope that you will see from the tests below that I have addressed your concerns. Network delays, device performance, etc. can all affect how fast the device can return all the data we are asking of it. Using a timeout that is reset whenever the device outputs data is better than abruptly cutting short a working but slow connection. To prove out this theory, I have tested this in my branch in two ways.
With play:
Which also results in the expected timeout:
I have done some manual testing and added integration and unit tests as part of PR #37185 to test this patch. It seems to be working fine.
@ganeshrn - is there more work to go on this PR? (as in does it still need more revisions or perhaps just a rebase?)
@ganeshrn @ikhan2010 - I don't see evidence of a response from @bdowling in some time. Can we either close this PR or have someone internal take it over, rebase, and merge? Thanks.
Hi and thank you for your contribution to Ansible. With the release of Ansible 2.10, this code has been updated significantly and moved to https://github.com/ansible-collections/ansible.netcommon. If you feel this is still an applicable change, please open a PR in that repository.
SUMMARY
network_cli currently depends on the command_timeout variable to set the length of time that it waits for a command to FULLY complete. But it does not take into account whether the device is still sending output.
Why not instead reset this timer when data is received?
This PR proposes that the alarm() timer should simply be reset whenever data is received. I did this in a simple way by seeing if the handler for SIGALRM was set.
FYI, with this change, I was able to set command_timeout=1 in the config and still run a number of long-winded test plays successfully, including ones that previously failed with timeout set to 10 (due to the show running all that took 23 seconds on my DUT).
ISSUE TYPE
COMPONENT NAME
network_cli
ANSIBLE VERSION
ADDITIONAL INFORMATION
Extending COMMAND_TIMEOUT is an oft-recommended fix for these timeout issues, but setting it to some large value is impractical because any non-responsive device then takes that much extra time to abort the command.
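
For illustration only, the idea in the SUMMARY (treat the timeout as an idle timer that is re-armed each time data arrives) can be sketched in standalone Python. This is not the patch itself and does not use any network_cli internals: receive_all, slow_device, CommandTimeout and the router# prompt are hypothetical names invented for the example, and it uses signal.setitimer (which also delivers SIGALRM) rather than alarm() only so that the demo can run with sub-second values. It assumes a Unix platform and that it runs in the main thread.

```python
import signal
import time


class CommandTimeout(Exception):
    """Raised when no data arrives within the idle timeout (hypothetical name)."""


def _on_alarm(signum, frame):
    raise CommandTimeout("no data received within the timeout")


def receive_all(chunks, timeout, is_done):
    """Collect chunks until is_done(buffer) is true.

    The timer is re-armed after every chunk, so `timeout` bounds the gap
    between chunks rather than the total time the command takes.
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout)  # arm before waiting
    buf = b""
    try:
        for chunk in chunks:
            # Data arrived: reset the timer instead of letting it run out.
            signal.setitimer(signal.ITIMER_REAL, timeout)
            buf += chunk
            if is_done(buf):
                break
        return buf
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # always disarm
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)


def slow_device(lines, delay):
    """Stand-in for a device that emits one line every `delay` seconds."""
    for i in range(lines):
        time.sleep(delay)
        yield b"line %d\n" % i
    yield b"router#"  # hypothetical prompt marking the end of the output


if __name__ == "__main__":
    # Total output takes about 0.25 s, longer than the 0.2 s timeout, but no
    # single gap exceeds it, so the full response is received.
    output = receive_all(slow_device(5, 0.05), 0.2,
                         lambda b: b.endswith(b"router#"))
    print(output.decode())
```

If the device goes quiet for longer than the timeout (for example slow_device(1, 1.0) with a 0.2 second timeout), CommandTimeout is raised after roughly the timeout value, which is the behaviour argued for above: a silent device is still aborted quickly, even when the prompt is never matched.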