What steps did you take and what happened:
Tests were run on an unreliable network, and the logs contain many "connection reset by peer" lines.
The worker uses a client that retries by default.
The aggregator accepts the first request and then rejects any further attempts because it has already "seen" those results. As a result, part of the data is sent, the connection gets reset, and the client retries but is turned away.
What did you expect to happen:
The server should accept new results in cases where errors were encountered.
Anything else you would like to add:
Marking results as "seen" even when an error occurs makes sense (as the code comments indicate): if the server disregarded errored results and the client never retried, the server could hang waiting for them.
However, it seems like the logic needs to be tweaked so that we either:
always allow data to be resent and keep the newest values
track which results were received with or without an error, and allow retries for those received with an error
Environment:
Sonobuoy version: v0.14.2
Thanks to @rbankston for reporting this bug and providing the logs needed to diagnose it. (kubernetes/kubernetes#74839 may have caused the original connection reset issue, but it does not appear to be a final, complete fix and may run into other issues.)
Added this to the current milestone as p0. We have the bandwidth to support it, and I consider this a pretty significant bug because (1) it is confusing to users, (2) it is not easy to debug unless you are very familiar with the system, and (3) it may block users: if it happens regularly, it can keep users from ever getting results from their plugins.