Check for container running before following logs #706
Conversation
Force-pushed from 2c66c06 to 7f879e9 (compare)
Codecov Report
@@            Coverage Diff             @@
##           master     #706      +/-   ##
==========================================
- Coverage   47.03%   39.08%    -7.96%
==========================================
  Files          75       68        -7
  Lines        4881     3843     -1038
==========================================
- Hits         2296     1502      -794
+ Misses       2453     2243      -210
+ Partials      132       98       -34
Continue to review full report at Codecov.
Thanks for taking this on! It seems like either I'm misreading your PR or you've misinterpreted the problem, though: it isn't that we tail logs of a given pod and have a problem because it isn't in the proper status yet. The problem is that we grab a static list of pods at the start of the command and then tail just those pods for the rest of the command. So if I start the aggregator, then 3 plugins, but start following logs right after the aggregator, that is the only set of logs I will ever get (I will never start getting the plugin logs because those pods weren't yet created when we got the list of pods). We grab the static list of pods here and then range over it, making a log streamer for each pod. We should do something to keep checking for new pods on a somewhat regular basis.
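For illustration, here is a minimal Go sketch of that idea: periodically re-list the pods and start a streamer for any pod not seen before. The clientset wiring, the label selector, and the streamPodLogs helper are hypothetical and not taken from the sonobuoy code.

```go
package main

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// watchForNewPods re-lists pods on an interval and starts a log streamer for
// each pod it has not seen before, so pods created after the command starts
// still get their logs followed.
func watchForNewPods(ctx context.Context, client kubernetes.Interface, ns string, streamPodLogs func(name string)) {
	seen := map[string]bool{}
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()

	for {
		pods, err := client.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{
			LabelSelector: "sonobuoy-component", // hypothetical selector
		})
		if err == nil {
			for _, p := range pods.Items {
				if !seen[p.Name] {
					seen[p.Name] = true
					go streamPodLogs(p.Name) // start a streamer for the newly discovered pod
				}
			}
		}

		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}
```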
@johnSchnake It is definitely the latter; your comment clarifies the problem, and I'll work on the correct fix :) Correct me if I am not reading the code correctly: apart from the problem you clarified, I think containers not yet being ready is also a problem when you are trying to follow logs, no? So let's say when I run
Force-pushed from 293e9e7 to 7d433c8 (compare)
@ashish-amarnath You may be right; I'm not really sure whether an error would be returned or whether the API calls would handle that more gracefully. One way to test it would be to modify the code to try and hit that condition and just look at the errors you get. It is possible that some of the times when I thought the pod didn't exist, it just wasn't ready yet and showed the same behavior, but I'd prefer to see the issue exist (even via a forced experiment) before trying to fix it. I can try to make this happen and see if I hit the error.
Confirmed not only that this is an issue, but that it causes a hang in the current code. I'll put up a PR to fix that hang. The issue is that we are writing an error to an unbuffered channel and no one is reading on the other end, so we block forever when trying to report the error (that the pod is still pending creation). Once my little fix goes in you should be good to handle that gracefully. 👍 Thanks for bringing that up!
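As a minimal illustration (not the sonobuoy code itself) of the deadlock described above: a send on an unbuffered channel with no receiver blocks forever, while giving the channel capacity (or guaranteeing a reader) lets the sender proceed.

```go
package main

import (
	"errors"
	"fmt"
)

func main() {
	// With errc := make(chan error) and no goroutine receiving, the send
	// below would block forever -- the hang described in the comment above.
	errc := make(chan error, 1) // buffered, so the send cannot block

	errc <- errors.New("pod is still pending creation")
	fmt.Println(<-errc)
}
```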
Copying info from Slack for context if someone looks at the PR:
This is blocked (IMO) by #708; although this PR itself would get around the race condition for CLI users, the deadlock would still occur when using the pkg as a library, so I want that fix to merge first.
Once that PR is merged, I think this is almost ready to go with a few fixups.
Fix for the race has been merged; you can rebase and continue with this. You may want to see if there is a way to leverage the errors you'll now get (if that simplifies any of your logic here).
@ashish-amarnath Checking on the status; do you still intend to revisit this?
Force-pushed from a9399e8 to 0529092 (compare)
One small change about where/what to print out to the user. 👍
If you can make that change, I'll test manually and we can merge today before the release if you have the time.
Force-pushed from 0529092 to 2f198c1 (compare)
Signed-off-by: Ashish Amarnath <[email protected]>
Force-pushed from 2f198c1 to ab309e8 (compare)
Manual test of
👍
What this PR does / why we need it:
Don't attempt to stream logs from containers until they are Running.
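A hedged sketch of that idea: check container state before requesting a follow-log stream. This assumes client-go types; the containersRunning helper name is hypothetical and not taken from this PR's diff.

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
)

// containersRunning reports whether every container in the pod has reached
// the Running state; only then would the caller start a follow-log stream.
func containersRunning(pod *corev1.Pod) bool {
	if len(pod.Status.ContainerStatuses) == 0 {
		return false
	}
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.State.Running == nil {
			return false
		}
	}
	return true
}
```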
Which issue(s) this PR fixes
Special notes for your reviewer:
Release note: