
Check for container running before following logs #706

Merged

Conversation

@ashish-amarnath (Member) commented May 5, 2019

What this PR does / why we need it:
Don't attempt to stream logs from containers until they are Running
Which issue(s) this PR fixes

  • Fixes

Special notes for your reviewer:

Release note:

Improves the `sonobuoy logs -f` command by waiting until containers are running to stream logs instead of failing outright and discontinuing logs from that container.
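
For context, a minimal Go sketch of the behavior this release note describes: poll the pod until the target container reports a Running state, then open a follow-mode log stream. This assumes a recent client-go where `Get` and `Stream` take a `context.Context`; `containerRunning`, `streamWhenRunning`, and the fixed one-second retry are illustrative placeholders, not the PR's actual code.

```go
// Package logsketch is a minimal sketch, not Sonobuoy's implementation.
package logsketch

import (
	"context"
	"fmt"
	"io"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// containerRunning reports whether the named container shows a Running state
// in the pod's status.
func containerRunning(pod *corev1.Pod, container string) bool {
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.Name == container && cs.State.Running != nil {
			return true
		}
	}
	return false
}

// streamWhenRunning polls the pod until the container is running, then opens
// a follow-mode log stream for that container.
func streamWhenRunning(ctx context.Context, client kubernetes.Interface, ns, pod, container string) (io.ReadCloser, error) {
	for {
		p, err := client.CoreV1().Pods(ns).Get(ctx, pod, metav1.GetOptions{})
		if err != nil {
			return nil, err
		}
		if containerRunning(p, container) {
			break
		}
		fmt.Printf("container %s/%s/%s is not running, retrying in 1s\n", ns, pod, container)
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(time.Second):
		}
	}
	opts := &corev1.PodLogOptions{Container: container, Follow: true}
	return client.CoreV1().Pods(ns).GetLogs(pod, opts).Stream(ctx)
}
```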

@ashish-amarnath force-pushed the wait-container-running branch 2 times, most recently from 2c66c06 to 7f879e9 on May 5, 2019 02:24
@codecov-io commented May 5, 2019

Codecov Report

Merging #706 into master will decrease coverage by 7.95%.
The diff coverage is 21.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #706      +/-   ##
==========================================
- Coverage   47.03%   39.08%   -7.96%     
==========================================
  Files          75       68       -7     
  Lines        4881     3843    -1038     
==========================================
- Hits         2296     1502     -794     
+ Misses       2453     2243     -210     
+ Partials      132       98      -34
| Impacted Files | Coverage Δ |
| --- | --- |
| pkg/client/logs.go | 27.86% <21.73%> (-12.3%) ⬇️ |
| pkg/client/status.go | 0% <0%> (-55.56%) ⬇️ |
| pkg/plugin/driver/job/job.go | 21.51% <0%> (-53.69%) ⬇️ |
| pkg/plugin/driver/daemonset/daemonset.go | 15.74% <0%> (-51.14%) ⬇️ |
| pkg/client/interfaces.go | 46.15% <0%> (-36.35%) ⬇️ |
| pkg/plugin/aggregation/run.go | 0% <0%> (-25.84%) ⬇️ |
| pkg/plugin/driver/base.go | 41.33% <0%> (-18.29%) ⬇️ |
| pkg/plugin/aggregation/update.go | 36.23% <0%> (-16.3%) ⬇️ |
| pkg/client/e2e.go | 74.35% <0%> (-15.39%) ⬇️ |
| ... and 45 more | |

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e03eebd...ab309e8. Read the comment docs.

@johnSchnake (Contributor):

Thanks for taking this on!

It seems like either I'm misreading your PR or you've misinterpreted the problem, though: it isn't that we tail the logs of a given pod and have a problem because it isn't in the proper status yet. The problem is that we grab a static list of pods at the start of the command and then tail just those pods for the rest of the command. So if I start the aggregator, then 3 plugins, but start the logs right after the aggregator, that is the only set of logs I will ever get (I will never get the plugin logs because those pods weren't yet created when we listed the pods). We grab the static list of pods here and then range over it, making log streamers for each pod.

We should do something to keep checking for new pods on a somewhat regular basis.
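
A minimal sketch of that periodic re-check, assuming a recent client-go; `followNewPods`, `startLogStreamer`, and the five-second poll interval are hypothetical placeholders, not Sonobuoy's actual API.

```go
// Package logsketch is a minimal sketch, not Sonobuoy's implementation.
package logsketch

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// followNewPods re-lists pods on a fixed interval and invokes startLogStreamer
// exactly once for each pod it has not seen before.
func followNewPods(ctx context.Context, client kubernetes.Interface, ns string, startLogStreamer func(podName string)) {
	seen := map[string]bool{}
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()

	for {
		if pods, err := client.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{}); err == nil {
			for _, p := range pods.Items {
				if !seen[p.Name] {
					seen[p.Name] = true
					startLogStreamer(p.Name) // start tailing the newly discovered pod
				}
			}
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}
```

Polling keeps the sketch simple; a watch on the pod list would avoid the fixed interval at the cost of more bookkeeping.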

@ashish-amarnath (Member, Author) commented May 5, 2019

@johnSchnake it is definitely the latter. Your comment clarifies the problem. I'll work on the correct fix :)

Correct me if I am not reading the code correctly. Apart from the problem you clarified, I think containers not being ready is also a problem when you are trying to follow logs, no?

So let's say that when I run `sonobuoy logs -f`, one of the containers is a little slow to start. The stream on its log reader will return an error, and I won't get its logs until I rerun `sonobuoy logs -f`.
WDYT?

@ashish-amarnath force-pushed the wait-container-running branch from 293e9e7 to 7d433c8 on May 6, 2019 03:31
@ashish-amarnath changed the title from "[WIP] check for container running before following logs" to "Check for container running before following logs" on May 6, 2019
@johnSchnake (Contributor):

@ashish-amarnath You may be right; I'm not really sure if an error would be returned or if the API calls would handle that more gracefully. One way to test it would be to modify the code to try and hit that condition and just look at the errors you get.

It's possible that some of the times when I thought the pod didn't exist, it was actually just not ready and showed the same behavior, but I'd prefer to see the issue reproduced (even through a forced experiment) before trying to fix it.

I can try and make this happen and see if I hit the error.

@johnSchnake (Contributor):

Confirmed not only that this is an issue, but that it causes a hang in the current code.

I'll put up a PR to fix that hang. The issue is that we are writing an error to an unbuffered channel and no one is reading on the other end, so we block forever when trying to report the error (that the pod is still pending creation).

Once my little fix goes in you should be good to gracefully handle that. 👍 Thanks for bringing that up!
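
A tiny, self-contained illustration of that kind of hang and one common way to avoid it (selecting on a done channel, or equivalently buffering the error channel). This is illustrative only, not Sonobuoy's actual code or the fix in the referenced PR.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// reportError tries to deliver an error but gives up when done is closed,
// instead of blocking forever on a channel nobody reads.
func reportError(errCh chan<- error, done <-chan struct{}) {
	err := errors.New("pod is still pending creation")
	select {
	case errCh <- err: // delivered only if someone is actually receiving
	case <-done: // otherwise bail out rather than blocking forever
		fmt.Println("no reader; dropping error:", err)
	}
}

func main() {
	errCh := make(chan error) // unbuffered: a bare `errCh <- err` would hang with no reader
	done := make(chan struct{})

	go reportError(errCh, done)

	// Simulate a caller that never reads errCh and then shuts down.
	time.Sleep(100 * time.Millisecond)
	close(done)
	time.Sleep(100 * time.Millisecond)
}
```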

pkg/client/logs.go — two review threads (outdated, resolved)
@johnSchnake (Contributor) left a review comment:

Copying info from Slack for context if someone looks at the PR:
This is blocked (IMO) by #708; although this PR itself would get around the race condition for CLI users, the deadlock would still occur when using the pkg as a library, so I want that to merge first as a fix.

Once that PR gets fixed then I think this is almost ready to go with a few fixups.

@johnSchnake (Contributor):

The fix for the race has been merged; you can rebase and continue with this. You may want to see if there is a way to leverage those errors you'll now get (if that simplifies any of your logic here).

@johnSchnake (Contributor):

@ashish-amarnath checking on the status; do you still intend to revisit this?

@ashish-amarnath force-pushed the wait-container-running branch 2 times, most recently from a9399e8 to 0529092 on September 13, 2019 05:33
pkg/client/logs.go — review thread (outdated, resolved)
@johnSchnake (Contributor) left a review comment:

One small change about where/what to print out to the user. 👍

@johnSchnake (Contributor):

If you can make that change I'll test manually and we can merge today before the release if you have the time.

@johnSchnake (Contributor):

A manual test of `sonobuoy run && sonobuoy logs -f` showed:

container sonobuoy/sonobuoy/kube-sonobuoy, is not running, will retry streaming logs in 1s seconds
container sonobuoy/sonobuoy/kube-sonobuoy, is not running, will retry streaming logs in 2s seconds
namespace="sonobuoy" pod="sonobuoy" container="kube-sonobuoy"
time="2019-09-13T20:02:02Z" level=info msg="Scanning plugins in ./plugins.d (pwd: /)"
time="2019-09-13T20:02:02Z" level=info msg="Scanning plugins in /etc/sonobuoy/plugins.d (pwd: /)"
time="2019-09-13T20:02:02Z" level=info msg="Directory (/etc/sonobuoy/plugins.d) does not exist"

👍

@johnSchnake merged commit 0f9e963 into vmware-tanzu:master on Sep 13, 2019
@ashish-amarnath deleted the wait-container-running branch on September 13, 2019 20:13
Labels: none yet
Projects: none yet
3 participants