Cleaner kubectl port-forward retry logic #2593
Conversation
Codecov Report

I am missing unit tests + there is a scenario that is not checked now (not sure if it was covered before): `kubectl port-forward` does not fail if the port is already bound :(
if strings.Contains(s, "error forwarding port") ||
	strings.Contains(s, "unable to forward") ||
	strings.Contains(s, "error upgrading connection") {
Would it make sense to log a warning or something so the user knows the retry is happening and why?
it is there in trace mode
I keep thinking about what you mentioned: "error upgrading connection" is from port forwarding in client-go: https://github.com/kubernetes/client-go/blob/master/tools/portforward/portforward.go#L194 - I think it is fine to retry on it...
I'm pretty sure Cloud Code doesn't show trace, so it would be good for users to at least be able to discover that there was a failure.

I think restarting on error makes a lot of sense: it's essentially turning `kubectl port-forward` into a single-use attempt. Perhaps this code could look for the more general error header (`portforward.go:.*error occurred`) and log the error, but suppress it when it matches one of these strings (known issues)?
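To make the idea concrete, here is a minimal Go sketch of how such a classification could look. It is only an illustration under stated assumptions: the `classifyLine` helper, the exact regular expression for the general error header, and the log messages are made up for this example and are not skaffold's actual implementation; only the three literal strings come from the diff above.

```go
package main

import (
	"log"
	"regexp"
	"strings"
)

// generalErr approximates the broader client-go error header mentioned
// above; the exact pattern is an assumption for illustration.
var generalErr = regexp.MustCompile(`portforward\.go:.*error occurred`)

// knownErrors are the specific strings the retry logic already matches on.
var knownErrors = []string{
	"error forwarding port",
	"unable to forward",
	"error upgrading connection",
}

// classifyLine reports whether a kubectl port-forward output line should
// trigger a retry, and whether it is one of the known (expected) issues.
func classifyLine(s string) (retry bool, known bool) {
	for _, e := range knownErrors {
		if strings.Contains(s, e) {
			return true, true
		}
	}
	if generalErr.MatchString(s) {
		return true, false
	}
	return false, false
}

func main() {
	// Illustrative sample line, not a verbatim kubectl log.
	line := "error upgrading connection: pod does not exist"
	if retry, known := classifyLine(line); retry {
		if known {
			log.Printf("retrying port-forward (known issue): %s", line)
		} else {
			log.Printf("retrying port-forward after unexpected error: %s", line)
		}
	}
}
```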
func (k *KubectlForwarder) forward(parentCtx context.Context, pfe *portForwardEntry) {
	var notifiedUser bool
	for {
when reading this, it looks like this `forward` runs forever?
it does, until the pfe gets cancelled (there are 3 return statements in the body, all around cancellation!)
Got lost in reading and saw only 1 return statement. I still feel all 3 return statements are waiting for conditions:
1. `pfe.terminated` - not sure when that happens
2. & 3. the user enters ^C in `skaffold dev`.

Will it keep trying forever until the above two conditions are satisfied?
well, if things are going well, it is going to wait in `cmd.Wait()` as long as `kubectl port-forward` is running ...
but yes, it does retry unless it is an explicit cancel from the skaffold process...
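For readers following along, below is a minimal, self-contained Go sketch of the "retry unless explicitly cancelled" shape being discussed. It is not the actual `KubectlForwarder.forward` code: the pod name, the port mapping, and the one-second backoff are illustrative assumptions, and the real loop exits around the port-forward entry's cancellation rather than a bare context timeout.

```go
package main

import (
	"context"
	"log"
	"os/exec"
	"time"
)

// forwardLoop keeps restarting `kubectl port-forward` until the context is
// cancelled, which is the behavior described in the comments above.
func forwardLoop(ctx context.Context) {
	for {
		// Exit point: the surrounding process cancelled this forward.
		if ctx.Err() != nil {
			return
		}

		cmd := exec.CommandContext(ctx, "kubectl", "port-forward", "pod/my-pod", "8080:8080")
		if err := cmd.Start(); err != nil {
			log.Printf("could not start kubectl port-forward, retrying: %v", err)
		} else {
			// While things go well we simply sit in Wait(); it returns when
			// the process exits on its own or is killed (e.g. by a log
			// monitor that spotted a known error, in the real design).
			err := cmd.Wait()

			// Exit point: cancellation also surfaces here as an error.
			if ctx.Err() != nil {
				return
			}
			log.Printf("kubectl port-forward terminated, retrying: %v", err)
		}

		// Small delay so a persistently failing command does not spin hot.
		time.Sleep(1 * time.Second)
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	forwardLoop(ctx)
}
```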
btw this was generated on planttext.com :)
@startuml
title Port Forward State Model
[*] --> CheckForTermination
CheckForTermination -down-> Cancelled
CheckForTermination -down-> TryCmdStart
TryCmdStart -down-> Cancelled
TryCmdStart --> CheckForTermination
TryCmdStart -down-> LogMonitoring
Cancelled -down-> [*]
state "kubectl port-forward logs are monitored" as LogMonitoring {
TryCmdWait -down-> Cancelled
TryCmdWait -up-> CheckForTermination
}
@enduml
I wish I could put labels on the arrows - but the key thing is: non-cancellation error scenarios go back to the start, cancellation exits, and no error continues to the next step.
…nstead of assuming local port 50053
Now that we cleanup "skaffold dev", the TestEvent was failing. The first run of this test `TestEvent/v1/event_log` was building images, and at the end of the dev loop prune was cleaning up all of the containers. In the second test `TestEvent/v1/events`, the images were being built but upon push failed on this error: ``` The push refers to repository [gcr.io/k8s-skaffold/test-dev] time="2019-08-01T18:32:12-07:00" level=fatal msg="build failed: building [gcr.io/k8s-skaffold/test-dev]: build artifact: tag does not exist: gcr.io/k8s-skaffold/test-dev:v0.34.1-64-g1d5d3b68-dirty" ``` I think this is because docker was pointing to layers it thought it had already built, but actually didn't exist anymore. This test is fixed by adding `--no-prune=true`.
cc @balopat looks like there was a prune error now that we are cleaning up on I think the issue is that we first run I added
Merging this to release branch.
This clears up the port forwarding logic on the `kubectlForwarder` level.
I got rid of the two polling wait loops; instead there is one single loop that:
- watches the output of `kubectl port-forward` for known errors (e.g. `error upgrading connection`) and kills the process when one shows up
- restarts `kubectl port-forward` in case the process exits on failure (or gets killed by the log monitoring go routine)

Also, added logic to handle port collision right before kicking off the port forwarding process. If that happens, no new port is being brokered, in the hope that the user will kill the (assumed) external process. In case the colliding process dies, the port forwarding will resume (with a nice recovery message to the user).
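As a rough sketch of the collision handling described above (assumed semantics, not the actual skaffold code): try to bind the local port before kicking off `kubectl port-forward`, and if the bind fails, keep waiting for the external process to release the port instead of brokering a new one. The `portFree` helper, the 127.0.0.1 bind address, port 8080, and the two-second poll interval are all illustrative.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// portFree reports whether the local TCP port can be bound right now.
func portFree(port int) bool {
	l, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", port))
	if err != nil {
		return false
	}
	l.Close()
	return true
}

func main() {
	const port = 8080
	for !portFree(port) {
		// Don't broker a new port: tell the user and wait for the
		// (assumed external) process holding the port to go away.
		fmt.Printf("port %d is taken by another process, waiting...\n", port)
		time.Sleep(2 * time.Second)
	}
	fmt.Printf("port %d is free, starting port forwarding\n", port)
	// ... kick off kubectl port-forward here ...
}
```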
Also, we found a bug in the `portForwarder` map: the `LoadOrStore` function was not storing anything. This is fixed now with a copy-paste of the unit test that was testing the port brokering logic on `sync.Map`. This copy-paste should go away when we implement #2503.

Fixed the retry logic for the `TestDevPortForwardGKELoadBalancer` integration test, plus fixed the lingering port-forwarding processes by sending SIGTERM to the `skaffold dev` process instead of just killing it.
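For context on the `LoadOrStore` bug mentioned above, here is a small standalone Go example of `sync.Map.LoadOrStore`'s store-only-if-absent semantics; the key and value types are just for illustration and are not the types used in the portForwarder.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var ports sync.Map

	// First call: the key is absent, so 8080 -> "pod-a" is stored and
	// loaded is false.
	v, loaded := ports.LoadOrStore(8080, "pod-a")
	fmt.Println(v, loaded) // pod-a false

	// Second call: the key exists, so the existing value is returned,
	// "pod-b" is NOT stored, and loaded is true.
	v, loaded = ports.LoadOrStore(8080, "pod-b")
	fmt.Println(v, loaded) // pod-a true
}
```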