
Cleaner kubectl port-forward retry logic #2593

Conversation

@balopat (Contributor) commented Aug 1, 2019

This cleans up the port forwarding logic at the kubectlForwarder level.
I got rid of the two polling wait loops; instead there is a single loop that:

  • runs until the port forward entry gets cancelled by skaffold itself,
  • monitors the kubectl port-forward logs in a separate goroutine and kills the process on known errors (a newly introduced known error: error upgrading connection),
  • reruns kubectl port-forward when the process exits on failure (or is killed by the log-monitoring goroutine),
  • uses a more aggressive --pod-running-timeout of 1 second for a more responsive UX.
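
A minimal sketch of that single loop, in Go with illustrative names and structure rather than the actual kubectlForwarder code:

```go
package portforwardsketch

import (
	"bufio"
	"context"
	"fmt"
	"io"
	"os/exec"
	"strings"
	"time"
)

// forwardLoop is an illustrative sketch of the loop described above, not the
// skaffold implementation: it reruns kubectl port-forward until the entry's
// context is cancelled by skaffold itself.
func forwardLoop(ctx context.Context, namespace, resource, ports string) {
	for {
		select {
		case <-ctx.Done():
			return // cancelled by skaffold itself
		default:
		}

		cmd := exec.CommandContext(ctx, "kubectl", "port-forward",
			"--namespace", namespace,
			"--pod-running-timeout=1s", // more aggressive timeout for responsive UX
			resource, ports)

		// Pipe kubectl's output to a goroutine that kills the process as soon
		// as a known error string shows up in the logs.
		pr, pw := io.Pipe()
		cmd.Stdout = pw
		cmd.Stderr = pw
		go killOnKnownError(pr, cmd)

		if err := cmd.Start(); err == nil {
			cmd.Wait() // blocks for as long as kubectl port-forward is healthy
		}
		pw.Close() // stop the log-monitoring goroutine

		// The process exited on failure or was killed by the monitor:
		// back off briefly, then the outer loop retries.
		time.Sleep(500 * time.Millisecond)
	}
}

func killOnKnownError(r io.Reader, cmd *exec.Cmd) {
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		s := scanner.Text()
		if strings.Contains(s, "error forwarding port") ||
			strings.Contains(s, "unable to forward") ||
			strings.Contains(s, "error upgrading connection") {
			fmt.Println("killing kubectl port-forward after error:", s)
			cmd.Process.Kill()
			return
		}
	}
}
```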

Also added logic to handle a port collision right before kicking off the port forwarding process. If a collision happens, no new port is brokered, in the hope that the user will kill the (assumed) external process. Once the colliding process dies, port forwarding resumes (with a friendly recovery message to the user).
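
A minimal sketch of that pre-flight collision check, assuming it only needs to know whether the local port is currently bindable (the helper name, messages, and polling interval are illustrative, not the PR's code):

```go
package portforwardsketch

import (
	"fmt"
	"net"
	"time"
)

// waitForPortToFree sketches the collision handling described above: if
// another process already holds the port, keep the originally brokered port
// and wait for the competing process to go away instead of switching ports.
func waitForPortToFree(port int) {
	notified := false
	for {
		ln, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", port))
		if err == nil {
			ln.Close()
			if notified {
				fmt.Printf("port %d is free again, resuming port forwarding\n", port)
			}
			return
		}
		if !notified {
			fmt.Printf("port %d is taken by another process, waiting for it to exit...\n", port)
			notified = true
		}
		time.Sleep(1 * time.Second)
	}
}
```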

Also, we found a bug: the portForwarder map's LoadOrStore function was not storing anything. This is now fixed and covered by a copy-paste of the unit test that was testing the port brokering logic on sync.Map; the copy-paste should go away when we implement #2503.
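
For context, a tiny illustration of the sync.Map.LoadOrStore contract that such a unit test exercises (this is not the skaffold test itself, and the key/value pairs are made up):

```go
package portforwardsketch

import "sync"

// loadOrStoreContract shows the expected behavior: the first call for a key
// must actually store the value, so a second call reports loaded == true.
func loadOrStoreContract() (firstLoaded, secondLoaded bool) {
	var m sync.Map
	_, firstLoaded = m.LoadOrStore(8080, "pod/foo")  // stores; firstLoaded == false
	_, secondLoaded = m.LoadOrStore(8080, "pod/bar") // key exists; secondLoaded == true
	return firstLoaded, secondLoaded
}
```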

Fixed the retry logic for the TestDevPortForwardGKELoadBalancer integration test, and fixed the lingering port-forwarding processes by sending SIGTERM to the skaffold dev process instead of just killing it.
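
A minimal sketch of that test cleanup change, assuming the test holds an exec.Cmd for the skaffold dev process (the helper name is hypothetical):

```go
package portforwardsketch

import (
	"os/exec"
	"syscall"
)

// stopSkaffoldDev asks `skaffold dev` to shut down gracefully so it can tear
// down its kubectl port-forward children, instead of killing it outright.
func stopSkaffoldDev(cmd *exec.Cmd) error {
	return cmd.Process.Signal(syscall.SIGTERM)
}
```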

@codecov codecov bot commented Aug 1, 2019

Codecov Report

Merging #2593 into release/v0.35.0 will decrease coverage by 0.05%.
The diff coverage is 32.07%.

| Impacted Files | Coverage Δ |
|---|---|
| ...ffold/kubernetes/portforward/port_forward_entry.go | 100% <ø> (ø) ⬆️ |
| ...kubernetes/portforward/port_forward_integration.go | 0% <0%> (ø) ⬆️ |
| ...affold/kubernetes/portforward/kubectl_forwarder.go | 0% <0%> (ø) ⬆️ |
| pkg/skaffold/util/port.go | 86.66% <100%> (+0.95%) ⬆️ |
| ...g/skaffold/kubernetes/portforward/pod_forwarder.go | 86.95% <100%> (ø) ⬆️ |
| pkg/skaffold/runner/portforwarder.go | 100% <100%> (ø) ⬆️ |
| ...ffold/kubernetes/portforward/resource_forwarder.go | 56% <100%> (+4.97%) ⬆️ |
| ...g/skaffold/kubernetes/portforward/entry_manager.go | 93.47% <94.11%> (+6.24%) ⬆️ |

@balopat (Contributor Author) commented Aug 1, 2019

I am still missing unit tests, and there is a scenario that is not checked now (not sure if it was covered before): kubectl port-forward does not fail if the port is already bound :(
I managed to create a little adversarial scenario where a competing process takes the port and kubectl port-forward just keeps running happily in the background without throwing any errors.


```go
if strings.Contains(s, "error forwarding port") ||
	strings.Contains(s, "unable to forward") ||
	strings.Contains(s, "error upgrading connection") {
```

Member

Would it make sense to log a warning or something so the user knows the retry is happening and why?

Contributor Author

it is there in trace mode

Contributor Author

I keep thinking about what you mentioned: "error upgrading connection" is from port forwarding in client-go: https://github.com/kubernetes/client-go/blob/master/tools/portforward/portforward.go#L194 - I think it is fine to retry on it...

Member

I'm pretty sure Cloud Code doesn't show trace — it would be good for users to at least be able to discover that there was a failure.

I think restarting on error makes a lot of sense: it's essentially turning kubectl port-forward into a single-use attempt. Perhaps this code could look for the more general error header (portforward.go:.*error occurred) and log the error, but suppress it when it matches one of these known-issue strings?
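
A minimal sketch of that suggestion, assuming logrus-style leveled logging (names and levels are illustrative, not the PR's code):

```go
package portforwardsketch

import (
	"regexp"
	"strings"

	"github.com/sirupsen/logrus"
)

// Match the generic client-go error header, warn the user about it, and keep
// the known retryable strings at trace level since the loop already retries.
var genericPortForwardError = regexp.MustCompile(`portforward\.go:.*error occurred`)

var knownRetryableErrors = []string{
	"error forwarding port",
	"unable to forward",
	"error upgrading connection",
}

func logPortForwardLine(line string) {
	for _, known := range knownRetryableErrors {
		if strings.Contains(line, known) {
			logrus.Tracef("retrying kubectl port-forward: %s", line)
			return
		}
	}
	if genericPortForwardError.MatchString(line) {
		logrus.Warnf("kubectl port-forward error: %s", line)
	}
}
```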


```go
func (k *KubectlForwarder) forward(parentCtx context.Context, pfe *portForwardEntry) {
	var notifiedUser bool
	for {
```

Contributor

when reading this, it looks like this forward runs forever?

Contributor Author

it does, until the pfe gets cancelled (there are 3 return statements in the body, all around cancellation!)

Contributor

I got lost in reading and saw only 1 return statement.
I still feel that all 3 return statements are waiting for conditions that are:

  1. pfe.terminated - not sure when that happens
  2. & 3. the user enters ^C in skaffold dev.

Will it keep trying forever until the above two conditions are satisfied?

Contributor Author

Well, if things are going well, it is going to wait in cmd.Wait() for as long as kubectl port-forward is running...

Contributor Author

but yes, it does retry unless it is an explicit cancel from the skaffold process...

Contributor Author

Something like this:

[image: Port Forward State Model diagram]

Contributor Author

btw this was generated on planttext.com :)

```plantuml
@startuml

title Port Forward State Model
[*] --> CheckForTermination
CheckForTermination -down-> Cancelled
CheckForTermination -down-> TryCmdStart
TryCmdStart -down-> Cancelled
TryCmdStart --> CheckForTermination
TryCmdStart -down-> LogMonitoring

Cancelled -down-> [*]
state "kubectl port-forward logs are monitored" as LogMonitoring {
  TryCmdWait -down-> Cancelled
  TryCmdWait -up-> CheckForTermination
}

@enduml
```

Contributor Author

I wish I could put labels on the arrows, but the key thing is: non-cancellation error scenarios go back to the start, cancellation exits, and no error continues to the next step.

@balopat balopat changed the title experimenting with cleaner port forward retry logic Cleaner port forward retry logic Aug 1, 2019
@balopat balopat marked this pull request as ready for review August 1, 2019 20:38
@balopat balopat changed the title Cleaner port forward retry logic Cleaner kubectl port-forward retry logic Aug 1, 2019
balopat and others added 3 commits August 1, 2019 15:41
Now that we clean up "skaffold dev", TestEvent was failing.
The first run of this test, `TestEvent/v1/event_log`, was building images, and at
the end of the dev loop prune was cleaning up all of the containers.

In the second test, `TestEvent/v1/events`, the images were being built,
but the push failed with this error:

```
The push refers to repository [gcr.io/k8s-skaffold/test-dev]
time="2019-08-01T18:32:12-07:00" level=fatal msg="build failed: building [gcr.io/k8s-skaffold/test-dev]: build artifact: tag does not exist: gcr.io/k8s-skaffold/test-dev:v0.34.1-64-g1d5d3b68-dirty"
```

I think this is because docker was pointing to layers it thought it
had already built, but which actually didn't exist anymore.

This test is fixed by adding `--no-prune=true`.
@priyawadhwa (Contributor) commented:

cc @balopat looks like there was a prune error now that we are cleaning up on TestEvent -- logs here.

I think the issue is that we first run TestEvent/v1/event_log which builds image with digest 77ae371cff1b. This is then cleaned up by prune. We then try to run TestEvent/v1/events, which builds the same image 77ae371cff1b, claims it's in the cache, but then actually fails to push the image because it doesn't exist.

I added --no-prune=true to get around this.

@balopat balopat requested a review from loosebazooka as a code owner August 2, 2019 04:38
@priyawadhwa priyawadhwa force-pushed the cleaner_port_forward_retry branch from f3cc147 to f2defe8 Compare August 2, 2019 04:39
@balopat balopat changed the base branch from master to release/v0.35.0 August 2, 2019 15:50
@balopat (Contributor Author) commented Aug 2, 2019

Merging this to release branch.

@balopat balopat merged commit 4b959ca into GoogleContainerTools:release/v0.35.0 Aug 2, 2019