
[v0.25.2 regression] flannel.alpha.coreos.com/public-ip-overwrite fails with error looking up interface XXX.XXX.XXX.XXX: No interface with given IP found after restarting a node #1978

Closed
AkihiroSuda opened this issue May 27, 2024 · 8 comments · Fixed by #1982

Comments

@AkihiroSuda
Contributor

AkihiroSuda commented May 27, 2024

Expected Behavior

flannel.alpha.coreos.com/public-ip-overwrite should continue to work after restarting a node

Current Behavior

Flannel v0.25.2 fails with error looking up interface XXX.XXX.XXX.XXX: No interface with given IP found after restarting a node, even when flannel.alpha.coreos.com/public-ip-overwrite is set to XXX.XXX.XXX.XXX.

It was working in Flannel v0.25.1.

Possible Solution

Revert:

Steps to Reproduce (for bugs)

  1. Checkout https://github.com/rootless-containers/usernetes/tree/gen2-v20240527.0
  2. Apply the following update:
diff --git a/Makefile b/Makefile
index 0e04f0e..a9c6e19 100644
--- a/Makefile
+++ b/Makefile
@@ -150,4 +150,4 @@ kubeadm-reset:
 
 .PHONY: install-flannel
 install-flannel:
-       $(NODE_SHELL) kubectl apply -f https://github.com/flannel-io/flannel/releases/download/v0.25.1/kube-flannel.yml
+       $(NODE_SHELL) kubectl apply -f https://github.com/flannel-io/flannel/releases/download/v0.25.2/kube-flannel.yml
  3. Initialize a node. This step should be executed with Rootless Docker, but Rootful Docker also works.
$ make up kubeadm-init install-flannel kubeconfig
$ export KUBECONFIG=$(pwd)/kubeconfig
$ kubectl get pods -A
  4. Restart the node and observe that the kube-flannel container fails:
$ make down
$ make up
$ kubectl logs -n kube-flannel kube-flannel-ds-hrqt4 
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I0527 02:14:04.874324       1 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true}
W0527 02:14:04.874476       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0527 02:14:04.897430       1 kube.go:139] Waiting 10m0s for node controller to sync
I0527 02:14:04.897737       1 kube.go:455] Starting kube subnet manager
I0527 02:14:04.902721       1 kube.go:476] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.244.0.0/24]
I0527 02:14:05.899116       1 kube.go:146] Node controller sync successful
I0527 02:14:05.899197       1 main.go:231] Created subnet manager: Kubernetes Subnet Manager - u7s-suda-ws01
I0527 02:14:05.899211       1 main.go:234] Installing signal handlers
I0527 02:14:05.900566       1 main.go:452] Found network config - Backend type: vxlan
I0527 02:14:05.908250       1 kube.go:655] List of node(u7s-suda-ws01) annotations: map[string]string{"flannel.alpha.coreos.com/backend-data":"{\"VNI\":1,\"VtepMAC\":\"b2:ad:96:1b:27:b2\"}", "flannel.alpha.coreos.com/backend-type":"vxlan", "flannel.alpha.coreos.com/kube-subnet-manager":"true", "flannel.alpha.coreos.com/public-ip":"192.168.60.11", "flannel.alpha.coreos.com/public-ip-overwrite":"192.168.60.11", "kubeadm.alpha.kubernetes.io/cri-socket":"unix:///var/run/containerd/containerd.sock", "node.alpha.kubernetes.io/ttl":"0", "volumes.kubernetes.io/controller-managed-attach-detach":"true"}
I0527 02:14:05.908335       1 match.go:74] Searching for interface using 192.168.60.11
E0527 02:14:05.908814       1 main.go:287] Failed to find any valid interface to use: error looking up interface 192.168.60.11: No interface with given IP found
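
The last two log lines show the failure: flannel searches for an interface carrying 192.168.60.11 and finds none. A minimal Python sketch of that kind of lookup follows. It assumes a hypothetical snapshot of interface names and addresses; it is not flannel's Go code, and the error text is simply copied from the log.

```python
import ipaddress


class NoInterfaceError(LookupError):
    """Hypothetical error type mirroring the message in the log."""


def find_interface_by_ip(interfaces, wanted):
    """Return the name of the interface that has `wanted` assigned.

    `interfaces` is a hypothetical snapshot mapping interface names to
    lists of address strings, e.g. {"eth0": ["172.18.0.2"]}.
    """
    target = ipaddress.ip_address(wanted)
    for name, addrs in interfaces.items():
        for addr in addrs:
            # Compare parsed addresses so equivalent spellings still match.
            if ipaddress.ip_address(addr) == target:
                return name
    raise NoInterfaceError(
        f"error looking up interface {wanted}: No interface with given IP found"
    )


if __name__ == "__main__":
    # Hypothetical node with only lo and eth0, neither holding 192.168.60.11.
    node = {"lo": ["127.0.0.1"], "eth0": ["172.18.0.2"]}
    print(find_interface_by_ip(node, "172.18.0.2"))  # eth0
    try:
        find_interface_by_ip(node, "192.168.60.11")
    except NoInterfaceError as err:
        print(err)
```

If the address handed to the lookup is not assigned to any interface inside the node, the search can only fail, whatever interfaces the node actually has.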

Context

Regression in v0.25.2

Your Environment

@AkihiroSuda
Contributor Author

@tanvp112

tanvp112 commented May 29, 2024

Confirmed: I see this issue with v0.25.2 on a KIND cluster. I've reverted to v0.25.1 for the time being.

@rbrtbnfgl
Contributor

I left this open. I changed how the parameter is passed; it'll be fixed in the next release.

@tanvp112

tanvp112 commented May 30, 2024

@rbrtbnfgl, @AkihiroSuda,

Could you clarify the definition of "flannel.alpha.coreos.com/node-public-ip":

  • Is this referring to the Internet-addressable IP? What is the expected value here?
  • For a node that has only one network interface (e.g. KIND or Minikube, with only eth0 and lo) and sits in a private network, why does flannel no longer use that interface after a restart, as it did before?
  • Does this mean that to use Flannel CNI we need to pre-allocate a static IP for the node? That won't work in most environments. Is it possible to specify a network interface name instead?
  • "host-gw" is also affected by this regression. Does this annotation affect that backend too?

Thanks.

@rbrtbnfgl
Contributor

It could be the addressable IP, but generally it should be used when a node has multiple interfaces, to help Flannel select the right one. This issue was introduced because the interface lookup used the public-ip annotation. When public-ip-overwrite is set, public-ip is replaced with the overwrite address, which may not be an IP assigned to the node, so the lookup failed because no interface had that IP. I moved the logic into a new annotation so that public-ip behaves as before and public-ip-overwrite works as expected. If you don't need to tell Flannel to use an interface with a node-specific IP, you can omit the new annotation and Flannel will choose the interface as before.
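
The behavior described above can be modeled in a few lines. This is a simplified Python sketch, not flannel's implementation. The function names, the interface snapshot, and the fallback to eth0 (standing in for flannel's normal automatic selection) are all hypothetical. It contrasts v0.25.2, which reused the public-ip annotation for the lookup, with the fix, which only pins an interface when flannel.alpha.coreos.com/node-public-ip is set.

```python
PREFIX = "flannel.alpha.coreos.com/"


def lookup_ip_v0252(annotations):
    # Simplified model of v0.25.2 as described in the thread: the lookup reuses
    # public-ip, which public-ip-overwrite has already replaced after a restart.
    return annotations.get(PREFIX + "public-ip")


def lookup_ip_fixed(annotations):
    # Simplified model of the fix: only the new node-public-ip annotation pins
    # the interface; public-ip and public-ip-overwrite are not consulted.
    return annotations.get(PREFIX + "node-public-ip")


def pick_interface(annotations, interfaces, lookup, default="eth0"):
    """Pick an interface name; `interfaces` maps names to address strings."""
    ip = lookup(annotations)
    if ip is None:
        # Hypothetical stand-in for flannel's usual automatic selection.
        return default
    for name, addrs in interfaces.items():
        if ip in addrs:
            return name
    raise LookupError(
        f"error looking up interface {ip}: No interface with given IP found"
    )


if __name__ == "__main__":
    # Annotations as logged after the restart; interface snapshot is hypothetical.
    annotations = {
        PREFIX + "public-ip": "192.168.60.11",
        PREFIX + "public-ip-overwrite": "192.168.60.11",
    }
    interfaces = {"lo": ["127.0.0.1"], "eth0": ["172.18.0.2"]}
    print(pick_interface(annotations, interfaces, lookup_ip_fixed))  # eth0
    try:
        pick_interface(annotations, interfaces, lookup_ip_v0252)
    except LookupError as err:
        print(err)
```

With the annotations from the reporter's log, the v0.25.2 model raises the same error, while the fixed model falls back to the default selection. On a first start, before public-ip exists, both models fall back, which is consistent with the failure only appearing after a restart.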

@tanvp112

@rbrtbnfgl, thanks for the reply. In the case of KIND there's only one interface (eth0), but v0.25.2 also failed with "No interface with given IP found" after the host restarted. Note that this doesn't happen with v0.25.1. I suspect something else could be involved, because based on your reply this shouldn't happen.

@rbrtbnfgl
Contributor

This is how it should work with the fix. The current release doesn't work that way, as I explained. I think we'll release a new version with the correct behavior.

@AkihiroSuda
Contributor Author

Thanks, v0.25.3 works fine
