
Upgrade to coreDNS 1.5 in node-cache #306

Closed
prameshj opened this issue May 30, 2019 · 21 comments · Fixed by #328

@prameshj
Contributor

This is needed to pick up the bugfix in coredns/coredns#2636.

@woopstar
Member

@prameshj Can you test whether our new image has the same cleanup problems that caused the revert:

pasientskyhosting/k8s-dns-node-cache-amd64:1.15.4-2-g903339e-coredns1.6.2-dirty

@prameshj
Contributor Author

prameshj commented Sep 2, 2019

> @prameshj Can you test whether our new image has the same cleanup problems that caused the revert:
>
> pasientskyhosting/k8s-dns-node-cache-amd64:1.15.4-2-g903339e-coredns1.6.2-dirty

It still does not do the cleanup. This is how I test it:

  1. Pick one of the node-local-dns pods and run "watch -n 1 'iptables -t raw -nvL'" on its node.
     This command watches the iptables rules that the node-local-dns pod configures.
  2. Delete that specific node-local-dns pod with "kubectl delete pods <pod-name>".
  3. Check the output of step 1 to see whether the rules are still there.

I see that the rules are still there and only get cleaned up when the next pod comes up. This is a problem when we disable nodelocaldns entirely, since the rules and interface are then never cleaned up.

Last I checked, it seemed like the shutdown callbacks in the forward plugin were taking too long or not completing before the pod shut down. The node-cache cleanup would only be called after all the plugin shutdown callbacks. @johnbelamaric, any ideas on how to debug this further?
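
For context, here is a rough sketch of the ordering I am describing (paraphrased from the shape of caddy v1's signal handling, not the vendored source; names are illustrative): the plugin shutdown callbacks run first, and the process-exit hooks where node-cache registers its cleanup only run after every callback has returned, so a callback that hangs blocks the cleanup entirely.

    package sketch

    import (
        "log"
        "os"
    )

    // onProcessExit stands in for caddy's exported OnProcessExit slice,
    // which is where node-cache registers its iptables/interface cleanup.
    var onProcessExit []func()

    // handleSigterm paraphrases the shape of caddy v1's signal handler.
    func handleSigterm(shutdownCallbacks []func() error) {
        for i, cb := range shutdownCallbacks {
            // A callback that never returns (e.g. the forward plugin's)
            // blocks right here, before any exit hook can run.
            if err := cb(); err != nil {
                log.Printf("shutdown callback %d failed: %v", i, err)
            }
        }
        for _, f := range onProcessExit {
            f() // node-cache's cleanup would finally run here
        }
        os.Exit(0)
    }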

@jordips

jordips commented Sep 4, 2019

OK, I tested it with the image, and the PREROUTING rules are not deleted, as you said:

Chain PREROUTING (policy ACCEPT 4239 packets, 952K bytes)
    4   374 CT         udp  --  *      *       0.0.0.0/0            169.254.25.10        udp dpt:53 NOTRACK
    0     0 CT         tcp  --  *      *       0.0.0.0/0            169.254.25.10        tcp dpt:53 NOTRACK

I will try to debug a little bit more.

@jordips

jordips commented Sep 4, 2019

> Last I checked, it seemed like the shutdown callbacks in the forward plugin were taking too long or not completing before the pod shut down.

@prameshj How did you debug this part? I'm trying to see the logs when the pod is deleted, but I can't access them with the --previous flag.

Thank you

@prameshj
Contributor Author

prameshj commented Sep 5, 2019

I think I added logs to the different plugin shutdown callbacks and checked whether they were invoked.
I started with logs here:

exitCode := executeShutdownCallbacks("SIGTERM")

Then I checked which callbacks completed; I recall the forward plugin callback did not complete.

I also modified the nodelocaldns yaml to include a terminationGracePeriodSeconds value of 60, but that did not help.
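
In case it helps anyone reproduce this, the instrumentation I mean is roughly the following (a sketch only; since caddy v1 runs the callbacks inside its own signal handler, doing this for real means editing the vendored caddy code):

    package sketch

    import "log"

    // instrumentShutdownCallbacks wraps each shutdown callback with log
    // lines, so we can see which callbacks start and which never return.
    func instrumentShutdownCallbacks(callbacks []func() error) []func() error {
        wrapped := make([]func() error, len(callbacks))
        for i, cb := range callbacks {
            i, cb := i, cb // capture loop variables for the closures
            wrapped[i] = func() error {
                log.Printf("shutdown callback %d: starting", i)
                err := cb()
                log.Printf("shutdown callback %d: done (err=%v)", i, err)
                return err
            }
        }
        return wrapped
    }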

@prameshj
Contributor Author

prameshj commented Sep 6, 2019

@jordips Is it possible for you to build this image with the latest CoreDNS, 1.6.3? https://github.com/coredns/coredns/releases

I talked to @johnbelamaric; he suggested we try the latest version, and if it still shows the same behavior, we can open an issue on the coredns repo.

@jordips

jordips commented Sep 9, 2019

@prameshj Sure! I will try it later and let you know.

@jordips

jordips commented Sep 9, 2019

@woopstar or @prameshj Is there any special procedure for managing packages with dep? I changed the coreDNS version in Gopkg.toml, but running "dep ensure" fails with missing packages from k8s.io (the ones inside "pkg" and "cmd").
Thanks

@woopstar
Member

woopstar commented Sep 9, 2019

@chad-jones is the one who made our image and got it working with 1.6.2. I can ask him to rebuild with 1.6.3 too.

@jordips

jordips commented Sep 9, 2019

@woopstar that would be awesome! :) Or, if @chad-jones explains the rebuild steps, I can do it myself. I think the procedure is:

  • change Gopkg.toml (coredns version)
  • run "dep ensure" (or something like it) to pull the new source into the vendor folder
  • run "make containers" to build the new image

But I'm having problems with the dep dependencies... I'm still working on it.

@woopstar
Member

woopstar commented Sep 9, 2019

@chad-jones mentioned to me that it was a dep nightmare when he did 1.6.2 :) I'll let him post here whatever he decides to do :)

@jordips

jordips commented Sep 9, 2019

@woopstar Perfect, thank you for the info. Good to know I'm not the only one who thinks it's a nightmare 👍 :D

@chad-jones

CoreDNS - 1.6.3
pasientskyhosting/k8s-dns-node-cache-amd64:1.15.4-coredns1.6.3-dirty

@johnbelamaric
Member

k/k and CoreDNS have both moved to Go modules; is it worth making that change here?

@prameshj
Contributor Author

> @woopstar Perfect, thank you for the info. Good to know I'm not the only one who thinks it's a nightmare 👍 :D

I think the biggest issue is the gap between the prometheus client version that skydns uses and the one CoreDNS requires. When I tried updating to a newer CoreDNS, I had to manually edit the skydns vendor directory to use a newer prometheus client. Was this the issue you faced too?

I am not sure if moving to Go modules will fix that particular issue, but it is worth making the change.

@prameshj
Contributor Author

I will try out the cleanup test with the new image and report back in a day or so. Thanks for sharing the image.

@axot

axot commented Sep 18, 2019

I also tried upgrading to 1.5.3 earlier and did not face the skydns issue. One more change, I think, was that caddy was renamed from github.com/mholt/caddy to github.com/caddyserver/caddy.

@prameshj
Contributor Author

> pasientskyhosting/k8s-dns-node-cache-amd64:1.15.4-coredns1.6.3-dirty

I just tried this. The cleanup code is not invoked in the latest version either.

@prameshj
Contributor Author

prameshj commented Oct 4, 2019

@woopstar @chad-jones I am trying to debug the cleanup issue by adding logs and building a custom image. I am having a hard time getting the image to build.

I changed Gopkg.toml to point to coredns 1.6.4 and client_golang to 1.1.0, which coredns needs. However, that dependency resolution fails when I do "dep ensure". Could you submit a PR with the changes to upgrade to 1.6? I can debug it further. Thanks!
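
For reference, the Gopkg.toml changes I am describing look roughly like this (a sketch, not the exact file; dep's [[override]] is what forces the newer prometheus client even where other dependencies pin an older one):

    [[constraint]]
      name = "github.com/coredns/coredns"
      version = "=1.6.4"

    # Force the newer prometheus client that coredns needs, even where
    # other dependencies (e.g. skydns) pin an older version.
    [[override]]
      name = "github.com/prometheus/client_golang"
      version = "=1.1.0"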

@prameshj
Contributor Author

I found the issue: coreDNS changed the import path for caddy in coredns/coredns#2961, but node-cache was still using the old import path when registering the cleanup function via OnProcessExit. I changed it to use the right path, and it is working fine now.
Many thanks to @chad-jones for creating the PR with the dependency updates. I have submitted a PR with the changes and hope to merge it soon.
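
The gist of the fix, as a sketch (the teardown function name here is illustrative, not the actual node-cache source): because Go treats github.com/mholt/caddy and github.com/caddyserver/caddy as two different packages, node-cache was appending its cleanup hook to an OnProcessExit slice that the vendored CoreDNS's signal handler never iterated. Importing caddy from the new path registers the hook on the copy that actually runs:

    package sketch

    import (
        // old import, no longer the package CoreDNS uses:
        //   "github.com/mholt/caddy"
        "github.com/caddyserver/caddy"
    )

    // teardownNetworking is a stand-in for node-cache's real cleanup
    // (removing the dummy interface and the NOTRACK iptables rules).
    func teardownNetworking() { /* ... */ }

    func init() {
        // Appended to the same caddy package CoreDNS now imports, the
        // hook actually fires when the signal handler exits the process.
        caddy.OnProcessExit = append(caddy.OnProcessExit, teardownNetworking)
    }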

@woopstar
Member

woopstar commented Dec 3, 2019

We are glad to help out with this issue! Thank you, @prameshj, for driving an official solution.
