
Use systemd-networkd for aws-ecs-2 and k8s 1.28 variants #3394

Merged: zmrow merged 3 commits into bottlerocket-os:develop from networkd-1.28-ecs2 on Sep 11, 2023

zmrow (Contributor) commented Aug 31, 2023

Issue number:
Related to #2449

Description of changes:
Draft while testing is in progress

This PR moves the aws-ecs-2, Kubernetes 1.28, and *-dev variants to systemd-networkd as the network backend.

Testing done:
WIP - will update as testing is completed

  • Build each of the variants
  • aws-k8s-1.28 conformance testing in IPv6-only cluster
    {
      "plugins": [
        {
          "plugin": "e2e",
          "node": "global",
          "status": "complete",
          "result-status": "passed",
          "result-counts": {
            "passed": 384,
            "skipped": 7007
          }
        }
      ]
    }
  • aws-k8s-1.28 conformance testing in IPv4 cluster
 e2e   global   complete   passed   Passed:380, Failed:  0, Remaining:  0
  • metal-k8s-1.28 conformance testing
e2e   complete   passed       1   Passed:380, Failed:  0, Remaining:  0
  • vmware-k8s-1.28 conformance testing
e2e   global   complete   passed   Passed:380, Failed:  0, Remaining:  0
  • aws-ecs-2 internal ECS testing
  • Given custom (correct) DNS settings, the instance continues to function properly in a cluster and the proper interface and resolved configuration is set. (Custom settings were set identical to those received via DHCP; resolvectl shows them as Global.)
bash-5.1# cat /etc/systemd/resolved.conf.d/10-resolv.conf 
[Resolve]
DNS=192.168.0.2
Domains=us-west-2.compute.internal

bash-5.1# resolvectl 
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
      DNS Servers 192.168.0.2
       DNS Domain us-west-2.compute.internal

Link 2 (eth0)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

bash-5.1# cat /etc/systemd/network/10-eth0.network.d/10-dns.conf 
[Network]
DNSDefaultRoute=false
[DHCPv4]
UseDNS=false
UseDomains=false
[DHCPv6]
UseDNS=false
UseDomains=false
[IPv6AcceptRA]
UseDNS=false
UseDomains=false
  • AWS - chain the Cilium CNI with AWS VPC CNI
  • Scale out an IPv6 and IPv4 cluster to 3000 nodes and ensure they all join properly
$ kubectl get nodes | grep -w Ready | wc -l
3000
  • Ensure the system doesn't wait for an interface that is configured with DHCP and marked as optional in the network config, and that RequiredForOnline=false is set:
[Match]
Name=eno3
[Link]
RequiredForOnline=false
[Network]
DHCP=ipv4
[DHCPv4]
UseMTU=true
  • Ensure the system doesn't wait for a protocol (IPv4/IPv6) marked as optional in the network config:
[Match]
Name=eno1
[Link]
RequiredForOnline=true
RequiredFamilyForOnline=ipv4
[Network]
DHCP=yes
[DHCPv4]
UseMTU=true
[Ipv6AcceptRA]
UseMTU=true
version = 3

[bond0]
kind = "bond"
mode = "active-backup"
interfaces = ["eno1", "eno2"]
dhcp4 = true

[bond0.monitoring]
miimon-frequency-ms = 100
miimon-updelay-ms = 200
miimon-downdelay-ms = 200

[VLAN43]
kind = "vlan"
device = "bond0"
id = 43
dhcp4 = true
  • Test all versions (1-3) of network config work as expected
  • Test interfaces configured via MAC address work as expected
  • Ensure the MTU set via DHCP is identical to that on wicked variants across various platforms:
    Instance type / platform   systemd-networkd MTU   wicked MTU
    c5.large                   9001                   9001
    m6g.medium                 9001                   9001
    c3.large                   9001                   9001
    c1.xlarge                  9001                   9001
    vmware                     1500                   1500
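The MTU comparison above was made per platform. On a single node, a similar check could be scripted from a root bash shell like the ones shown earlier. This is a minimal sketch, not part of Bottlerocket: `check_mtu` and the `SYSFS_NET` override are hypothetical names, and it assumes the interface MTU is read from the standard Linux sysfs location `/sys/class/net/<iface>/mtu`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Bottlerocket tool): compare an interface's MTU,
# read from sysfs, with an expected value such as 9001 (EC2) or 1500 (VMware).
# SYSFS_NET defaults to the real sysfs directory; overriding it is only meant
# to let the function be exercised against a fake directory tree.
check_mtu() {
  local iface="$1" expected="$2"
  local sysfs_net="${SYSFS_NET:-/sys/class/net}"
  local mtu_file="${sysfs_net}/${iface}/mtu"
  local actual

  # Missing or unreadable interface: report and return a distinct code.
  if [[ ! -r "$mtu_file" ]]; then
    echo "error: cannot read ${mtu_file}" >&2
    return 2
  fi

  # $(<file) reads the file and strips the trailing newline.
  actual="$(<"$mtu_file")"
  if [[ "$actual" == "$expected" ]]; then
    echo "${iface}: MTU ${actual} (ok)"
    return 0
  fi

  echo "${iface}: MTU ${actual}, expected ${expected}" >&2
  return 1
}
```

For example, `check_mtu eth0 9001` on a c5.large node would print `eth0: MTU 9001 (ok)` if the DHCP-provided MTU was applied, and return non-zero otherwise.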

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

zmrow (Contributor Author) commented Aug 31, 2023

^ Removes the resolveConf kubelet config change in favor of a separate PR #3395

This commit adds the appropriate build flag to use systemd-networkd as
the network backend for these variants.
@zmrow zmrow force-pushed the networkd-1.28-ecs2 branch from c6de460 to 0b13357 Compare August 31, 2023 23:22
zmrow (Contributor Author) commented Aug 31, 2023

^ Rebase onto develop

This commit adds the appropriate build flag to use systemd-networkd as
the network backend for these variants.
@zmrow zmrow force-pushed the networkd-1.28-ecs2 branch from 0b13357 to f8c45fc Compare September 1, 2023 17:01
zmrow (Contributor Author) commented Sep 1, 2023

^ Flips aws-ecs-2-nvidia and the *-dev variants to systemd-networkd. Good catch @bcressey !

stmcginnis (Contributor) left a comment

Built aws-k8s-1.28 x86_64 AMIs and deployed to a 1.27 EKS cluster.

Installed cilium 1.14.1 via helm:

helm install cilium cilium/cilium --version 1.14.1 \
  --namespace kube-system \
  --set eni.enabled=true \
  --set ipam.mode=eni \
  --set egressMasqueradeInterfaces=eth0 \
  --set tunnel=disabled \
  --set nodeinit.enabled=false

Verified pods were all happy:

$ kubectl get pods -n kube-system
NAME                               READY   STATUS    RESTARTS   AGE
cilium-4d5gn                       1/1     Running   0          7m36s
cilium-hwxhr                       1/1     Running   0          7m37s
cilium-l7xz6                       1/1     Running   0          7m37s
cilium-operator-66d75c5db6-hldz9   1/1     Running   0          4m27s
cilium-operator-66d75c5db6-vbshr   1/1     Running   0          4m48s
cilium-qz9bh                       1/1     Running   0          7m36s
coredns-647484dc8b-jvnk9           1/1     Running   0          4m7s
coredns-647484dc8b-mwx7r           1/1     Running   0          4m7s
kube-proxy-jgh7f                   1/1     Running   0          7m37s
kube-proxy-m2kjq                   1/1     Running   0          7m37s
kube-proxy-sr9fq                   1/1     Running   0          7m36s
kube-proxy-zlfzc                   1/1     Running   0          7m36s

And checked cilium status:

$ cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:          OK
 \__/¯¯\__/    Operator:        OK
 /¯¯\__/¯¯\    Hubble Relay:    disabled
 \__/¯¯\__/    ClusterMesh:     disabled
    \__/

Deployment        cilium-operator    Desired: 2, Ready: 2/2, Available: 2/2
DaemonSet         cilium             Desired: 4, Ready: 4/4, Available: 4/4
Containers:       cilium             Running: 4
                  cilium-operator    Running: 2
Cluster Pods:     6/6 managed by Cilium
Image versions    cilium             quay.io/cilium/cilium:v1.14.1@sha256:edc1d05ea1365c4a8f6ac6982247d5c145181704894bb698619c3827b6963a72: 4
                  cilium-operator    quay.io/cilium/operator-aws:v1.14.1@sha256:ff57964aefd903456745e53a4697a4f6a026d8fffdb06f53f624a23d23ade37a: 2

Ran connectivity tests:

$ cilium connectivity test

...

✅ All 32 tests (250 actions) successful, 0 tests skipped, 1 scenarios skipped.

Everything looks good!
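The "pods were all happy" check above was done by reading the `kubectl get pods` listing. A minimal sketch of scripting it, assuming the default `kubectl get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE); `pods_all_ready` is a hypothetical helper, not part of kubectl or Cilium. Note that it would also flag pods in `Completed` status, such as finished Jobs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods` output on stdin and report any
# pod that is not Running or whose READY column is not n/n (e.g. 0/1).
# Returns 0 when every listed pod is Running and fully ready, 1 otherwise.
pods_all_ready() {
  awk '
    BEGIN { bad = 0 }
    NR == 1 { next }                      # skip the header row
    {
      split($2, ready, "/")               # READY column, e.g. "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) {
        print "not ready: " $1 " (" $2 ", " $3 ")"
        bad = 1
      }
    }
    END {
      if (NR < 2) { print "no pods listed"; bad = 1 }
      exit (bad ? 1 : 0)
    }
  '
}
```

Usage: `kubectl get pods -n kube-system | pods_all_ready` prints nothing and returns 0 for a listing like the one above, where every pod is `1/1 Running`.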

@zmrow zmrow marked this pull request as ready for review September 9, 2023 03:07
@zmrow zmrow merged commit 4fcc667 into bottlerocket-os:develop Sep 11, 2023
@zmrow zmrow deleted the networkd-1.28-ecs2 branch September 11, 2023 18:46
heri16 commented May 24, 2024

We are using the Bottlerocket aws-ecs-2 variant, and it fails to detect other ENIs that have been attached. Is there any way to solve this?

yeazelm (Contributor) commented May 24, 2024

Hello @heri16, can you open a new issue for this? The networkd work has been complete for a while and should be attaching ENIs. A separate issue would let us dive into just the problem you are facing!
