FAILED - RETRYING: [192.168.1.5]: Verify that all nodes actually joined #632
vikasjayaswal asked this question in Questions (unanswered, 0 comments)
Issue:
My home lab network is 192.168.1.x. I can't seem to get my controller nodes to join. I'm frustrated because I have validated that the machines (nodes) are up and running and I can SSH to each host. Based on prior users' reports, I made sure my MetalLB range does not overlap with any of the node IPs I created. I also made sure my k3s token is alphanumeric only.
My gut instinct is there must be some issue with the all.yml file.
Please help?
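As a first sanity check beyond plain SSH, it can help to confirm that Ansible itself reaches every host in the inventory as the configured ansible_user (vic). A minimal sketch, assuming the hosts.ini shown below sits next to the playbook (adjust the inventory path to your layout):

```bash
# Confirm Ansible can reach all masters and workers as user "vic".
# The inventory path is an assumption -- point it at your actual hosts.ini.
ansible -i hosts.ini k3s_cluster -m ping -u vic
```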
Resolution Attempts:
➜ terraform git:(main) ✗ ping 192.168.1.222
PING 192.168.1.222 (192.168.1.222) 56(84) bytes of data.
From 192.168.1.4 icmp_seq=1 Destination Host Unreachable
From 192.168.1.4 icmp_seq=2 Destination Host Unreachable
After the script bombs out with the errors below, I can successfully ping 192.168.1.222 (the apiserver_endpoint).
FAILED - RETRYING: [192.168.1.5]: Verify that all nodes actually joined (check k3s-init.service if this fails) (1 retries left).
fatal: [192.168.1.4]: FAILED! => {"attempts": 20, "changed": false, "cmd": ["k3s", "kubectl", "get", "nodes", "-l", "node-role.kubernetes.io/master=true", "-o=jsonpath={.items[*].metadata.name}"], "delta": "0:00:00.067656", "end": "2025-01-17 09:44:46.330847", "msg": "", "rc": 0, "start": "2025-01-17 09:44:46.263191", "stderr": "", "stderr_lines": [], "stdout": "kubernetes-controller-192", "stdout_lines": ["kubernetes-controller-192"]}
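The retry message itself points at k3s-init.service, and the final output shows only one master (kubernetes-controller-192) registered. A hedged next step is to read that unit's logs on the masters that never show up (192.168.1.3 and 192.168.1.5); the SSH user mirrors the ansible_user in all.yml:

```bash
# On each master missing from "kubectl get nodes" (here .3 and .5),
# inspect the k3s-init service named in the retry message.
for host in 192.168.1.3 192.168.1.5; do
  echo "=== $host ==="
  ssh vic@"$host" 'sudo systemctl status k3s-init --no-pager; sudo journalctl -u k3s-init --no-pager -n 50'
done
```

Common failures that show up there are token/TLS mismatches, or the joining server not being able to reach the API endpoint (https://192.168.1.222:6443) while kube-vip is still settling.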
hosts.ini
[master]
192.168.1.3
192.168.1.4
192.168.1.5
[node]
192.168.1.6
192.168.1.7
# only required if proxmox_lxc_configure: true
# must contain all proxmox instances that have a master or worker node
[proxmox]
192.168.30.43
[k3s_cluster:children]
master
node
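With three masters listed, 192.168.1.4 and 192.168.1.5 must be able to reach the k3s API on TCP 6443; whether the join target is the first master directly or the kube-vip VIP depends on the playbook version, so checking both costs nothing. A sketch assuming nc is installed on the nodes:

```bash
# From a master that failed to join, verify the first master and the VIP answer on the k3s API port.
ssh vic@192.168.1.5 'nc -zv 192.168.1.3 6443; nc -zv 192.168.1.222 6443'
```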
all.yml
k3s_version: v1.30.2+k3s2
# this is the user that has ssh access to these machines
ansible_user: vic
systemd_dir: /etc/systemd/system
# Set your timezone
system_timezone: "Etc/UTC"
# interface which will be used for flannel
flannel_iface: eth0
# uncomment calico_iface to use tigera operator/calico cni instead of flannel https://docs.tigera.io/calico/latest/about
calico_iface: "eth0"
calico_ebpf: false # use eBPF dataplane instead of iptables
calico_tag: v3.28.0 # calico version tag
# uncomment cilium_iface to use cilium cni instead of flannel or calico
# ensure v4.19.57, v5.1.16, v5.2.0 or more recent kernel
cilium_iface: "eth0"
cilium_mode: native # native when nodes on same subnet or using bgp, else set routed
cilium_tag: v1.16.0 # cilium version tag
cilium_hubble: true # enable hubble observability relay and ui
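Note that per the comments above, only one of calico_iface / cilium_iface is meant to be uncommented at a time (each replaces flannel). Since flannel_iface, calico_iface and cilium_iface all reference eth0 here, it is also worth confirming the interface really is called eth0 on every node; Ubuntu cloud images on Proxmox VMs often use names like ens18. A hedged check, inventory path assumed:

```bash
# List IPv4 interfaces on every node; if eth0 is absent, flannel/k3s cannot bind to it.
ansible -i hosts.ini k3s_cluster -m command -a "ip -4 -brief addr show" -u vic
```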
# if using calico or cilium, you may specify the cluster pod cidr pool
cluster_cidr: 10.52.0.0/16
# enable cilium bgp control plane for lb services and pod cidrs. disables metallb.
cilium_bgp: false
# bgp parameters for cilium cni. only active when cilium_iface is defined and cilium_bgp is true.
cilium_bgp_my_asn: "64513"
cilium_bgp_peer_asn: "64512"
cilium_bgp_peer_address: 192.168.30.1
cilium_bgp_lb_cidr: 192.168.31.0/24 # cidr for cilium loadbalancer ipam
# enable kube-vip ARP broadcasts
kube_vip_arp: true
# enable kube-vip BGP peering
kube_vip_bgp: false
# bgp parameters for kube-vip
kube_vip_bgp_routerid: "127.0.0.1" # Defines the router ID for the BGP server
kube_vip_bgp_as: "64513" # Defines the AS for the BGP server
kube_vip_bgp_peeraddress: "192.168.30.1" # Defines the address for the BGP peer
kube_vip_bgp_peeras: "64512" # Defines the AS for the BGP peer
# apiserver_endpoint is the virtual ip-address which will be configured on each master
apiserver_endpoint: 192.168.1.222
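This is the address kube-vip advertises, which matches the ping output above: unreachable before the run, answering once a master has claimed it. A small sketch to see which master currently holds the VIP (eth0 assumed, as in flannel_iface):

```bash
# The VIP should appear as a secondary address on exactly one master's interface.
for host in 192.168.1.3 192.168.1.4 192.168.1.5; do
  echo "=== $host ==="
  ssh vic@"$host" 'ip -4 addr show eth0 | grep 192.168.1.222 || true'
done
```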
# k3s_token is required so that masters can talk together securely
# this token should be alphanumeric only
k3s_token: some1secret2token3
# The IP on which the node is reachable in the cluster.
# Here, a sensible default is provided, you can still override
# it for each of your hosts, though.
k3s_node_ip: "{{ ansible_facts[(cilium_iface | default(calico_iface | default(flannel_iface)))]['ipv4']['address'] }}"
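k3s_node_ip is resolved from Ansible facts for the chosen interface, so if that fact is missing (for example because the interface is not actually eth0), the --node-ip flag below cannot be rendered. A hedged way to inspect exactly what Ansible sees (inventory path assumed):

```bash
# Dump the eth0 fact on every node; ipv4.address is what k3s_node_ip resolves to.
ansible -i hosts.ini k3s_cluster -m setup -a "filter=ansible_eth0" -u vic
```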
# Disable the taint manually by setting: k3s_master_taint = false
k3s_master_taint: "{{ true if groups['node'] | default([]) | length >= 1 else false }}"
# these arguments are recommended for servers as well as agents:
extra_args: >-
  {{ '--flannel-iface=' + flannel_iface if calico_iface is not defined and cilium_iface is not defined else '' }}
  --node-ip={{ k3s_node_ip }}

# change these to your liking, the only required ones are: --disable servicelb, --tls-san {{ apiserver_endpoint }}
# the contents of the if block are also required if using calico or cilium
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  {% if calico_iface is defined or cilium_iface is defined %}
  --flannel-backend=none
  --disable-network-policy
  --cluster-cidr={{ cluster_cidr | default('10.52.0.0/16') }}
  {% endif %}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik

extra_agent_args: >-
  {{ extra_args }}
# image tag for kube-vip
kube_vip_tag_version: v0.8.2
# tag for kube-vip-cloud-provider manifest
kube_vip_cloud_provider_tag_version: "main"
# kube-vip ip range for load balancer
# (uncomment to use kube-vip for services instead of MetalLB)
kube_vip_lb_ip_range: "192.168.30.80-192.168.30.90"
# metallb type frr or native
metal_lb_type: native
# metallb mode layer2 or bgp
metal_lb_mode: layer2
# bgp options
metal_lb_bgp_my_asn: "64513"
metal_lb_bgp_peer_asn: "64512"
metal_lb_bgp_peer_address: "192.168.30.1"
# image tag for metal lb
metal_lb_speaker_tag_version: v0.14.8
metal_lb_controller_tag_version: v0.14.8
# metallb ip range for load balancer
metal_lb_ip_range: 192.168.1.80-192.168.1.90
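The range 192.168.1.80-192.168.1.90 indeed avoids the node IPs and the .222 VIP, consistent with what the question says about overlap. Once a run gets far enough for MetalLB to deploy, the pool it actually received can be checked; metallb-system as the namespace is an assumption (it is the usual default):

```bash
# Inspect the address pool MetalLB is running with (run on a master; kubeconfig needs root).
ssh vic@192.168.1.3 'sudo k3s kubectl -n metallb-system get ipaddresspools.metallb.io -o yaml'
```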
# Only enable if your nodes are proxmox LXC nodes, make sure to configure your proxmox nodes
# in your hosts.ini file.
# Please read https://gist.github.com/triangletodd/02f595cd4c0dc9aac5f7763ca2264185 before using this.
# Most notably, your containers must be privileged, and must not have nesting set to true.
# Please note this script disables most of the security of lxc containers, with the trade-off being that lxc
# containers are significantly more resource efficient compared to full VMs.
# Mixing and matching VMs and lxc containers is not supported, ymmv if you want to do this.
# I would only really recommend using this if you have particularly low powered proxmox nodes where the overhead of
# VMs would use a significant portion of your available resources.
proxmox_lxc_configure: false
# the user that you would use to ssh into the host, for example if you run ssh some-user@my-proxmox-host,
# set this value to some-user
proxmox_lxc_ssh_user: root
# the unique proxmox ids for all of the containers in the cluster, both worker and master nodes
proxmox_lxc_ct_ids:
# Only enable this if you have set up your own container registry to act as a mirror / pull-through cache
# (harbor / nexus / docker's official registry / etc).
# Can be beneficial for larger dev/test environments (for example if you're getting rate limited by docker hub),
# or air-gapped environments where your nodes don't have internet access after the initial setup
# (which is still needed for downloading the k3s binary and such).
# k3s's documentation about private registries here: https://docs.k3s.io/installation/private-registry
custom_registries: false
# The registries can be authenticated or anonymous, depending on your registry server configuration.
# If they allow anonymous access, simply remove the following bit from custom_registries_yaml:
#   configs:
#     "registry.domain.com":
#       auth:
#         username: yourusername
#         password: yourpassword
# The following is an example that pulls all images used in this playbook through your private registries.
# It also allows you to pull your own images from your private registry, without having to use imagePullSecrets
# in your deployments.
# If all you need is your own images and you don't care about caching the docker/quay/ghcr.io images,
# you can just remove those from the mirrors: section.
custom_registries_yaml: |
  mirrors:
    docker.io:
      endpoint:
        - "https://registry.domain.com/v2/dockerhub"
    quay.io:
      endpoint:
        - "https://registry.domain.com/v2/quayio"
    ghcr.io:
      endpoint:
        - "https://registry.domain.com/v2/ghcrio"
    registry.domain.com:
      endpoint:
        - "https://registry.domain.com"

  configs:
    "registry.domain.com":
      auth:
        username: yourusername
        password: yourpassword
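custom_registries is false in this file, so the block above is inert; if it were enabled, k3s reads registry mirrors from /etc/rancher/k3s/registries.yaml on each node (per the k3s private-registry docs linked above), which can be verified directly:

```bash
# Only relevant when custom_registries: true -- check the rendered registry mirror config on a node.
ssh vic@192.168.1.3 'sudo cat /etc/rancher/k3s/registries.yaml'
```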
# On some distros like Diet Pi, there is no dbus installed. dbus is required by the default reboot command.
# Uncomment if you need a custom reboot command
custom_reboot_command: /usr/sbin/shutdown -r now
# Only enable and configure these if you access the internet through a proxy
proxy_env:
  HTTP_PROXY: "http://proxy.domain.local:3128"
  HTTPS_PROXY: "http://proxy.domain.local:3128"
  NO_PROXY: "*.domain.local,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
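When changing any of the values above between attempts, it is usually worth wiping the half-initialised state first so a stale token or etcd member does not linger. A hedged sketch, assuming this copy of the k3s-ansible playbook ships the usual reset.yml and site.yml playbooks (adjust names and paths if yours differ):

```bash
# Tear down any partial install, then re-run the full playbook with verbose output.
ansible-playbook reset.yml -i hosts.ini
ansible-playbook site.yml -i hosts.ini -vv
```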