FAILED - RETRYING: [192.168.1.5]: Verify that all nodes actually joined #632
vikasjayaswal asked this question in Questions (unanswered, 0 comments)
Issue:
My home lab network is 192.168.1.x. I can't seem to get my controller nodes to join. I'm frustrated because I have validated that the machines (nodes) are up and running and I can SSH to each host. Based on prior users' reports, I made sure my MetalLB range does not overlap with any of the node IPs I created. I also made sure my k3s token is alphanumeric only.
My gut instinct is there must be some issue with the all.yml file.
Please help?
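As a first sanity check beyond plain SSH, it can help to confirm that Ansible itself reaches every host in the inventory as the configured ansible_user (vic). A minimal sketch, assuming the hosts.ini shown below sits next to the playbook (adjust the inventory path to your layout):

```bash
# Confirm Ansible can reach all masters and workers as user "vic".
# The inventory path is an assumption -- point it at your actual hosts.ini.
ansible -i hosts.ini k3s_cluster -m ping -u vic
```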
Resolution Attempts:
➜ terraform git:(main) ✗ ping 192.168.1.222
PING 192.168.1.222 (192.168.1.222) 56(84) bytes of data.
From 192.168.1.4 icmp_seq=1 Destination Host Unreachable
From 192.168.1.4 icmp_seq=2 Destination Host Unreachable
After the script bombs out with the errors below, I can successfully ping 192.168.1.222 (the apiserver_endpoint).
FAILED - RETRYING: [192.168.1.5]: Verify that all nodes actually joined (check k3s-init.service if this fails) (1 retries left).
fatal: [192.168.1.4]: FAILED! => {"attempts": 20, "changed": false, "cmd": ["k3s", "kubectl", "get", "nodes", "-l", "node-role.kubernetes.io/master=true", "-o=jsonpath={.items[*].metadata.name}"], "delta": "0:00:00.067656", "end": "2025-01-17 09:44:46.330847", "msg": "", "rc": 0, "start": "2025-01-17 09:44:46.263191", "stderr": "", "stderr_lines": [], "stdout": "kubernetes-controller-192", "stdout_lines": ["kubernetes-controller-192"]}
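The retry message itself points at k3s-init.service, and the final output shows only one master (kubernetes-controller-192) registered. A hedged next step is to read that unit's logs on the masters that never show up (192.168.1.3 and 192.168.1.5); the SSH user mirrors the ansible_user in all.yml:

```bash
# On each master missing from "kubectl get nodes" (here .3 and .5),
# inspect the k3s-init service named in the retry message.
for host in 192.168.1.3 192.168.1.5; do
  echo "=== $host ==="
  ssh vic@"$host" 'sudo systemctl status k3s-init --no-pager; sudo journalctl -u k3s-init --no-pager -n 50'
done
```

Common failures that show up there are token/TLS mismatches, or the joining server not being able to reach the API endpoint (https://192.168.1.222:6443) while kube-vip is still settling.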
hosts.ini
[master]
192.168.1.3
192.168.1.4
192.168.1.5
[node]
192.168.1.6
192.168.1.7
# only required if proxmox_lxc_configure: true
# must contain all proxmox instances that have a master or worker node
[proxmox]
192.168.30.43
[k3s_cluster:children]
master
node
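With three masters listed, 192.168.1.4 and 192.168.1.5 must be able to reach the k3s API on TCP 6443; whether the join target is the first master directly or the kube-vip VIP depends on the playbook version, so checking both costs nothing. A sketch assuming nc is installed on the nodes:

```bash
# From a master that failed to join, verify the first master and the VIP answer on the k3s API port.
ssh vic@192.168.1.5 'nc -zv 192.168.1.3 6443; nc -zv 192.168.1.222 6443'
```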
all.yml
k3s_version: v1.30.2+k3s2
# this is the user that has ssh access to these machines
ansible_user: vic
systemd_dir: /etc/systemd/system
# Set your timezone
system_timezone: "Etc/UTC"
# interface which will be used for flannel
flannel_iface: eth0
# uncomment calico_iface to use tigera operator/calico cni instead of flannel https://docs.tigera.io/calico/latest/about
calico_iface: "eth0"
calico_ebpf: false # use eBPF dataplane instead of iptables
calico_tag: v3.28.0 # calico version tag
# uncomment cilium_iface to use cilium cni instead of flannel or calico
# ensure v4.19.57, v5.1.16, v5.2.0 or more recent kernel
cilium_iface: "eth0"
cilium_mode: native # native when nodes on same subnet or using bgp, else set routed
cilium_tag: v1.16.0 # cilium version tag
cilium_hubble: true # enable hubble observability relay and ui
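Note that per the comments above, only one of calico_iface / cilium_iface is meant to be uncommented at a time (each replaces flannel). Since flannel_iface, calico_iface and cilium_iface all reference eth0 here, it is also worth confirming the interface really is called eth0 on every node; Ubuntu cloud images on Proxmox VMs often use names like ens18. A hedged check, inventory path assumed:

```bash
# List IPv4 interfaces on every node; if eth0 is absent, flannel/k3s cannot bind to it.
ansible -i hosts.ini k3s_cluster -m command -a "ip -4 -brief addr show" -u vic
```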
# if using calico or cilium, you may specify the cluster pod cidr pool
cluster_cidr: 10.52.0.0/16
# enable cilium bgp control plane for lb services and pod cidrs. disables metallb.
cilium_bgp: false
# bgp parameters for cilium cni. only active when cilium_iface is defined and cilium_bgp is true.
cilium_bgp_my_asn: "64513"
cilium_bgp_peer_asn: "64512"
cilium_bgp_peer_address: 192.168.30.1
cilium_bgp_lb_cidr: 192.168.31.0/24 # cidr for cilium loadbalancer ipam
# enable kube-vip ARP broadcasts
kube_vip_arp: true
# enable kube-vip BGP peering
kube_vip_bgp: false
# bgp parameters for kube-vip
kube_vip_bgp_routerid: "127.0.0.1" # Defines the router ID for the BGP server
kube_vip_bgp_as: "64513" # Defines the AS for the BGP server
kube_vip_bgp_peeraddress: "192.168.30.1" # Defines the address for the BGP peer
kube_vip_bgp_peeras: "64512" # Defines the AS for the BGP peer
# apiserver_endpoint is the virtual ip-address which will be configured on each master
apiserver_endpoint: 192.168.1.222
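This is the address kube-vip advertises, which matches the ping output above: unreachable before the run, answering once a master has claimed it. A small sketch to see which master currently holds the VIP (eth0 assumed, as in flannel_iface):

```bash
# The VIP should appear as a secondary address on exactly one master's interface.
for host in 192.168.1.3 192.168.1.4 192.168.1.5; do
  echo "=== $host ==="
  ssh vic@"$host" 'ip -4 addr show eth0 | grep 192.168.1.222 || true'
done
```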
# k3s_token is required so that masters can talk together securely
# this token should be alphanumeric only
k3s_token: some1secret2token3
# The IP on which the node is reachable in the cluster.
# Here, a sensible default is provided, you can still override
# it for each of your hosts, though.
k3s_node_ip: "{{ ansible_facts[(cilium_iface | default(calico_iface | default(flannel_iface)))]['ipv4']['address'] }}"
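k3s_node_ip is resolved from Ansible facts for the chosen interface, so if that fact is missing (for example because the interface is not actually eth0), the --node-ip flag below cannot be rendered. A hedged way to inspect exactly what Ansible sees (inventory path assumed):

```bash
# Dump the eth0 fact on every node; ipv4.address is what k3s_node_ip resolves to.
ansible -i hosts.ini k3s_cluster -m setup -a "filter=ansible_eth0" -u vic
```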
# Disable the taint manually by setting: k3s_master_taint = false
k3s_master_taint: "{{ true if groups['node'] | default([]) | length >= 1 else false }}"
# these arguments are recommended for servers as well as agents:
extra_args: >-
  {{ '--flannel-iface=' + flannel_iface if calico_iface is not defined and cilium_iface is not defined else '' }}
  --node-ip={{ k3s_node_ip }}

# change these to your liking, the only required ones are: --disable servicelb, --tls-san {{ apiserver_endpoint }}
# the contents of the if block are also required if using calico or cilium
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  {% if calico_iface is defined or cilium_iface is defined %}
  --flannel-backend=none
  --disable-network-policy
  --cluster-cidr={{ cluster_cidr | default('10.52.0.0/16') }}
  {% endif %}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik

extra_agent_args: >-
  {{ extra_args }}
# image tag for kube-vip
kube_vip_tag_version: v0.8.2
# tag for kube-vip-cloud-provider manifest
kube_vip_cloud_provider_tag_version: "main"
# kube-vip ip range for load balancer
# (uncomment to use kube-vip for services instead of MetalLB)
kube_vip_lb_ip_range: "192.168.30.80-192.168.30.90"
# metallb type frr or native
metal_lb_type: native
# metallb mode layer2 or bgp
metal_lb_mode: layer2
# bgp options
metal_lb_bgp_my_asn: "64513"
metal_lb_bgp_peer_asn: "64512"
metal_lb_bgp_peer_address: "192.168.30.1"
# image tag for metal lb
metal_lb_speaker_tag_version: v0.14.8
metal_lb_controller_tag_version: v0.14.8
# metallb ip range for load balancer
metal_lb_ip_range: 192.168.1.80-192.168.1.90
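The range 192.168.1.80-192.168.1.90 indeed avoids the node IPs and the .222 VIP, consistent with what the question says about overlap. Once a run gets far enough for MetalLB to deploy, the pool it actually received can be checked; metallb-system as the namespace is an assumption (it is the usual default):

```bash
# Inspect the address pool MetalLB is running with (run on a master; kubeconfig needs root).
ssh vic@192.168.1.3 'sudo k3s kubectl -n metallb-system get ipaddresspools.metallb.io -o yaml'
```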
# Only enable if your nodes are proxmox LXC nodes, make sure to configure your proxmox nodes
# in your hosts.ini file.
# Please read https://gist.github.com/triangletodd/02f595cd4c0dc9aac5f7763ca2264185 before using this.
# Most notably, your containers must be privileged, and must not have nesting set to true.
# Please note this script disables most of the security of lxc containers, with the trade-off being that lxc
# containers are significantly more resource efficient compared to full VMs.
# Mixing and matching VMs and lxc containers is not supported, ymmv if you want to do this.
# I would only really recommend using this if you have particularly low powered proxmox nodes where the overhead of
# VMs would use a significant portion of your available resources.
proxmox_lxc_configure: false
# the user that you would use to ssh into the host, for example if you run ssh some-user@my-proxmox-host,
# set this value to some-user
proxmox_lxc_ssh_user: root
# the unique proxmox ids for all of the containers in the cluster, both worker and master nodes
proxmox_lxc_ct_ids:
# Only enable this if you have set up your own container registry to act as a mirror / pull-through cache
# (harbor / nexus / docker's official registry / etc).
# Can be beneficial for larger dev/test environments (for example if you're getting rate limited by docker hub),
# or air-gapped environments where your nodes don't have internet access after the initial setup
# (which is still needed for downloading the k3s binary and such).
# k3s's documentation about private registries here: https://docs.k3s.io/installation/private-registry
custom_registries: false
# The registries can be authenticated or anonymous, depending on your registry server configuration.
# If they allow anonymous access, simply remove the following bit from custom_registries_yaml:
#   configs:
#     "registry.domain.com":
#       auth:
#         username: yourusername
#         password: yourpassword
# The following is an example that pulls all images used in this playbook through your private registries.
# It also allows you to pull your own images from your private registry, without having to use imagePullSecrets
# in your deployments.
# If all you need is your own images and you don't care about caching the docker/quay/ghcr.io images,
# you can just remove those from the mirrors: section.
custom_registries_yaml: |
  mirrors:
    docker.io:
      endpoint:
        - "https://registry.domain.com/v2/dockerhub"
    quay.io:
      endpoint:
        - "https://registry.domain.com/v2/quayio"
    ghcr.io:
      endpoint:
        - "https://registry.domain.com/v2/ghcrio"
    registry.domain.com:
      endpoint:
        - "https://registry.domain.com"

  configs:
    "registry.domain.com":
      auth:
        username: yourusername
        password: yourpassword
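custom_registries is false in this file, so the block above is inert; if it were enabled, k3s reads registry mirrors from /etc/rancher/k3s/registries.yaml on each node (per the k3s private-registry docs linked above), which can be verified directly:

```bash
# Only relevant when custom_registries: true -- check the rendered registry mirror config on a node.
ssh vic@192.168.1.3 'sudo cat /etc/rancher/k3s/registries.yaml'
```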
# On some distros like Diet Pi, there is no dbus installed. dbus is required by the default reboot command.
# Uncomment if you need a custom reboot command
custom_reboot_command: /usr/sbin/shutdown -r now
# Only enable and configure these if you access the internet through a proxy
proxy_env:
  HTTP_PROXY: "http://proxy.domain.local:3128"
  HTTPS_PROXY: "http://proxy.domain.local:3128"
  NO_PROXY: "*.domain.local,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
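When changing any of the values above between attempts, it is usually worth wiping the half-initialised state first so a stale token or etcd member does not linger. A hedged sketch, assuming this copy of the k3s-ansible playbook ships the usual reset.yml and site.yml playbooks (adjust names and paths if yours differ):

```bash
# Tear down any partial install, then re-run the full playbook with verbose output.
ansible-playbook reset.yml -i hosts.ini
ansible-playbook site.yml -i hosts.ini -vv
```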