KO does not follow k8s version skew policy. Cannot upgrade CP when kubelet has an older minor release. #3010

Open
PoudNL opened this issue Jan 13, 2024 Discussed in #3009 · 11 comments

@PoudNL

PoudNL commented Jan 13, 2024

Discussed in #3009

Originally posted by PoudNL January 12, 2024
Currently we have a KubeOne-provisioned cluster running Kubernetes 1.23.17. This cluster has 3 control-plane nodes and approx. 35 worker nodes. Normally we upgrade the CP nodes to the next minor release and then roll the worker nodes over to the same version.
This last step takes a couple of days in our environment; unfortunately, speeding up that process is currently not possible.

Because we are running such an old version of Kubernetes, we would like to upgrade to 1.24.x on the control-planes and immediately upgrade to 1.25.x without first upgrading the worker nodes. This is in line with the Kubernetes version skew policy, because CP components can have a version skew of max 2 minors with Kubernetes 1.23.17.

The KubeOne documentation states that it follows the Kubernetes version skew policy, but unfortunately the upgrade from 1.24.x -> 1.25.x with kubeone 1.6.x fails due to the version skew. Kubeadm reports a message to let the administrator know that this is not recommended, and that message causes KubeOne to fail the process.

[192.168.199.141] [upgrade] Running cluster health checks
[192.168.199.141] [upgrade/version] You have chosen to change the cluster version to "v1.25.16"
[192.168.199.141] [upgrade/versions] Cluster version: v1.24.17
[192.168.199.141] [upgrade/versions] kubeadm version: v1.25.16
[192.168.199.141] [upgrade/version] FATAL: the --version argument is invalid due to these errors:
[192.168.199.141]
[192.168.199.141]       - There are kubelets in this cluster that are too old that have these versions [v1.23.17]
[192.168.199.141]
[192.168.199.141] Can be bypassed if you pass the --force flag
[192.168.199.141] To see the stack trace of this error execute with --v=5 or higher
WARN[21:59:12 CET] Task failed, error was: runtime: running task on "192.168.199.141"
ssh: running kubeadm upgrade on control plane leader
ssh: popen
Process exited with status 1

How can I make KubeOne force the update? I found some documentation about using kubeone upgrade instead of kubeone apply, but this feature is deprecated. Also, I don't see any lines in the code that would add the --force parameter to the kubeadm command when upgrading.

So does anyone know how to solve this and let KubeOne actually follow the Kubernetes version skew policy?

How to reproduce

  • Use kubeone 1.5 to create a cluster with 2 (or more) control-plane nodes and at least one worker on version 1.23.x
  • With kubeone 1.5: Upgrade CP nodes to 1.24.x (leave worker(s) on 1.23.x)
  • Upgrade to kubeone 1.6
  • With kubeone 1.6: Try upgrading CP nodes from 1.24.x to 1.25.x
@xmudrii
Member

xmudrii commented Jan 13, 2024

This is in line with the Kubernetes version skew policy, because CP components can have a version skew of max 2 minors with Kubernetes 1.23.17.

That's unfortunately more complicated. Kubernetes declared that kubelet supports n-2 version skew, but kubeadm implemented that change only starting from Kubernetes v1.29. Here are some references for that:

The best we can do is to consider adding a new feature/option that allows passing the --force flag to kubeadm. To be honest, this is technically doable and straightforward, but there are many risks in exposing such a feature because it's not possible to distinguish legitimate use cases (like yours) from non-legitimate ones.

However, whatever we decide, it might take some time for this to get implemented and eventually cherry-picked to v1.6. I'm not sure if this is an option for you, but you could compile your own version of KubeOne 1.6 that uses the --force flag for kubeadm upgrade (if this is an option, I could write down what you need to change and how to compile KubeOne).
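
To make the idea of an opt-in force option concrete, here is a minimal Go sketch. It is not KubeOne's actual code: the upgradeOptions type and the kubeadmUpgradeCommand function are hypothetical names, and the real KubeOne invocation passes additional arguments. The only thing taken from this thread is that kubeadm upgrade accepts a --force flag, as the error output above says.

```go
package main

import (
	"fmt"
	"strings"
)

// upgradeOptions is a hypothetical options struct for illustration only;
// it is not a type from the KubeOne code base.
type upgradeOptions struct {
	Version string // target Kubernetes version, e.g. "v1.25.16"
	Force   bool   // opt-in: bypass kubeadm's version validation
}

// kubeadmUpgradeCommand assembles a simplified kubeadm upgrade command line.
// --force is appended only when the operator explicitly asked for it.
func kubeadmUpgradeCommand(opts upgradeOptions) string {
	args := []string{"kubeadm", "upgrade", "apply", opts.Version}
	if opts.Force {
		args = append(args, "--force")
	}
	return strings.Join(args, " ")
}

func main() {
	fmt.Println(kubeadmUpgradeCommand(upgradeOptions{Version: "v1.25.16"}))
	fmt.Println(kubeadmUpgradeCommand(upgradeOptions{Version: "v1.25.16", Force: true}))
}
```

Keeping the flag off by default matches the concern raised above: the forced path is only taken when someone deliberately asks for it.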

@PoudNL
Author

PoudNL commented Jan 13, 2024

Nice findings!

I was also considering compiling a custom kubeone with the --force flag added, but I was not sure if there was a reason that this behaviour was not implemented yet. Maybe my way of thinking missed a spot in some checks.
But you just confirmed that I was on the right track, thanks for that.

I think adding it as an option to kubeone apply would be a great solution, something like --force-kubeadm=[true|false]. Of course, this should come with all the warnings that people need to be careful and need to understand that forcing can also do harm if you don't know why you are doing it. As you described: only use it in a legitimate way.
A lot of clusters still run an older version, and this would help them upgrade in these cases.

As you mentioned, the code of kubeadm 1.29 suggests that it will accept a skew of 1 minor (between kubelet and cp), but from version 1.25 a skew of 2 is even allowed between kubelet and controlplane. Therefore the option will also be viable for people who run a v1.29 cluster and would like to upgrade to 1.30/1.31/1.32 when released, without updating the kubelets on the workers. So the possible new option will also be needed in some cases after 1.29.

@xmudrii
Member

xmudrii commented Jan 14, 2024

As you mentioned, the code of kubeadm 1.29 suggests that it will accept a skew of 1 minor (between kubelet and cp), but from version 1.25 a skew of 2 is even allowed between kubelet and controlplane.

This is not fully accurate. Starting from 1.25, kubelet supports a skew of 2 minor releases (in later releases this was changed to 3), but kubeadm doesn't allow it until 1.29. If you tried to upgrade to 1.29 while having 1.27 (or even 1.26) worker nodes (kubelets), that would work because kubeadm 1.29 allows a version skew of 3 (MaximumAllowedMinorVersionKubeletSkew is 3). However, you can't upgrade to 1.28 while having 1.26 (or 1.25) worker nodes (kubelets) because kubeadm 1.28 doesn't allow it (MaximumAllowedMinorVersionKubeletSkew is 1).
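
The rule described above can be sketched as a small Go program. This is a simplified illustration, not kubeadm's implementation: the function names are hypothetical, and the skew limits (1 before v1.29, 3 from v1.29 onwards) are taken from the values quoted in this comment rather than read from the kubeadm source.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxKubeletSkew returns the kubelet minor-version skew tolerated for a
// target minor release, using the values quoted in this thread (assumption):
// 1 before v1.29, 3 from v1.29 onwards.
func maxKubeletSkew(targetMinor int) int {
	if targetMinor >= 29 {
		return 3
	}
	return 1
}

// minorOf extracts the minor number from a version such as "v1.25.16".
func minorOf(version string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("cannot parse version %q", version)
	}
	return strconv.Atoi(parts[1])
}

// tooOldKubelets returns the kubelet versions whose minor release is further
// behind the target version than the allowed skew.
func tooOldKubelets(target string, kubelets []string) ([]string, error) {
	targetMinor, err := minorOf(target)
	if err != nil {
		return nil, err
	}
	var tooOld []string
	for _, k := range kubelets {
		kubeletMinor, err := minorOf(k)
		if err != nil {
			return nil, err
		}
		if targetMinor-kubeletMinor > maxKubeletSkew(targetMinor) {
			tooOld = append(tooOld, k)
		}
	}
	return tooOld, nil
}

func main() {
	// The situation from this issue: control plane going to v1.25.16
	// while some kubelets are still on v1.23.17.
	blocked, err := tooOldKubelets("v1.25.16", []string{"v1.24.17", "v1.23.17"})
	if err != nil {
		panic(err)
	}
	fmt.Println("kubelets blocking the v1.25.16 upgrade:", blocked)

	// With a v1.29 target, v1.26 and v1.27 kubelets are within the skew of 3.
	blocked, err = tooOldKubelets("v1.29.0", []string{"v1.26.0", "v1.27.0"})
	if err != nil {
		panic(err)
	}
	fmt.Println("kubelets blocking the v1.29.0 upgrade:", blocked)
}
```

Under these assumptions the first call flags only v1.23.17, which matches the kubeadm error in the original report, and the second call flags nothing.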

@kubermatic-bot
Contributor

Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot added the lifecycle/stale label on Apr 14, 2024
@kubermatic-bot
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kubermatic-bot added the lifecycle/rotten label and removed the lifecycle/stale label on May 14, 2024
@xmudrii
Member

xmudrii commented May 20, 2024

/remove-lifecycle rotten

kubermatic-bot removed the lifecycle/rotten label on May 20, 2024
@kubermatic-bot
Contributor

Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot added the lifecycle/stale label on Aug 18, 2024
@xmudrii
Member

xmudrii commented Aug 19, 2024

/remove-lifecycle stale

kubermatic-bot removed the lifecycle/stale label on Aug 19, 2024
@kubermatic-bot
Contributor

Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot added the lifecycle/stale label on Nov 17, 2024
@kubermatic-bot
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kubermatic-bot added the lifecycle/rotten label and removed the lifecycle/stale label on Dec 18, 2024
@xmudrii
Member

xmudrii commented Jan 6, 2025

We should check if we're compliant with the latest version skew policy.
/remove-lifecycle rotten

kubermatic-bot removed the lifecycle/rotten label on Jan 6, 2025