[WIP] Re-Introducing the k3OS Operator. #284
Conversation
Addresses #253
Force-pushed from adf930f to 4b08a62 (Compare)
Can we get a TODO for supporting an upgrade in an air-gapped environment? i.e. no access to GitHub, but a local mirror is available.
Well, if we leave the local mirror setup as an exercise to the user, setting up local DNS pointing to the locally mirrored github.com should be part of that, which means things should just work.

Speculative noodling below: I'd like the "local mirror" setup to be as simple as setting up a local docker registry, pulling the images containing the artifacts through it, and shipping the volume they land on to a registry running in the air-gapped environment. Or maybe we ship a registry container that has the artifacts in it as part of the release, ready to serve. That actually sounds kind of cool, because then a container running that registry can be a cluster-local channel provider pinned to that version. This would necessitate a redesign of how we deliver k3OS artifacts.
Putting the artifacts in a local docker registry would be ideal for my use case, and would align well with air-gapped support in the larger Rancher ecosystem.
Force-pushed from cf737d8 to 87b23af (Compare)
On GCP, attempting to upgrade from 0.7.0-dweomer2 to 0.7.0 proper, the operator objects to the hostname set by the cloud provider:

E1120 00:50:40.745721 3264 controller.go:117] error syncing 'k3os-system/github-releases': handler k3os-operator: Job.batch "instance-2.c.brandonhale-181217.internal-upgrade" is invalid: spec.template.spec.containers[0].name: Invalid value: "instance-2.c.brandonhale-181217.internal-upgrade": a DNS-1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), requeuing
Thanks @bhale, I'm still stepping on some rakes!
1704 files changed! :) Could it be possible to separate the commit into two: one with the auto-generated code, one with the actual changes?
yes, I will be reorganizing this PR and that is on the TODO list |
Thanks @dweomer, you guys are super great!
- hack/
- vendor/
- pkg/generated/
- go.mod
- go.sum
Force-pushed from 2c7b7e6 to 38eec37 (Compare)
Initially focused on providing a kubernetes-native upgrade experience, this change-set enhances the k3os multi-call binary to provide:

- `k3os operator agent`: custom resource controller(s)
- `k3os upgrade`: perform rootfs and kernel upgrades

New Custom Resources:

- `Channel` (for managing where to resolve the "latest" version)
- `UpgradeSet` (for specifying cluster-wide upgrades)
- `NodeUpgrade` (for specifying node upgrades)

Design Assumptions:

The resolution of the "latest" version for a `Channel` is very simple in that you specify a URL, such as https://github.com/rancher/k3os/releases/latest, which responds with an HTTP 302 + Location where the base path, aka the last path element, of the Location is a release tag. This version is placed into `ChannelStatus.Version`, which will trigger the creation of an `UpgradeSet`.

`UpgradeSets` are automatically created for `Channels`, but a `Channel` is not a necessary predicate for an `UpgradeSet`. They represent an upgrade policy for k3OS nodes in the cluster by specifying and tracking the number of concurrent upgrades that can run.

`NodeUpgrades` reify an `UpgradeSet` for a `Node` specified by name. They are triggered on the creation/update of an `UpgradeSet` but may be created manually to incur a one-time upgrade. They trigger the creation and track the progress of a node-bound batch `Job` to perform the upgrade on the node.
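As a rough illustration of that resolution step (this is not the operator's actual code; the function name is a placeholder and the release URL shape is assumed from the description above), the sketch below issues a request without following the redirect and takes the last path element of the Location header as the release tag:

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"path"
)

// resolveLatestTag is a hypothetical helper: the channel URL answers with a
// redirect whose Location's last path element is treated as the release tag.
func resolveLatestTag(channelURL string) (string, error) {
	client := &http.Client{
		// Stop at the first redirect so we can inspect the Location header.
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	resp, err := client.Get(channelURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusFound && resp.StatusCode != http.StatusMovedPermanently {
		return "", errors.New("expected a redirect from the channel URL")
	}
	location, err := resp.Location()
	if err != nil {
		return "", err
	}
	// e.g. .../releases/tag/v0.7.0 -> "v0.7.0"
	return path.Base(location.Path), nil
}

func main() {
	tag, err := resolveLatestTag("https://github.com/rancher/k3os/releases/latest")
	if err != nil {
		panic(err)
	}
	fmt.Println(tag)
}
```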
type DrainSpec struct {
	Timeout         *time.Duration `json:"timeout,omitempty"`
	GracePeriod     *int32         `json:"gracePeriod,omitempty"`
	DeleteLocalData bool           `json:"deleteLocalData,omitempty"`
I'd change to `*bool`, and nil means "true". Same for `IgnoreDaemonSets`. I think most of the time that is what most admins would want.
}
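A minimal sketch of the `*bool` shape suggested above, assuming an accessor helper of my own invention (not code from this PR) where a nil field defaults to true:

```go
package main

import "fmt"

// DrainSpec variant sketching the reviewer's suggestion: pointer fields so
// that "unset" can be distinguished from "explicitly false".
type DrainSpec struct {
	DeleteLocalData  *bool `json:"deleteLocalData,omitempty"`
	IgnoreDaemonSets *bool `json:"ignoreDaemonSets,omitempty"`
}

// orTrue is a hypothetical helper: nil means "true", otherwise the set value.
func orTrue(b *bool) bool {
	if b == nil {
		return true
	}
	return *b
}

func main() {
	var spec DrainSpec // nothing set
	fmt.Println(orTrue(spec.DeleteLocalData)) // true (defaulted)

	no := false
	spec.IgnoreDaemonSets = &no
	fmt.Println(orTrue(spec.IgnoreDaemonSets)) // false (explicit)
}
```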
// RegisterHandlers registers Channel handlers
func RegisterHandlers(ctx context.Context, options Options) {
An `Options` struct typically has an implied behavior that all fields are optional. If something is required, it should be a method argument.
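For illustration only (the dependency and field names below are invented, not taken from this PR), the convention the reviewer describes would look something like this: required collaborators become explicit parameters, and `Options` carries only genuinely optional knobs.

```go
package main

import (
	"context"
	"fmt"
)

// Options carries only optional settings; its zero value is usable.
type Options struct {
	ResyncSeconds int  // optional tuning
	DryRun        bool // optional behavior toggle
}

// ChannelClient stands in for whatever required dependency the handlers need.
type ChannelClient interface {
	Enqueue(namespace, name string)
}

// RegisterHandlers takes its required dependency as an argument rather than
// burying it in Options.
func RegisterHandlers(ctx context.Context, channels ChannelClient, options Options) {
	fmt.Printf("registered handlers with options %+v\n", options)
}

type noopClient struct{}

func (noopClient) Enqueue(namespace, name string) {}

func main() {
	RegisterHandlers(context.Background(), noopClient{}, Options{})
}
```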
if err != nil {
	return status, err
}
sysid := obj.GetClusterName()
This field has an unknown future in k8s. I'd avoid using it and just generate a unique ID and store it in the configmap on startup.
I wasn't sure where to go with either of these so I just threw something in as a placeholder until we could review.
I found https://groups.google.com/forum/#!msg/kubernetes-sig-architecture/mVGobfD4TpY/nkdbkX1iBwAJ for the next iteration. Using the UID of the `kube-system` namespace is an elegant hack.
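A hedged sketch of that trick, assuming a recent client-go (the context-taking Get signature) and in-cluster configuration; this is not code from the PR:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// clusterID returns the UID of the kube-system namespace, which is created
// once per cluster and so can serve as a stable cluster identifier.
func clusterID(ctx context.Context, clientset kubernetes.Interface) (string, error) {
	ns, err := clientset.CoreV1().Namespaces().Get(ctx, "kube-system", metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	return string(ns.UID), nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	id, err := clusterID(context.Background(), clientset)
	if err != nil {
		panic(err)
	}
	fmt.Println("cluster id:", id)
}
```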
if len(sysid) > 0 {
	sysid = fmt.Sprintf("cluster:%s", sysid)
} else {
	sysid = fmt.Sprintf("channel:%v", obj.GetUID())
We should use some generated unique ID that is saved at the cluster level. If the client is watching multiple channels we want this ID to be the same for all requests.
sort.Strings(nodeNames)
for _, nodeName := range nodeNames {
	nodeUpgrade := apiv1.NewNodeUpgrade(obj.Namespace, fmt.Sprintf("%s-%.16s", obj.Name, nodeName), apiv1.NodeUpgrade{
use SafeConcatName(...)
Just the function I was looking for!
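For reference, a hedged sketch of swapping the format-string truncation for wrangler's name helper. I'm assuming the `github.com/rancher/wrangler/pkg/name` import path and that `SafeConcatName` joins its arguments with "-" and bounds the length; verify against the vendored wrangler version before relying on the details.

```go
package main

import (
	"fmt"

	"github.com/rancher/wrangler/pkg/name"
)

func main() {
	// Instead of fmt.Sprintf("%s-%.16s", obj.Name, nodeName), which truncates
	// blindly and can still produce awkward generated names:
	upgradeSet := "github-releases"
	nodeName := "instance-2.c.brandonhale-181217.internal"

	// As I understand wrangler's name package, SafeConcatName joins the parts
	// with "-" and truncates overly long results (appending a short hash) so
	// the generated name stays within Kubernetes length limits. Characters
	// that are invalid in a DNS-1123 label (such as the dots in a cloud
	// provider hostname) may still need separate sanitization.
	fmt.Println(name.SafeConcatName(upgradeSet, nodeName))
}
```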
} else {
	logrus.Debugf("candidateNodes = %#v", candidateNodes)
	for i := 0; i < len(candidateNodes) && uint64(len(nodeNames)) < obj.Spec.Concurrency; i++ {
		nodeNames = append(nodeNames, candidateNodes[i].Name)
I think that if there are more nodes available than the configured concurrency, the generated nodeName list is not idempotent.
I loop through and record all upgrading nodes before considering candidate nodes to guard against this. A node will fall off of the upgrading list if:
a. something other than the controller removes it from the `UpgradeSetStatus.Upgrades` list, or
b. it no longer meets the base selection criteria, which in almost all cases should mean an upgrade was applied and the node's version label matches `UpgradeSetSpec.Version`.
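To make the guard concrete, here is a simplified, hypothetical version of that selection logic (not the PR's actual code): nodes already recorded as upgrading are carried over first and count against the budget, and candidates are only appended, in sorted order, until the concurrency limit is reached, so repeated reconciles produce the same list.

```go
package main

import (
	"fmt"
	"sort"
)

// pickNodes keeps nodes that are already upgrading and fills the remaining
// concurrency slots from candidates in deterministic, sorted order.
func pickNodes(upgrading, candidates []string, concurrency int) []string {
	selected := append([]string{}, upgrading...)
	seen := map[string]bool{}
	for _, n := range selected {
		seen[n] = true
	}

	sort.Strings(candidates)
	for _, n := range candidates {
		if len(selected) >= concurrency {
			break
		}
		if !seen[n] {
			selected = append(selected, n)
			seen[n] = true
		}
	}
	sort.Strings(selected)
	return selected
}

func main() {
	upgrading := []string{"node-b"}
	candidates := []string{"node-c", "node-a", "node-d"}
	fmt.Println(pickNodes(upgrading, candidates, 2)) // [node-a node-b]
}
```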
Closing this in favor of integrating with https://github.com/rancher/system-upgrade-controller
Initially focused on providing a kubernetes-native upgrade experience, this change-set enhances the k3os multi-call binary to provide:

- `k3os operator agent`: custom resource controller(s)
- `k3os upgrade`: perform rootfs and kernel upgrades

New Custom Resources

- `Channel` (for managing where to resolve the "latest" version)
- `UpgradeSet` (for specifying cluster-wide upgrades)
- `NodeUpgrade` (for specifying node upgrades)

Design Assumptions

The resolution of the "latest" version for a `Channel` is very simple in that you specify a URL, such as https://github.com/rancher/k3os/releases/latest, which responds with an HTTP 302 + Location where the base path, aka the last path element, of the Location is a release tag. This version is placed into `ChannelStatus.Version`, which will trigger the creation of an `UpgradeSet`.

`UpgradeSets` are automatically created for `Channels`, but a `Channel` is not a necessary predicate for an `UpgradeSet`. They represent an upgrade policy for k3OS nodes in the cluster by specifying and tracking the number of concurrent upgrades that can run.

`NodeUpgrades` reify an `UpgradeSet` for a `Node` specified by name. They are triggered on the creation/update of an `UpgradeSet` but may be created manually to incur a one-time upgrade. They trigger the creation and track the progress of a node-bound batch `Job` to perform the upgrade on the node.

TODO

- `ConfigMap` for defaults/overrides for:
  - `DrainSpec` (on `UpgradeSet` and `NodeUpgrade`)
  - `ActiveDeadlineSeconds` and possibly `BackoffLimit`
- `UpgradeSet` node selector to allow for multiple extant upgrade policies for a cluster that will not stomp on each other
- `Channel` creation
- `UpgradeSet` creation (minus channel)
- `NodeUpgrade` creation (minus upgradeset)