Skip to content
This repository has been archived by the owner on Dec 2, 2021. It is now read-only.

Commit

Permalink
Proposal for re-factoring apply
Browse files Browse the repository at this point in the history
- define interfaces for encapuslating logic single-purpose in structs
- use PUT instead of PATCH so we don't need to generate PATCH directives
  • Loading branch information
pwittrock committed Dec 1, 2021
1 parent ee1c6f9 commit 0b02e52
Showing 1 changed file with 184 additions and 0 deletions.
184 changes: 184 additions & 0 deletions cli/apply_refactor.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,184 @@
# Apply v2

## Background

`kubectl apply` reads a file or set of files, and updates the cluster state based off the file contents.
It does a couple things:

1. Create / Update / (Delete) the live resources based on the file contents
2. Update currently and previously configured fields, without clobbering fields set by other means,
such as imperative kubectl commands, other deployment and management tools, admission controllers,
initializers, horizontal and vertical autoscalers, operators, and other controllers.

Essential complexity in the apply code comes from supporting custom strategies for
merging fields built into the object schema -
such as merging lists together based on a field `key` and deleting individual
items from a list by `key`.

Accidental complexity in the apply code comes from the structure growing organically in ways that have
broken encapsulation and separation of concerns. This has lead to maintenance challenges as
keeping ordering for items in a list, and correctly merging lists of primitives (`key`less).

Round tripping changes through PATCHes introduces additional accidental complexity,
as they require imperative directives that are not part of the object schema.

## Objective


Reduce maintenance burden by minimizing accidental complexity in the apply codebase.

This should help:

- Simplify introducing new merge semantics
- Simplify enabling / disabling new logic with flags

## Changes

Implementation of proposed changes under review in PR [52349](https://github.com/kubernetes/kubernetes/pull/52349)

### Use read-update instead of patch

#### Why

Building a PATCH from diff creates additional code complexity vs directly updating the object.

- Need to generate imperative delete directives instead of simply deleting an item from a list.
- Using PATCH semantics and directives is less well known and understood by most users
than using the object schema itself. This makes it harder for non-experts to maintain the codebase.
- Using PATCH semantics is more work to implement a diff of the changes as
PATCH must be separately merged on the remote object for to display the diff.

#### New approach

1. Read the live object
2. Compare the live object to last-applied and local files
3. Update the fields on the live object that was read
4. Send a PUT to update the modified object
5. If encountering optimistic lock failure, retry back to 1.

### Restructure code into modular components

In the current implementation of apply - parsing and traversing the object trees, diffing the
contents and generating the patch are entangled. This creates maintenance and
testing challenges. We should instead encapsulate discrete responsibilities in separate packages -
such as collating the object values and updating the target object.

#### Phase 1: Parse last-applied, local, live objects and collate

Provide a structure that contains the last, local and live value for each field. This
will make it easy to walk the a single tree when making decisions about how to update the object.
Decisions about ordering of lists or parsing metadata for fields are made here.

#### Phase 2: Diff and update objects

Use the visitor pattern to encapsulate how to update each field type for each merge strategy.
Unit test each visit function. Decisions about how to replace, merge, or delete a field or
list item are made here.

## Notable items

- Merge will use openapi to get the schema from the server
- Merge can be run either on the server side or the client side
- Merge can handle 2-way or 3-way merges of objects (initially will not support PATCH directives)

## Out of scope of this doc

In order to make apply sufficiently maintainable and extensible to new API types, as well as to make its
behavior more intuitive for users, the merge behavior, including how it is specified in the API schema,
must be systematically redesigned and more thoroughly tested.

Examples of issues that need to be resolved

- schema metadata `patchStrategy` and `mergeKey` are implicit, unversioned and incorrect in some cases.
to fix the incorrect metadata, the metadata must be versioned so PATCHes generated will old metadata continue
to be merged by the server in the manner they were intended
- need to version all schema metadata for each objects and provide this are part of the request
- e.g. container port [39188](https://github.com/kubernetes/kubernetes/issues/39188)
- no semantic way to represent union fields [35345](https://github.com/kubernetes/kubernetes/issues/35345)


## Detailed analysis of structure and impact today

The following PRs constitute the focus of ~6 months of engineering work. Each of the PRs is very complex
for the work what it is solving.

### Patterns observed

- PRs frequently closed or deferred because maintainers / reviewers cannot reason about the impact or
correctness of the changes
- Relatively simple changes
- are 200+ lines of code
- modify dozens of existing locations in the code
- are spread across 1000+ lines of existing code
- Changes that add new directives require updates in multiple locations - create patch + apply patch

### PRs

[38665](https://github.com/kubernetes/kubernetes/pull/38665/files)
- Support deletion of primitives from lists
- Lines (non-test): ~200
- ~6 weeks
[44597](https://github.com/kubernetes/kubernetes/pull/44597/files)
- Support deleting fields not listed in the patch
- Lines (non-test): ~250
- ~6 weeks
[45980](https://github.com/kubernetes/kubernetes/pull/45980/files#diff-101008d96c4444a5813f7cb6b54aaff6)
- Keep ordering of items when merging lists
- Lines (non-test): ~650
[46161](https://github.com/kubernetes/kubernetes/pull/46161/files#diff-101008d96c4444a5813f7cb6b54aaff6)
- Support using multiple fields for a merge key
- Status: Deferred indefinitely - too hard for maintainers to understand impact and correctness of changes
[46560](https://github.com/kubernetes/kubernetes/pull/46560/files)
- Support diff apply (1st attempt)
- Status: Closed - too hard for maintainers to understand impact and correctness of changes
[49174](https://github.com/kubernetes/kubernetes/pull/49174/files)
- Support diff apply (2nd attempt)
- Status: Deferred indefinitely - too hard for maintainers to understand impact and correctness of changes
- Maintainer reviews: 3


### Analysis - causes of complexity

Apply is implemented by diffing the 3 sources (last-applied, local, remote) as 2 2-way diffs and then
merging the results of those 2 diffs into a 3rd result. The diffs can each produce patch request where
a single logic update (e.g. remove 'foo' and add 'bar' to a field that is a list) may require spreading the
patch result across multiple pieces of the patch (a 'delete' directive, an 'order' directive
and the list itself).

Because of the way diff is implemented with 2-way diffs, a simple bit of logic
"compare local to remote" and do X - is non-trivial to define. The code that compares local to remote
is also executed to compare last-applied to local, but with the local argument differing in location.
To compare local to remote means understanding what will happen when the same code is executed
comparing last-applied to local, and then putting in the appropriate guards to short-circuit the
logic in one context or the other as needed. last-appied and remote are not compared directly, and instead
are only compared indirectly when the 2 diff results are merged. Information that is redundant or
should be checked for consistency across all 3 sources (e.g. checking for conflicts) is spread across
3 logic locations - the first 2-way diff, the second 2-way diff and the merge of the 2 diffs.

That the diffs each may produce multiple patch directives + results that constitute an update to a single
field compounds the complexity of that comparing a single field occurs across 3 locations.

The diff / patch logic itself does not follow any sort of structure to encapsulate complexity
into components so that logic doesn't bleed cross concerns. The logic to collate the last-applied, local and
remote field values, the logic to diff the field values and the logic to create the patch is
all combined in the same group of package-scoped functions, instead of encapsulating
each of these responsibilities in its own interface.

Sprinkling the implementation across dozens of locations makes it very challenging to
flag guard the new behavior. If issues are discovered during the stabilization period we cannot
easily revert to the previous behavior by changing a default flag value. The inability to build
in these sorts of break-glass options further degrades confidence in safely accepting PRs.

This is a text-book example of what the [Visitor pattern](https://en.wikipedia.org/wiki/Visitor_pattern)
was designed to address.

- Encapsulate logic in *Element*s and *Visitor*s
- Introduce logic for new a field type by adding a new *Element* type
- Introduce logic for new a merge strategy by defining a new *Visitor* implementation
- Introduce logic on structuring of a field by updating the parsing function for that field type

If the apply diff logic was redesigned, most of the preceding PRs could be implemented by
only touching a few existing code locations to introduce the new type / method, and
then encapsulating the logic in a single type. This would make it simple to flag guard
new behaviors before defaulting them to on.

0 comments on commit 0b02e52

Please sign in to comment.