Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(api): deleteDelayDuration should be a string #13543

Merged
merged 3 commits into from
Sep 6, 2024

Conversation

agilgur5
Copy link

@agilgur5 agilgur5 commented Sep 1, 2024

Fixes #12631 , which is due to a tooling bug that I noticed in one of my first ever PRs to Argo: #11325 (comment)

Motivation

Modifications

  • workaround the tooling issues by declaring it as a string and parsing the string to a time.Duration ourselves

    • i.e. instead of the k8s metav1.Duration struct's unmarshaling
  • re-use ParseStringToDuration util func

    • rename its internal variables to not be specific to suspend as it's used by other fields as well
    • remove the duplicate func in operator.go
  • add warning if an error occurs during parsing

Verification

Notes to Reviewers

This fix is backward-compatible as it does not affect the CRDs (which were correct already) and only fixes the broken API piece.

Future Work

  1. This should be changed to deleteDelaySeconds per k8s convention as mentioned in deleteDelayDuration JSON schema and validation mismatch #12631 (comment). That would also obviate the need for this entirely as well. Changing the other durations in the spec to seconds would also remove the need for duration parsing code as well

@agilgur5 agilgur5 added area/api Argo Server API area/spec Changes to the workflow specification. area/gc Garbage collection, such as TTLs, retentionPolicy, delays, and more labels Sep 1, 2024
@agilgur5 agilgur5 marked this pull request as draft September 1, 2024 02:05
@agilgur5
Copy link
Author

agilgur5 commented Sep 1, 2024

Codegen is failing and seemingly resulting in the rest of the failures.

The diff in CI makes sense, but I oddly can't seem to get my local env to reproduce it (which sometimes errors out during codegen of this particular change). might be time to rebuild the devcontainer from scratch 🤷

Anton Gilgur added 2 commits August 31, 2024 22:07
- the CRD tooling gets this right because the k8s `apimachinery` package has code that `kube-openapi` generator understands: https://github.com/kubernetes/apimachinery/blob/2465dc5239ab8827a637148a78b380c278b4a5f4/pkg/apis/meta/v1/duration.go#L65
- meanwhile, the API tooling is different and not k8s-specific, so it doesn't know how to properly handle this type
  - this results in the API spec showing an incorrect schema, the one of the unmarshaled struct, instead of a plain string
  - workaround this by declaring it as a string and parsing the string to a `time.Duration` ourselves (instead of the k8s struct's unmarshaling)

- add warning if an error occurs during parsing
- re-use `ParseStringToDuration` util func
  - rename its internal variables to not be specific to suspend as it's used by other fields as well
  - remove the duplicate func in `operator.go`

Signed-off-by: Anton Gilgur <[email protected]>
- somehow my initial codegen missed half of the changes
  - and this commit is also missing changes... odd....

Signed-off-by: Anton Gilgur <[email protected]>
@agilgur5 agilgur5 added this to the v3.5.x patches milestone Sep 1, 2024
@agilgur5 agilgur5 force-pushed the fix-api-deleteDelayDuration-schema branch from 01441dc to 93c1225 Compare September 1, 2024 03:44
@agilgur5
Copy link
Author

agilgur5 commented Sep 1, 2024

might be time to rebuild the devcontainer from scratch 🤷

yea that fixed it. and sped up my devcontainer a bit for whatever reason

Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI you can make the PR as draft when you create the PR.

@agilgur5
Copy link
Author

agilgur5 commented Sep 1, 2024

I know, my codegen was wacky locally (it did a partial generation somehow), so I didn't know it was incorrect until CI showed that, which is when I changed it to a draft

@agilgur5 agilgur5 marked this pull request as ready for review September 1, 2024 04:13
@juliev0
Copy link
Contributor

juliev0 commented Sep 6, 2024

Approved. Out of curiosity, what are all the ways that we're using the json schema? Is it just being used in the Swagger documentation or more than that?

@juliev0 juliev0 merged commit c698990 into argoproj:main Sep 6, 2024
29 checks passed
@agilgur5
Copy link
Author

agilgur5 commented Sep 6, 2024

The SDKs are generated off of it as well (and apparently the field docs too??), as the generated changes here show.

It's also used for IDE schema support and the schema support in the UI editor. And people use it with other generators and tooling too, like Jsonnet in the Slack thread I linked in the issue, and like cdk8s in #13530

So it primarily affects tooling. Which can be quite confusing as the schema doesn't match the actual API, so you'll get false positive schema errors as a result.
And the schema has some issues because of the tooling (non-k8s-aware) and steps used to generate it (Go -> protobuf -> OpenAPI, vs the k8s tooling is Go -> OpenAPI directly). Tooling on tooling

@agilgur5 agilgur5 deleted the fix-api-deleteDelayDuration-schema branch September 6, 2024 18:00
Joibel pushed a commit to pipekit/argo-workflows that referenced this pull request Sep 19, 2024
Joibel pushed a commit that referenced this pull request Sep 20, 2024
@philBrown
Copy link
Contributor

This seems like a breaking change released as a patch version. Not very good semver

@Joibel
Copy link
Member

Joibel commented Sep 23, 2024

I agree. I did the release of 3.5.11 but this shouldn't have been in it. It should have been marked with ! in the title to prevent it being backported as it is breaking.

@philBrown we don't follow semver but this should have waited for 3.6.

@agilgur5, @juliev0, @terrytangyuan - I'm happy to revert and release a 3.5.12 undoing this. Opinions?

@philBrown
Copy link
Contributor

FYI I had to update our code with something like

if duration, err := wf.Spec.PodGC.GetDeleteDelayDuration(); err == nil {
    //...
}

Thanks for the utility function

@agilgur5
Copy link
Author

Per the PR description, it's not breaking:

Notes to Reviewers

This fix is backward-compatible as it does not affect the CRDs (which were correct already) and only fixes the broken API piece.

The types did not change as it was always a string, except in the API, where it did not work.

@philBrown
Copy link
Contributor

it's not breaking

@agilgur5 I beg to differ...

  • I have consumer code using the Golang module
  • I updated the patch release
  • My code is now broken

@agilgur5
Copy link
Author

using the Golang module

Internal Go types are internal. Golang doesn't have a way of defining "published" or "not published" given you can use any Git repo as a source. There are no boundaries defined there: if the entire module were considered "public", then many many internal changes and refactors would be considered breaking.

The Public API is defined by schemas, such as the CRD and OpenAPI schema.
There is a Golang SDK which connects to the API and therefore follows its schema.

@agilgur5
Copy link
Author

agilgur5 commented Sep 24, 2024

Thanks for the utility function

For reference, that is an internal helper as well (it is used in the PR). As a relevant example, if we were to remove that, that should not be a breaking change, since it's internal.
However, if you are using it, that would indeed break your usage. We unfortunately don't really have a way of regulating that to disallow usage. That being said, there is also no user-facing documentation on using it either -- as it is not intended for external usage -- unlike the schemas and SDKs.

Using the types from the schema is a stable surface that can be relied on instead. And SDKs can be/are versioned separately. But the internal types are not versioned since they are internal. It would be a pretty big effort to make them able to be versioned too.

@Joibel
Copy link
Member

Joibel commented Sep 24, 2024

using the Golang module

Internal Go types are internal. Golang doesn't have a way of defining "published" or "not published" given you can use any Git repo as a source. There are no boundaries defined there: if the entire module were considered "public", then many many internal changes and refactors would be considered breaking.

For future reference, it is possible to make un-importable modules in go: https://pkg.go.dev/cmd/go#hdr-Internal_Directories

Argo Workflows hasn't done this, for better or for worse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Argo Server API area/gc Garbage collection, such as TTLs, retentionPolicy, delays, and more area/spec Changes to the workflow specification.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

deleteDelayDuration JSON schema and validation mismatch
5 participants