Skip to content
This repository has been archived by the owner on Aug 29, 2023. It is now read-only.

Go: check dependency tags #182

Closed
Stebalien opened this issue Aug 18, 2021 · 20 comments
Closed

Go: check dependency tags #182

Stebalien opened this issue Aug 18, 2021 · 20 comments

Comments

@Stebalien
Copy link
Member

So, we occasionally merge PRs that depend on unreleased versions/branches. We absolutely should not do this as it causes builds to break.

We can fix this with a simple check: if the merge is to master and go list -m all includes an unreleased version of any package, fail.

We'll probably need to an exception for packages outside of our org, but I'm hoping we can limit that to v0.0.0-... and +incompatible versions.

Thoughts?

@mvdan
Copy link
Contributor

mvdan commented Aug 18, 2021

What exactly causes builds to fail? Pulling in a newer master commit should generally be fine. The only edge case where it causes pain in the long term is if the downstream starts using a new and unreleased API from upstream, and upstream changes/removes that API before the next stable release. But I would hope this would be a rare occurrence.

There are some cases which seem definitely wrong though, such as pulling in a commit pseudo-version and that commit is not present in upstream's master branch. For example, imagine pulling in a commit from a feature/experimental branch in upstream; that should not be happening in the downstream's master version, as that could lead to instability and diamond dependency issues.

Another similar problem we could nag about is requiring a master commit that is an ancestor to a release, with the idea that the downstream should simply upgrade to a newer release.

@mvdan
Copy link
Contributor

mvdan commented Aug 18, 2021

As to the dagstore PR you created:

  • They were using a slightly older go-car/v2 master commit because we shipped a couple of fixes last week, and did the release with them the day after. I imagine they did the upgrade between those two points in time.
  • Similar story with go-multicodec; it had some new features added since spring, but no proper new release until last week.

@mvdan
Copy link
Contributor

mvdan commented Aug 18, 2021

Another idea is to only do these stricter "unstable upstream" checks when the downstream tags a release, e.g. via #158.

I think dagstore master using a go-car/v2 master is fine, but a tag like dagstore v0.5.0 shouldn't be using a go-car/v2 master commit.

@raulk
Copy link

raulk commented Aug 18, 2021

My 2c:

Downstream master depending on unreleased commits that are also in upstream master branches is totally fine.

Referencing an upstream tag from master does not provide much advantage over referencing an upstream master commit. They both share the immutability guarantee: just like we expect maintainers to not force push a tag, we expect maintainers to not force push master.

The only mild benefit of referencing a tag over a commit is that it provides some semantic context as to what's the nature/breakage of the changes we're dragging. But that benefit is nullified by the fact that developers depending on that downstream's master are already bleeding-edge-minded, and inherently tolerate breakages (otherwise they should not depend on master!).

Thus, I would -1 the suggestion here, as it will detract velocity and make evolution a pain.

@warpfork
Copy link
Contributor

warpfork commented Aug 19, 2021

I agree with the other comments that a strict red/green rule about this would probably create a lot of problems, but also agree something to help consolidate what version of things we use overall would probably accelerate and improve our development.

(Another example for a strict rule in PRs being problematic: During the ipld-prime-in-ipfs saga, we actually had stacks of repos -- approximately a dozen -- with PRs in each of them, targeting master in each repo, which included bumps to specific commits in downstreams, as we tested and co-developed things forward. We saved up all the tagging for the end, and did a cascade of tagging and merging. The ultimate aim (everything on master uses a tagged release) was satisfied. But there's no way we could've made it through the development cycle without having commit hashed dependency references used during the PRs to get there. And if CI regularly gave a "red light" throughout that development cycle, even on a sub-check, it would've wasted a lot of attention.)

I think I'd like to see us invest in some tool that draws module dependency graphs (including versions), and then use that visualization to help us minimize the sprawl of that graph, and keep it moving ratcheting forward.

I drafted something like that here: https://github.com/warpfork/go-depchart/ . (Check out the graphic in the readme.) While that's not even remotely productionized (and I'm not promising the time budget to do so), I think this is the right direction to go. Advisory, informational tools, which let us draw a picture that both lets us clearly communicate why the sprawl is a complexity problem, and, also gives us the navigational information needed to do something about it in the trenches.

@marten-seemann
Copy link
Contributor

Not voicing any opinion on this issue, just wanted to throw in that now that we migrated everything to GitHub Actions, it is really easy to have the bot comment on issues and PRs. We should make use of that in situations where a red / green check doesn't fit.

@Stebalien
Copy link
Member Author

If you make a module depend on a master commit of some other module, either:

  1. You make sure to update the dependency to a released version before you cut a release. If you are not directly maintaining that code, you're making trouble for whoever does. Worse, there's no (current) way to enforce this kind of thing in CI.
  2. The commit you depend on becomes a de facto release. Anyone updating to the latest version of your module needs to depend on it.

Given this, why not just tag a release? We're currently living in world 2, not world 1, and I'm quite confident that we're incapable of reliably implementing 1.

Tagging a release:

  1. Takes almost no time.
  2. Forces one to consider "is this ready". If it's not ready, we shouldn't be pulling it in as a dependency. That just makes the dependent module unreleasable.
  3. Forces one to consider whether or not the release contains braking changes when picking a version number.
  4. Avoids forcing downstream packages to pull in unreleased transitive dependencies (avoids world 2).

The only mild benefit of referencing a tag over a commit is that it provides some semantic context as to what's the nature/breakage of the changes we're dragging.

I cannot express my disagreement with this statement strongly enough. Taking a moment to consider whether or not something is ready and/or whether or not something breaks something else is always worth it. And it saves a ton of grief down the line when someone else assumes there was no breaking change, pulls in an update, releases, then manages to break their downstream users.

@Stebalien
Copy link
Member Author

Ok, so, six months on and:

  1. We've been following the rule proposed here anyways, as far as I can tell.
  2. I haven't seen any complaints..
  3. We have no automation to enforce it, which leads to mistakes.

Proposal: Report an error if the following is non-empty.

go list -m all | grep 'github.com/\(filecoin-project\|ipfs\|libp2p\|multiformats\|ipld\)/' | grep -v ' v0.0.0' | grep -- ' v.*-0.'

@mvdan
Copy link
Contributor

mvdan commented Jan 12, 2022

I don't have a problem with enforcing this kind of check, but would it run for every push to master, or just when we're about to tag a release? Now that unified CI can run extra checks before a git tag is pushed, we could add this check there.

I'd strongly prefer that over enforcing this on master and all PRs. Keeping master in a "releaseable" state is good, but tagging absolutely every pulled upstream change will likely lead to too much process and the opposite of what we want - developers turning off the CI checks to work around the red tape :)

Also, I think your regular expression might need tweaking. A pre-release can contain the string v0.0.0, if there hasn't been a tag in that repository yet; for instance: bazil.org/fuse v0.0.0-20200117225306-7b5117fecadc.

https://go.dev/ref/mod#pseudo-versions has more examples that would likely trip up your regex, like vX.Y.Z-pre.0.yyyymmddhhmmss-abcdefabcdef - note that .0. is used rather than -0..

Luckily for you, https://pkg.go.dev/golang.org/x/mod/module#IsPseudoVersion already implements this check, so we can either use the Go API directly in some way, or copy-paste its regular expression:

var pseudoVersionRE = lazyregexp.New(`^v[0-9]+\.(0\.0-|\d+\.\d+-([^+]*\.)?0\.)\d{14}-[A-Za-z0-9]+(\+[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$`)

@mvdan
Copy link
Contributor

mvdan commented Jan 12, 2022

Also note that, if we went with Go code, that module also has a mod file parser: https://pkg.go.dev/golang.org/x/[email protected]/modfile#Parse

Probably unnecessary, as shell is likely enough, but still FYI. There's also go mod edit -json to print the "AST" form of the go.mod file, but I don't think it really helps bash, given it doesn't really speak JSON.

@Stebalien
Copy link
Member Author

Keeping master in a "releaseable" state is good, but tagging absolutely every pulled upstream change will likely lead to too much process and the opposite of what we want - developers turning off the CI checks to work around the red tape :)

I've never had this issue so I'm having quite a bit of difficulty relating to your position.

  1. If I'm working on an unreleasable feature, I try to keep it in a feature branch. Otherwise, I make master unreleasable.
  2. If master is always releasable, tagging a release is usually really easy.
  3. We can always tag per-releases.

Can we just try this see if it's actually a problem, then discuss the problem to figure out what's actually causing it?

@aschmahmann
Copy link
Contributor

Keeping master in a "releaseable" state is good, but tagging absolutely every pulled upstream change will likely lead to too much process and the opposite of what we want - developers turning off the CI checks to work around the red tape :)

To add onto this. If you're in this situation repeatedly it seems like you have tightly coupled repos that should just be merged together. Do we have any examples of this problem we can look at?

Another example for a strict rule in PRs being problematic: During the ipld-prime-in-ipfs saga, we actually had stacks of repos

This example IMO not only doesn't support not tagging, but actively supports the opposite. We waited on merging into master in go-ipfs until everything was tagged. Additionally, most of the cross repo PRs and stacks of commit hashes stemmed from there being just too many repos that could've been combined.

I don't have a problem with enforcing this kind of check, but would it run for every push to master, or just when we're about to tag a release? Now that unified CI can run extra checks before a git tag is pushed, we could add this check there.

I'd like to keep master tagged given:

  1. We've been following the rule proposed here anyways, as far as I can tell.
  2. I haven't seen any complaints..

However, whether we do it in PRs or after the fact on commits to master that violate the policy is 🤷. I'd probably rather do them in PRs unless it'll get in the way of things like extra build steps for docker containers that only would happen if the tests were otherwise green.

@marten-seemann
Copy link
Contributor

To add onto this. If you're in this situation repeatedly it seems like you have tightly coupled repos that should just be merged together. Do we have any examples of this problem we can look at?

go-libp2p is currently in a state where it is referencing a bunch of master commits of other libp2p repositories. The reason for this is the following: We're making 2 breaking changes in -core, both of which require us to update a few other repositories. If our policy was to only allow tagged releases, we'd end up with two -core releases (and two releases of a number of other libp2p repos).
I was trying to avoid cutting releases which would never make it into a go-libp2p release.

@vyzo
Copy link
Contributor

vyzo commented Jan 12, 2022

I think tis fine to have master referencing commits in master of other repos, it is very helpful with developing in our repo labyrinth.

Releases should only ref tagged releases however, and thats something we could check in the release workflow.

@marten-seemann
Copy link
Contributor

If we decide that it's ok to tag master releases during development, we could:

  1. Have a workflow (running on PRs) that checks that commits only target commits that are master commits (and not on other branches). This should probably not be a check that fails the build, as we'll quite often reference non-master commits. We could automatically post a comment on the PR though.
  2. Have a workflow that runs on release PRs that checks that we only reference released versions.

@Stebalien
Copy link
Member Author

I'd be OK with a "is on master" check, but I'm not sure how to do it efficiently.

But still, I have to challenge this: why not make the go-libp2p changes on feature branches? I.e.:

  1. Create a long-running feature branch for some long-running endeavor.
  2. Merge PRs into this feature branch.
  3. Merge the feature branch when it's ready.

There are definitely cases where this doesn't work (where this would introduce conflicts).

@marten-seemann
Copy link
Contributor

  1. Create a long-running feature branch for some long-running endeavor.

Creating a feature branch works in some situations, but can get really annoying if it's too long-running. You might create a lot of merge conflicts if other PRs are merged into master in the mean time.
It also doesn't really work if you're working on multiple features in parallel. Sure, you could merge all of them into the feature branch, but what you've now done is effectively create a new default branch (with the additional downside of creating merge conflicts).

@warpfork
Copy link
Contributor

warpfork commented Jan 13, 2022

2cents / present anecdata: I just made a tag in a repo because someone asked for it because a downstream wanted a tag For Reasons.

The tag has, in my opinion, absolutely no semantic value other than "someone was capable of bothering me to tag this, and I was willing to accede (which turns out to signal that the content was on master)".

As far as wastes of time go, this is minor, because I've already decided that I'm not even bothering with changelogs on v_._.+1 tags (mostly because of... this). But I'd have a very hard time defending that this was a good or useful use of time nor that it was worth playing telephone and having a synchronous (potentially blocking) exchange with someone in order for them to get this tag.

@mvdan
Copy link
Contributor

mvdan commented Jan 13, 2022

I'd be OK with a "is on master" check, but I'm not sure how to do it efficiently.

It can be done on GitHub with exactly one API call, which I think is as efficient as it can get without cloning the repository:

$ REPO=libp2p/go-libp2p
$ MASTER_COMMIT=978cc42d1e
$ OTHER_COMMIT=b14e606985
$ curl -s https://api.github.com/repos/${REPO}/compare/HEAD...${MASTER_COMMIT} | jq -r .status
behind
$ curl -s https://api.github.com/repos/${REPO}/compare/HEAD...${OTHER_COMMIT} | jq -r .status
diverged

If the string is identical or behind, then it's contained in the HEAD branch, e.g. master or main.

Getting the commit hash should be doable by tweaking the regular expression above slightly so it can be used with sed.

@galargh
Copy link
Contributor

galargh commented Aug 28, 2023

This issue was transferred to ipdxco/unified-github-workflows#36 in preparation for archiving of this repository.

@galargh galargh closed this as completed Aug 28, 2023
@github-project-automation github-project-automation bot moved this from 🥺 Backlog to 🥳 Done in InterPlanetary Developer Experience Aug 28, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants