Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cached git cloner #3655

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

maknihamdi
Copy link

@maknihamdi maknihamdi commented Mar 1, 2021

To improve kustomize build time, I add a new gitcloner function: CachedGitCloner.
This function uses a home directory to cache "visited" repositories.
In the same build cycle we can reuse an already cloned repository with the same ref.
We also can reuse the same directory between builds.
User can clean manually his cache

The CachedGitCloner check if the git ref is a tag or a branch, the command fails if no tag are used.
We can discuss about and improve this check. The check can be optional, using a build flag, we decide to be strict or flexible with git ref. (this is not implemented, the only strict mode is implemented)

this PR can fix #2460

@k8s-ci-robot
Copy link
Contributor

Welcome @maknihamdi!

It looks like this is your first PR to kubernetes-sigs/kustomize 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kustomize has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 1, 2021
@k8s-ci-robot
Copy link
Contributor

Hi @maknihamdi. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 1, 2021
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 2, 2021
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 5, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 6, 2021
Copy link
Contributor

@monopole monopole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for trying this.

This PR completely replaces how kustomize works with git, and introduces new set of cache mgmt issues.

  • The code being replaced isn't large, but it has undergone many changes by many users (see history), replacing it will likely hold surprises in some use cases.

  • There's no flag to allowing a user to try the new method and fall back to the old method. This has to be behind some flag, e.g. --enable_alpha_git_cache.

  • Caches always start as a performance enhancement, then become a source of bugs around cache expiration, cleaning, etc. If kustomuize is going to maintain a cache, it should be able to list, refresh and purge that cache (new commands).

  • This PR adds an undocumented .kustomize directory, and it doesn't use or honor $XDG_CONFIG_HOME/kustomize. Any artifacts created by kustomize should live there (so far it's only plugins).
    See https://kubectl.docs.kubernetes.io/guides/extending_kustomize/#placement.

  • finally, it's likely that the popular go-getter from hashicorp is going to come back into use in some injectable form that doesn't increase the director or transitive dependencies of kustomize/api. That's not enough to stop a caching proposal, but it's all going to have to work together somehow.

Finally. a meta question:

Does it make sense to do this?

Do we want kustomize to be maintaining repository caches - as opposed, to - say - some wrapper ci/cd system handling this?

A flag would at least allow people to experiment. This is a big change.

@maknihamdi
Copy link
Author

maknihamdi commented Mar 7, 2021

thank you @monopole for your review!

It's a good idea to introduce a --enable_alpha_git_cache flag to let people experiment. It's what we do with my team, we started with a very basic implementation, and we are improving the process to have something usable with many use cases.

I think we can introduce an extra flag to accept or not a mutable dependency (branches ref). For example, in our CI/CD we prohibit using branches reference in production environments, but we need it in development env

We can also introduce expiration management, with an automatic cleanup or manual assist, it can be a new kustomize command (or flag) to clean cache directory.

I got inspired by kubectl cache directory to choose the $home/.kustomize (I should document it by the way), but we can change it to use the XDG_CONFIG_HOME, but it's not a plugin, and It's not a conf, I dont know if XDG_CONFIG_HOME is the good place. What do you think?

"Does it make sense to do this?": we really saves a lot of time in development and ci cd jobs since we use this functionality with my team. I'll explain how and why we do it in the issue #2460

@maknihamdi
Copy link
Author

what about XDG_CACHE_HOME to store the cache? It s maybe more compliant as XDG_CONFIG_HOME

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 27, 2021
@monopole
Copy link
Contributor

We should schedule time to discuss this at a sig-cli meeting.
Would you be willing to add it to an agenda and come talk about it?

@monopole
Copy link
Contributor

Agree XDG_CACHE_HOME should be honored. It should default to $HOME/.cache

Could put that definition near https://github.com/kubernetes-sigs/kustomize/blob/master/api/konfig/general.go#L29

Then kustomize's git cache would be in, say, $XDG_CACHE_HOME/kustomize/git.

@rdubya16
Copy link

We would really like to see this get in as well. We are struggling with this same problem where we version components against a remote base repo and very long build times.

@maknihamdi maknihamdi force-pushed the cached-git-cloner branch from 308379d to 44fea5c Compare May 18, 2021 15:09
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 18, 2021
@monopole
Copy link
Contributor

Please rebase and squash so we can see where this is.

@monopole
Copy link
Contributor

The idea to add an allow branch refs flags is good and consistent with flags having no impact on kustomize output.

@maknihamdi maknihamdi force-pushed the cached-git-cloner branch from 44fea5c to a801599 Compare May 19, 2021 09:43
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 19, 2021
@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 5, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 5, 2023
@george-angel
Copy link
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 6, 2023
@cailyn-codes
Copy link
Contributor

@maknihamdi are you still planning on working on this? If so I think it could be worth a KEP, or at least detailed issue so that a suitable strategy could be made.

@george-angel - Are you reopening because you would like to see this work merged?

@rdubya16
Copy link

@george-angel - Are you reopening because you would like to see this work merged?

We would also like to see this work merged, it would go a long way to speeding up our kustomize builds.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 21, 2023
@george-angel
Copy link
Contributor

/remove-lifecycle stale

@cailynse Correct - very keen to see this implemented.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 22, 2023
@annasong20
Copy link
Contributor

@sarab97 @natasha41575 I don't think localize directly solves the same problem as this cached git cloner.

As @sarab97 alluded to in his Slack message, the localize command operates on the Kustomization level. I believe it only yields performance benefits if one localizes a Kustomization, then repeatedly builds from it or references it in a Kustomization overlay. Each additional build is free in terms of git clones.

The cached git cloner, on the other hand, operates at the remote root aka repository level. This means that localize is inferior in all cases where repositories are re-used, but not the exact Kustomization. Examples include a large Kustomization that contains multiple references to the same repository or different Kustomizations that all reference the same repository. This is especially true when the user only wants to build once on each such Kustomization. I failed to consider performance costs when implementing localize; my only motivation was to allow users to build without internet access...

Other small differences between this cached git cloner and localize include

  • localize is unable to handle remote values in certain fields in alpha
  • localize saves only files in the remote root that the Kustomization references

@natasha41575
Copy link
Contributor

natasha41575 commented Oct 25, 2023

I've been thinking about this PR a lot, and I want to start off by saying that I agree that we should try to improve the build times for remote builds. However, this approach violates the fundamental kustomize design philosophy of "no build time side effects", which you can read about here. I will do my best to explain why, and offer some other approaches that might resolve the issue.

Kustomize is designed in such a way that a kustomization directory is supposed to have all the information it needs for the build. This means that it shouldn't read from the environment, and it shouldn't read from files outside the kustomization directory. This keeps a kustomization directory simple, consistent, and portable, e.g. the output of kustomize build is stable and predictable from the kustomization directory alone. Whether or not you agree with this design principal, this is what kustomize is founded on and built around. This PR violates that principal (a) by reading from an environment variable and (b) relying on a directory outside the kustomization directory (the cached one) that may or may not be representative of the URLs provided in the kustomization at the time.

Beyond violating the build time side effects, I think this also complicates the workflow as it leaves it up to the user to refresh the cache and it is not immediately obvious what kustomize is using as the source of truth in its build, making the tool less intuitive. I believe there are better solutions here that can equally resolve the issue or at least come close. Some potential ideas might be:

a) To improve the build time within a single build, we can consider caching the remote directory (i.e. what this PR does), except that we need to clean up the cached directory before finishing the build. That means that the cached directory cannot be shared across builds. I would also vote for it to be in a hardcoded location, rather than using an environment variable.
b) We can improve some workflows around kustomize localize to achieve similar affects to this PR. For example, one workflow might be to kustomize localize a remote directory, and then do something like kustomize edit replace dir1 dir2 ... etc to replace URLs with the path to the localized copy. We can consider further enhancing localize and kustomize edit to make this really easy. I prefer this strongly over option (a).

I'm happy to keep discussing and brainstorming other solutions, but I want to emphasize that we have to keep the original kustomize design principals in mind. I know this can be frustrating as it seems to block important improvements, but I think it is really important that we stick to these principles. One of the old maintainers wrote up a wonderful writeup which I will paste here that really explains well why we need to stick to them:

Our guiding principles aren't just dogma for its own sake–they're promises we make to our users about how Kustomize works. They define what distinguishes it among the vast array of options in the configuration management landscape, and they help us keep making a great tool with a consistent vision and user experience across time and leadership changes. We regularly get feature requests to reverse every single one of these principles, and I'm sure if we accommodated them, many users would find those changes useful as well. However, it would likely break workflows for those taking advantage of the distinguishing features Kustomize was designed to provide, and it would certainly degrade Kustomize's user experience in the long run.

In this case, we're mostly talking about the "no build side-effects" principle, which is a promise to our users regarding the stability of the output of kustomize build specifically. This promise relates to the workflow Kustomize is optimized for, that is, one where the best practice of storing one's entire configuration in version control is followed. We know that not everyone who uses Kustomize follows this practice, and indeed Kustomize provides multiple workarounds that can be used to bridge the gap. In this case, kustomize edit is one, and plugins are another.

@george-angel
Copy link
Contributor

Thanks @natasha41575

Kustomize is designed in such a way that a kustomization directory is supposed to have all the information it needs for the build. This means that it shouldn't read from the environment, and it shouldn't read from files outside the kustomization directory.

Am I wrong in thinking that kustomize build actually creates directories in /tmp/ like /tmp/kustomize-2494344589/ which I presume it reads from?

I do like option (a) - that goes a long way to solve our issues of monorepo bases that we reference many times during a single build. My very naive view was - to keep git cloned repos in /tmp/ for the duration of the build, which sounds like what you described.

Option (b) is fine, but it would need our input to make use of it. We are currently running our own CD - https://github.com/utilitywarehouse/kube-applier/ and looking to migrate to ArgoCD, where we have to do Kustomize builds via a plugin: https://github.com/utilitywarehouse/argocd-voodoobox-plugin .

So the logic described has to be implemented there. And for other orgs - they would rely on ArgoCD, Flux, etc. to make use the new localize flow. This is not a problem, but certainly not as easy as "if your kustomize build has 100 references of the same branch of a remote repo, it will only be cloned once".

Thank you!

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 28, 2024
@george-angel
Copy link
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 30, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 29, 2024
@therealdwright
Copy link

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 29, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 28, 2024
@therealdwright
Copy link

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 28, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 26, 2024
@george-angel
Copy link
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve build times for remote layers