
ADR14: ArgoCD as Candidate for a Continuous Deployment System #1889

Open · wants to merge 1 commit into main
Conversation


tarrow (Contributor) commented Dec 7, 2024

Bug: T377082


deer-wmde (Contributor) commented Dec 11, 2024

Thanks for the writeup, I generally agree with the text and will add some remarks here.

  1. on values files: it might be a good idea to pause further Argo migrations until we have figured this out. Part of this, I think, is the buy-in to the templating functionality that helmfile provides, which is really hard to replicate in other settings; so a refactoring stretch goal could be to get rid of the templating and maybe start using kustomize (which is a better-supported overlay system than the Go templating helmfile uses; for example, kubectl supports it natively, as does ArgoCD).

  2. as long as 1. is not solved, I suggest we do not use ArgoCD locally, as in the current situation it increases the overhead for development far too much (building an image with skaffold works perfectly fine, but trying to test new charts and/or chart values locally has become more complicated than with helmfile)

  3. I'd love some feedback on the current experience of deploying the UI and API. Deployment with helmfile is a bit clunky, but you also get feedback in the form of terminal output; with ArgoCD you basically have to look the status up yourself in the web UI or CLI. I have two suggestions:

  • a) we could add automatic Mattermost notifications for deployment events; this could be especially useful for errors we might otherwise miss
  • b) on syncing speed: we could probably lower the time Argo waits between checks for changes, or even trigger syncing with a GitHub webhook after merge
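To make the kustomize suggestion concrete, here's a minimal sketch of what an overlay could look like, assuming a hypothetical `my-service` with a base and a production overlay (all paths and names are invented for illustration):

```yaml
# base/kustomization.yaml -- plain manifests, no templating
resources:
  - deployment.yaml
  - service.yaml

# overlays/production/kustomization.yaml -- per-environment patch
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: my-service   # hypothetical name
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 3
```

kubectl can render this natively with `kubectl apply -k overlays/production`, and an ArgoCD Application can point its `spec.source.path` at the overlay directory, so no Go templating is needed anywhere.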

@deer-wmde

Another thought: if the team is fine with skipping the attempt to keep the configuration backwards compatible (which I get the sense of from how we added redis-2), I also suggest running all Argo-controlled deployments like this and getting rid of the values script workaround.


tarrow (Contributor, Author) commented Dec 13, 2024

  1. on values files: it might be a good idea to pause further Argo migrations until we have figured this out. Part of this, I think, is the buy-in to the templating functionality that helmfile provides, which is really hard to replicate in other settings; so a refactoring stretch goal could be to get rid of the templating and maybe start using kustomize (which is a better-supported overlay system than the Go templating helmfile uses; for example, kubectl supports it natively, as does ArgoCD).

Yeah, I agree. I hoped it was clear that we'd pause further migrations until we make this smoother for the current services. If it's not clear from the text I wrote, could you suggest an alteration to make it clearer?

  2. as long as 1. is not solved, I suggest we do not use ArgoCD locally, as in the current situation it increases the overhead for development far too much (building an image with skaffold works perfectly fine, but trying to test new charts and/or chart values locally has become more complicated than with helmfile)

I'm unkeen to fragment the deployment system further; that's why I didn't suggest rolling this back locally. I think we could document the ways to test things locally much more clearly, though, and ideally find a testing solution that feels a lot more GitOps-native. I suspect this means making the local clone of the git repo visible to Argo inside the cluster.
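One possible shape for that (purely a sketch; the in-cluster git URL, names, and chart path below are all hypothetical) is an Application whose source is a small git server running inside the local cluster, serving the developer's working copy:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service-local   # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    # hypothetical in-cluster git server exposing the local clone
    repoURL: http://git-server.default.svc.cluster.local/deploy.git
    targetRevision: HEAD
    path: charts/my-service   # hypothetical chart path
  destination:
    server: https://kubernetes.default.svc
    namespace: my-service
  syncPolicy:
    automated: {}
```

This would let local testing go through the same Argo sync path as production, rather than a separate helmfile-style flow.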

  3. I'd love some feedback on the current experience of deploying the UI and API. Deployment with helmfile is a bit clunky, but you also get feedback in the form of terminal output; with ArgoCD you basically have to look the status up yourself in the web UI or CLI. I have two suggestions:
  • a) we could add automatic Mattermost notifications for deployment events; this could be especially useful for errors we might otherwise miss
  • b) on syncing speed: we could probably lower the time Argo waits between checks for changes, or even trigger syncing with a GitHub webhook after merge

Yeah, for me it's pretty much perfect for the UI and API now. I like it a lot.

a) Yeah, that could be nice. We could also consider exporting Prometheus metrics and adding alerts on them.
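For context on the metrics idea: ArgoCD's application controller already exports Prometheus metrics such as `argocd_app_info` (labelled with `sync_status` and `health_status`), so an alert could be a small PrometheusRule along these lines (a sketch, assuming the Prometheus Operator is installed and the controller's metrics endpoint is already scraped; the rule name is hypothetical):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts   # hypothetical name
  namespace: argocd
spec:
  groups:
    - name: argocd
      rules:
        - alert: ArgoAppOutOfSync
          # fires if an application stays OutOfSync for 15 minutes
          expr: argocd_app_info{sync_status="OutOfSync"} == 1
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "ArgoCD app {{ $labels.name }} is OutOfSync"
```

An alert like this would catch the "errors we otherwise might miss" case independently of any chat notifications.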
b) If people are annoyed by the speed (I'm actually not), then the correct native feature is https://argo-cd.readthedocs.io/en/stable/operator-manual/webhook/, which exists for exactly this case.
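Per that page, wiring the webhook up is mostly two steps: add a shared secret under the `webhook.github.secret` key of the existing `argocd-secret` Secret, then configure a GitHub push webhook (content type `application/json`) pointing at the ArgoCD server's `/api/webhook` endpoint. The secret value below is a placeholder:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: argocd-secret
  namespace: argocd
stringData:
  # must match the secret configured on the GitHub webhook;
  # placeholder value -- generate a random one
  webhook.github.secret: <random-shared-secret>
```

The GitHub webhook URL would then be something like https://argocd.example.com/api/webhook (placeholder host); on push, Argo refreshes the affected applications immediately instead of waiting for the default ~3-minute poll.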


tarrow (Contributor, Author) commented Dec 19, 2024

I'd really like to try and have this closed and merged in a somewhat timely manner; ideally we could collect all feedback and make any alterations to the ADR text to merge by Jan 10th.

I was particularly wondering whether @deer-wmde and @AndrewKostka wanted to add or adjust anything, as people who had already expressed opinions on, or worked on, the topic.

Comment on lines +31 to +34
- Local development and testing of ArgoCD-deployed services is less trivial than with helmfile
- Releasing and using new Helm charts for services we manage has caused engineers more friction than helmfile
- Deploying new services required some onboarding, even for senior engineers
- Generation of values files for the migrated Redis service was confusing to engineers who didn't build this system
These should be included in the negative consequences section.


We have experienced:
- Deploying new images of existing services has worked smoothly and been relatively transparent to engineers
- A sustained low cycle time and increased number of deployments for the migrated services
I believe this was more correlation than causation.

We had an increase in UI deployments because we had a bunch of smaller UI tasks.
