App of apps sync-waves not working in 1.8.1 #5146

Closed
michaelajr opened this issue Dec 29, 2020 · 35 comments
Labels
bug Something isn't working cherry-pick/1.8 Candidate for cherry picking into the 1.8 release branch

@michaelajr

michaelajr commented Dec 29, 2020

Describe the bug

Using 1.8.1, app-of-apps sync waves are not working: all the apps sync at once. If I delete them and sync again, sync waves then work. I can delete and re-sync several times after that, and sync waves work each time. But if I make a new commit to the target branch, delete all the apps, and sync, the same thing happens: all the apps sync at once. Delete them and re-sync, and sync waves work. It seems there is an issue honoring sync waves when the repo is first cloned and/or refreshed.

I am using a mono repo and have the new path annotation on each app. Auto sync is on for all the child apps, and I manually issue the sync command for the bootstrap app.

I have added the required resource.customizations block to the argocd config map.

  resource.customizations: |
    argoproj.io/Application:
      health.lua: |
        hs = {}
        hs.status = "Healthy"
        hs.message = ""
        if obj.status ~= nil then
          if obj.status.health ~= nil then
            hs.status = obj.status.health.status
            hs.message = obj.status.health.message
          end
        end
        return hs

To Reproduce

  • Create an app of apps.
  • Add child apps with auto sync on and place them in different waves using the sync-wave annotation.
  • Issue argocd app create on the bootstrap manifest. You will see the bootstrap app get created.
  • Then argocd app sync on the bootstrap app. You will see the bootstrap app sync, and then all the child apps will begin to sync without waiting for the previous wave to sync first and become healthy.
  • Do an argocd app delete on the bootstrap app (or just do it all in the UI) and wait for everything to get removed.
  • Then do the steps again and sync waves will work.
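
For reference, a child Application of the kind described in the steps above might look like the following sketch. The app name, repository URL, path, project, destination, and wave number are all hypothetical placeholders. The path annotation is assumed to be argocd.argoproj.io/manifest-generate-paths; the report does not name it.

```yaml
# Hypothetical child Application; names, repo URL, and paths are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-child
  namespace: argocd
  annotations:
    # Wave in which the parent app applies this resource (unannotated resources are in wave 0).
    argocd.argoproj.io/sync-wave: "1"
    # Assumption: this is the "new path annotation"; "." refers to the app's own source path.
    argocd.argoproj.io/manifest-generate-paths: .
spec:
  project: default
  source:
    repoURL: https://example.com/org/mono-repo.git
    targetRevision: HEAD
    path: addons/example-child
  destination:
    server: https://kubernetes.default.svc
    namespace: example
  syncPolicy:
    automated: {}  # auto sync on for the child app
```

The parent (bootstrap) app would omit the syncPolicy block, since it is synced manually in this setup.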

Expected behavior

Sync waves should work the first time around.

Version

$ argocd version
argocd: v1.8.1+c2547dc.dirty
  BuildDate: 2020-12-10T04:44:20Z
  GitCommit: c2547dca95437fdbb4d1e984b0592e6b9110d37f
  GitTreeState: dirty
  GoVersion: go1.15.5
  Compiler: gc

  Platform: darwin/amd64
argocd-server: v1.8.1+c2547dc
  BuildDate: 2020-12-10T02:59:21Z
  GitCommit: c2547dca95437fdbb4d1e984b0592e6b9110d37f
  GitTreeState: clean
  GoVersion: go1.14.12
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: v3.8.1 2020-07-16T00:58:46Z
  Helm Version: v3.4.1+gc4e7485
  Kubectl Version: v1.17.8
  Jsonnet Version: v0.17.0
@michaelajr michaelajr added the bug Something isn't working label Dec 29, 2020
@michaelajr
Author

So I've circled back around on this, and sync-waves are definitely broken in an app-of-apps. Everything syncs at once. Don't want to downgrade, so if anyone can get this working, please share. Thanks.

@michaelajr michaelajr changed the title from "App of apps sync-waves not honored on first sync - and/or after a refresh of a new git commit" to "App of apps sync-waves not working in 1.8.1" Jan 5, 2021
@devonhk

devonhk commented Jan 12, 2021

Sync-waves have also broken for us after upgrading to 1.8.1. However our setup is slightly different:

  • no app of apps
  • autosync disabled

All the outofsync resources in the screenshot had sync-waves set with negative numbers so they would get applied before the other resources.
[Screenshot: Screen Shot 2021-01-12 at 10 19 14 AM]

@irizzant
Contributor

it's also broken in the master version

@irizzant
Contributor

After further tests, I found that version 1.7.11 works as expected.

Versions 1.8.0 and later don't.

@michaelajr
Author

Yeah - we had to downgrade. Which is really too bad as 1.8 has some nice features we want.

@irizzant
Contributor

Hopefully this is going to be fixed, I'd need some of the 1.8 features too

@alexmt alexmt self-assigned this Jan 19, 2021
@alexmt alexmt added this to the v1.9 milestone Jan 19, 2021
@alexmt alexmt added the cherry-pick/1.8 Candidate for cherry picking into the 1.8 release branch label Jan 19, 2021
alexmt added a commit to alexmt/argocd-example-apps that referenced this issue Jan 19, 2021
@alexmt
Collaborator

alexmt commented Jan 19, 2021

Hello everyone,

Tried to reproduce the bug using app-of-apps and just regular resources with sync-waves, no luck, unfortunately. Everything looks normal. I think I'm missing something. Can someone help me to replicate the issue?

Here is what I've got so far: https://github.com/alexmt/argocd-example-apps/tree/5146/5146

App-of-apps application creates two child apps app1 and app2. The app1 has sync wave -1.

argocd app create apps \
--repo https://github.com/alexmt/argocd-example-apps.git \
--revision 5146 \
--path 5146 \
--dest-server https://kubernetes.default.svc \
--dest-namespace argocd

Tried to sync parent app first, then deleted child apps, committed a change to repo and synced parent app again. In both cases app1 was synced first.

@michaelajr
Author

@alexmt I have auto-sync off on the parent app, and auto-sync on for the child apps. There are about 15 child apps in 5 waves.

  • argocd app create the parent app
  • argocd app sync the parent app

All the child apps sync at the same time instead of using waves.

I can get you more detail later tonight. Thanks for looking into this!

@alexmt
Collaborator

alexmt commented Jan 19, 2021

Thank you @michaelajr! Trying to increase the number of apps in my sample app.

@irizzant
Contributor

In my case I have autosync on at all levels (parent-children), don't know if this has an impact or not.
All the apps start at the same time

@michaelajr
Author

@alexmt doing more testing... still seeing child apps sync at once. They seem to appear in waves... but they do not wait until the previous wave is done before starting their sync. So one wave appears - starts to sync - and then a few seconds later - the next wave appears without the previous wave completing first.

@michaelajr
Author

michaelajr commented Jan 20, 2021

@alexmt we also installed using the helm chart. I had multiple application controllers (3) - but not sure we need them - so I am now running 1 - but see the same issue regardless. I also have the new path annotation on each child Application.

❯ kubectl get pods -n argocd
NAME                                       READY   STATUS    RESTARTS   AGE
argocd-application-controller-0            1/1     Running   0          37m
argocd-dex-server-7789589d7f-8kqct         1/1     Running   0          57m
argocd-redis-ha-haproxy-85dc485cc4-4c8rs   1/1     Running   1          18h
argocd-redis-ha-haproxy-85dc485cc4-58thj   1/1     Running   1          18h
argocd-redis-ha-haproxy-85dc485cc4-w4d4h   1/1     Running   1          18h
argocd-redis-ha-server-0                   3/3     Running   3          18h
argocd-redis-ha-server-1                   3/3     Running   3          18h
argocd-redis-ha-server-2                   3/3     Running   3          18h
argocd-repo-server-7c6fbf6685-27wqj        1/1     Running   0          57m
argocd-repo-server-7c6fbf6685-69hfg        1/1     Running   0          55m
argocd-repo-server-7c6fbf6685-gdxgf        1/1     Running   0          57m
argocd-repo-server-7c6fbf6685-m5jmf        1/1     Running   0          57m
argocd-repo-server-7c6fbf6685-qqfmp        1/1     Running   0          55m
argocd-repo-server-7c6fbf6685-stnpc        1/1     Running   0          55m
argocd-repo-server-7c6fbf6685-v5mf6        1/1     Running   0          57m
argocd-server-6f57bb48f-cfqlf              1/1     Running   0          57m
argocd-server-6f57bb48f-wxlqw              1/1     Running   0          55m
argocd-server-6f57bb48f-z4drt              1/1     Running   0          57m

@michaelajr
Author

@alexmt to test - I have it down to 3 waves with 1 app each. When I sync the bootstrap app, all the child apps appear at or around the same time. The app in wave 1 starts, then the app in wave 3 starts (before the one in wave 1 finishes), and then the one in wave 2 starts. Definitely not working.

@michaelajr
Author

michaelajr commented Jan 20, 2021

@alexmt oddly enough - deleting the bootstrap app - removes the child apps in reverse order as expected.

@michaelajr
Author

michaelajr commented Jan 21, 2021

Ok - still not working - but adding the resource.customizations described in the OP gets it back to the behavior described in the OP. Order is not honored on the first sync. But if I delete the bootstrap app (the child apps get deleted and then the bootstrap gets deleted as expected), and re-create the bootstrap app and sync... sync-waves work.

Note that this is seen in a brand new argocd installation, when the first app created and synced in the new installation is an app of apps.

Is it possible something is lazily loaded and gets missed on the first app create/sync?

@alexmt
Collaborator

alexmt commented Jan 21, 2021

I'm sorry. Got distracted by meetings and did not have time to work on it. I'll keep trying to reproduce the issue today.

@luddskunk

I can join in and say that this does not work for me either as of today. Version 1.8.2

Thanks for acting so swiftly @alexmt!

@jannfis
Member

jannfis commented Jan 22, 2021

I'm also trying to reproduce this issue. Fresh installation of Argo CD v1.8.3, fresh K8s cluster. The only thing I modified is adding the health customization to argocd-cm and created two namespaces for the child apps guestbook-one and guestbook-two.

I then went to the UI and created an app that spawns the parent app, from a test repository at https://github.com/jannfis/app-of-apps with name bootstrap and a path of parent, destination namespace argocd. The parent app has a sync-wave value of -1 and spawns two other apps, guestbook-one in sync-wave 1 and guestbook-two in sync-wave 2. All three applications (parent, guestbook-one and guestbook-two) have auto-sync turned on.

When I manually trigger the sync of the bootstrap app, parent gets spawned and gets synced. This spawns guestbook-one first and starts its sync. Once it has synced and become healthy, guestbook-two gets spawned, syncs and becomes healthy.

These are basically the reproduction steps in the OP, but they work for me as expected. Is there something I have overlooked?

@michaelajr
Author

michaelajr commented Jan 23, 2021

@jannfis After looking at your example - I think I got it to work. I noticed you set a sync-wave of -1 on the parent/bootstrap app. I never had to do that in previous releases - I just annotated the child apps. But after adding the annotation to my bootstrap Application - I THINK it is working. Need to test a bit more - but initial sync worked.

Note that my parent/bootstrap app is a Helm chart of other Application resources - that are also Helm charts in the same repo (mono-repo pattern you guys talk about in the release notes):

addons
|-- bootstrap
|-- cluster-autoscaler
|-- datadog
|-- dex
|-- dex-k8s-authenticator
|-- external-dns
|-- ingress-nginx
|-- keda
|-- kiam
|-- kube-state-metrics
|-- log-router
|-- metrics-server
...etc..etc...etc

Going to test more. Will let you know. Thanks.

M

@michaelajr
Author

@alexmt @jannfis That did it. Tested a few times now. Is this something new? Did I miss documentation? If so, my apologies. Without the annotation on the bootstrap app, the behavior I saw (not working on initial sync, but working on subsequent syncs when there were no new commits) makes sense I guess, but it was not needed in 1.7.x.

@luddskunk @irizzant @pydo - Does this fix your issues as well?

mcanevet added a commit to mcanevet/devops-stack that referenced this issue Jan 23, 2021
mcanevet added a commit to camptocamp/devops-stack that referenced this issue Jan 23, 2021
@jannfis
Member

jannfis commented Jan 23, 2021

Is this something new? Did I miss documentation?

No, it is not even mandatory - I just included it for completeness. In my tests, it also works without the sync-wave of -1 on the parent application. You can test it by using the no-parent-syncwave branch as target revision for your tests, where I completely removed the sync-wave annotation from the parent.
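
As a rough sketch, a bootstrap Application pointed at that branch could be declared as below. The repository, app name, path, and destination namespace come from the earlier comment. The project and destination server are assumptions, as is using the branch as the bootstrap app's own target revision.

```yaml
# Sketch of the bootstrap app targeting the no-parent-syncwave branch.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: bootstrap
  namespace: argocd
spec:
  project: default  # assumption: the default project
  source:
    repoURL: https://github.com/jannfis/app-of-apps
    targetRevision: no-parent-syncwave
    path: parent
  destination:
    server: https://kubernetes.default.svc  # assumption: in-cluster destination
    namespace: argocd
```

No syncPolicy is set here because the bootstrap app is synced manually in the test described above.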

@michaelajr
Copy link
Author

michaelajr commented Jan 23, 2021

Ok - preliminary testing shows that removing -1 also seems to work. Sigh. Which is just nuts because I have been working with this all week and have never seen it work.

However, returning to our production branch (using chart v2.10.0) and simply adding the health customization and upgrading the binary to v1.8.2 does NOT work. Upgrading the chart to 2.11.0 and using a StatefulSet for the application-controller does not work either. In both cases, I see the same issue as in the OP. I tried adding the -1 annotation, and that did not work either (the -1 was a red herring). I then removed the parent and child apps and tried syncing from my branch (the same branch that worked above)... and it too did NOT work. This leads me to believe the issue might be in the ArgoCD setup/install - i.e., something is different between the working 1.7.x install in the production branch and the new 1.8.x install in my branch. Admittedly, after several days of debugging, my branch's install has drifted quite a bit from the production branch (new chart version, using a StatefulSet of 3, added resource quotas, etc.).

As other people are seeing issues as well - I do believe there is a breaking change somewhere. Just hard to pin down.

pburgisser pushed a commit to pburgisser/devops-stack that referenced this issue Feb 1, 2021
@alexmt
Collaborator

alexmt commented Mar 3, 2021

v1.8 introduced a bug: sync task sorting could break intermittently if the app includes a namespace or CRD. The bug was fixed in: https://github.com/argoproj/argo-cd/releases/tag/v1.8.5

@michaelajr, can you please try upgrading to v1.8.5 or later?

@irizzant
Contributor

irizzant commented Mar 4, 2021

Didn't have a chance to try 1.8.5 yet. I'll leave feedback as soon as I can.

@michaelajr
Author

@alexmt that is interesting. I did have a namespace in the bootstrap app. But then I moved it to a child app (along with some other "set up" manifests). So maybe that is what caused my stuff to start working. Will test soon. Thanks!

@ArieLevs

ArieLevs commented Mar 6, 2021

@alexmt 1.8.5 does not seem to fix this issue.
I was able to reproduce this using version 1.8.7, while the sync-wave feature works OK with version 1.7.14.

@alexmt alexmt modified the milestones: v2.0, v2.1 Apr 2, 2021
@michaelajr
Author

michaelajr commented Apr 10, 2021

@alexmt I added a namespace back to our bootstrap app, and sync-waves stopped working. All the child apps go at once again. v1.8.7. Any chance this has been addressed in v2?

@danielwindit

This is still an issue in v1.8.x as well as v2.0.x. In my app-of-apps setup it only works with v1.7.x, but in that version I have to work around PostSync hooks hanging on “waiting for hook...” (only on initial creation of the app of apps).

@joebowbeer
Contributor

The -1 should be superfluous because the default wave is 0.

@LucasBoisserie
Contributor

I have the same issue when migrating from 1.7.11 to 1.8.7 (same with 2.0) with app of apps and the argocd app custom health check.
When the parent app is created, all the child apps are created simultaneously despite the wave annotation.

When argocd creates the first child app, it is immediately reported as Healthy and only afterwards moves back to Progressing, but by then the other child apps (in different waves) have been created and have started to sync their content.

After some digging, I found a bad initialization in the Lua script. By default, the status is set to Healthy, but when the argocd app is created its status is empty, so the script skips the if block and returns the default status.

I updated the default status to Progressing and the issue disappeared.

The final lua script:

resource.customizations: |
    argoproj.io/Application:
      health.lua: |
        hs = {}
        hs.status = "Progressing"
        hs.message = ""
        if obj.status ~= nil then
          if obj.status.health ~= nil then
            hs.status = obj.status.health.status
            hs.message = obj.status.health.message
          end
        end
        return hs

@llavaud

llavaud commented Jun 2, 2021

    argoproj.io/Application:
      health.lua: |
        hs = {}
        hs.status = "Progressing"
        hs.message = ""
        if obj.status ~= nil then
          if obj.status.health ~= nil then
            hs.status = obj.status.health.status
            hs.message = obj.status.health.message
          end
        end
        return hs

Using your custom lua script i got the following error:

Failed sync attempt to 85b85da5167a99d89acf14c30b35add36439b410: ComparisonError: json: cannot unmarshal array into Go value of type health.HealthStatus

@ysksuzuki

@llavaud Try this
#5423 (comment)

@llavaud

llavaud commented Jun 2, 2021

@llavaud Try this
#5423 (comment)

it works, thanks !
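
For readers who hit the same unmarshal error, one defensive variant of the script is sketched below. It assumes the error arises when fields under obj.status.health are nil, which is not confirmed in this thread, and it is not a copy of the linked comment. The only change from the script above is that each field is copied only when it is set.

```yaml
resource.customizations: |
  argoproj.io/Application:
    health.lua: |
      hs = {}
      -- Default to Progressing so a newly created app is not reported Healthy.
      hs.status = "Progressing"
      hs.message = ""
      if obj.status ~= nil then
        if obj.status.health ~= nil then
          -- Copy each field only when it is set; otherwise keep the defaults.
          if obj.status.health.status ~= nil then
            hs.status = obj.status.health.status
          end
          if obj.status.health.message ~= nil then
            hs.message = obj.status.health.message
          end
        end
      end
      return hs
```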

@alexmt
Collaborator

alexmt commented Jun 3, 2021

Looks like @ymmt2005 and @LucasBoisserie fixed the issue: #6281 .

Sorry for introducing the bug :(

@alexmt
Collaborator

alexmt commented Jun 3, 2021

Closing issue since #6281 is merged.
