Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: conditions reset does not work if the service is down at the triggering time #1585

Merged
merged 29 commits into from
Feb 3, 2022

Conversation

juliev0
Copy link
Contributor

@juliev0 juliev0 commented Feb 2, 2022

This fix uses the cron syntax specified in the Condition Reset to determine the last time the trigger should have fired prior to application start up. If any messages arrive with a timestamp prior to that time, they are acknowledged.

Copy link
Member

@whynowy whynowy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

common/cronutil_test.go Show resolved Hide resolved
eventbus/driver/nats.go Show resolved Hide resolved
eventbus/driver/nats.go Show resolved Hide resolved
@whynowy whynowy changed the title bug: Issue 1543: conditions reset does not work if the service is down at the triggering time fix: conditions reset does not work if the service is down at the triggering time Feb 3, 2022
@juliev0 juliev0 marked this pull request as ready for review February 3, 2022 01:45
@juliev0
Copy link
Contributor Author

juliev0 commented Feb 3, 2022

Here's a description of how I tested this:

  1. created a Sensor with a dependency of A && B && C and a Condition Reset and tested the following cases:
    1a. per the bug, I scaled the deployment down to 0, waited for the CR time to pass, then scaled it back up to 1 to verify that the reset occurred for all dependencies
    1b. I repeated the same test but not during the CR time to verify that no reset would occur
    1c. standard Condition Reset logic: cron fired and reset occurred for all dependencies (to verify I didn't break this)
    1d. standard logic: all 3 dependencies were sent (to verify I didn't break this)
  2. also, to verify I didn't break anything I tried creating a Sensor with just one dependency and triggered it

if mh.isCleanedUp() {
mh.setLastResetTime(0)
mh.resetTimeout = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be protected by a lock, e.g. we could utilize setLastResetTime().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, fixed this

@@ -451,7 +459,7 @@ func (mh *eventSourceMessageHolder) resetAll() {
for k := range mh.parameters {
mh.parameters[k] = false
}
mh.setLastResetTime(0)
mh.resetTimeout = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

juliev0 and others added 22 commits February 3, 2022 11:20
…s time it should have fired

Signed-off-by: Julie Vogelman <[email protected]>
…ventSourceMessageHolder.lastResetTime for performing this logic

Signed-off-by: Julie Vogelman <[email protected]>
Signed-off-by: Julie Vogelman <[email protected]>
… and causes issues in this scenario: 'dependency=(A && B && C); trigger A and B, scale deployment to 0 before condition reset time; scale back to 1 after'; result: A gets reset, then sets lastResetTime to 0 because it's unknown that B still exists and B never gets acked

Signed-off-by: Julie Vogelman <[email protected]>
Signed-off-by: Julie Vogelman <[email protected]>
…oj#1569)

Bumps [github.com/nsqio/go-nsq](https://github.com/nsqio/go-nsq) from 1.0.8 to 1.1.0.
- [Release notes](https://github.com/nsqio/go-nsq/releases)
- [Changelog](https://github.com/nsqio/go-nsq/blob/master/ChangeLog.md)
- [Commits](nsqio/go-nsq@v1.0.8...v1.1.0)

---
updated-dependencies:
- dependency-name: github.com/nsqio/go-nsq
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Julie Vogelman <[email protected]>
…goproj#1559)

Bumps [github.com/nats-io/stan.go](https://github.com/nats-io/stan.go) from 0.6.0 to 0.10.2.
- [Release notes](https://github.com/nats-io/stan.go/releases)
- [Commits](nats-io/stan.go@v0.6.0...v0.10.2)

---
updated-dependencies:
- dependency-name: github.com/nats-io/stan.go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Julie Vogelman <[email protected]>
Bumps [go.uber.org/zap](https://github.com/uber-go/zap) from 1.19.0 to 1.20.0.
- [Release notes](https://github.com/uber-go/zap/releases)
- [Changelog](https://github.com/uber-go/zap/blob/master/CHANGELOG.md)
- [Commits](uber-go/zap@v1.19.0...v1.20.0)

---
updated-dependencies:
- dependency-name: go.uber.org/zap
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Julie Vogelman <[email protected]>
This commit updates golangci-lint and removes the interfacer check because it
was marked as deprecated and will be removed. See
golangci/golangci-lint#541 for more information.

Signed-off-by: William Van Hevelingen <[email protected]>
Signed-off-by: Julie Vogelman <[email protected]>
…rgoproj#1574)

Bumps [github.com/go-resty/resty/v2](https://github.com/go-resty/resty) from 2.3.0 to 2.7.0.
- [Release notes](https://github.com/go-resty/resty/releases)
- [Commits](go-resty/resty@v2.3.0...v2.7.0)

---
updated-dependencies:
- dependency-name: github.com/go-resty/resty/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Julie Vogelman <[email protected]>
…goproj#1580)

Bumps [cloud.google.com/go/compute](https://github.com/googleapis/google-cloud-go) from 0.1.0 to 1.1.0.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](googleapis/google-cloud-go@v0.1.0...dlp/v1.1.0)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/compute
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Julie Vogelman <[email protected]>
…roj#1575)

Bumps [github.com/spf13/viper](https://github.com/spf13/viper) from 1.10.0 to 1.10.1.
- [Release notes](https://github.com/spf13/viper/releases)
- [Commits](spf13/viper@v1.10.0...v1.10.1)

---
updated-dependencies:
- dependency-name: github.com/spf13/viper
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Julie Vogelman <[email protected]>
…ed for the failsafe >60 seconds condition

Signed-off-by: Julie Vogelman <[email protected]>
Signed-off-by: Julie Vogelman <[email protected]>
Signed-off-by: Julie Vogelman <[email protected]>
Signed-off-by: Julie Vogelman <[email protected]>
juliev0 and others added 7 commits February 3, 2022 11:20
…oj#1569)

Bumps [github.com/nsqio/go-nsq](https://github.com/nsqio/go-nsq) from 1.0.8 to 1.1.0.
- [Release notes](https://github.com/nsqio/go-nsq/releases)
- [Changelog](https://github.com/nsqio/go-nsq/blob/master/ChangeLog.md)
- [Commits](nsqio/go-nsq@v1.0.8...v1.1.0)

---
updated-dependencies:
- dependency-name: github.com/nsqio/go-nsq
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Julie Vogelman <[email protected]>
…goproj#1559)

Bumps [github.com/nats-io/stan.go](https://github.com/nats-io/stan.go) from 0.6.0 to 0.10.2.
- [Release notes](https://github.com/nats-io/stan.go/releases)
- [Commits](nats-io/stan.go@v0.6.0...v0.10.2)

---
updated-dependencies:
- dependency-name: github.com/nats-io/stan.go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Julie Vogelman <[email protected]>
…goproj#1584)

Bumps [github.com/Shopify/sarama](https://github.com/Shopify/sarama) from 1.26.1 to 1.31.1.
- [Release notes](https://github.com/Shopify/sarama/releases)
- [Changelog](https://github.com/Shopify/sarama/blob/main/CHANGELOG.md)
- [Commits](IBM/sarama@v1.26.1...v1.31.1)

---
updated-dependencies:
- dependency-name: github.com/Shopify/sarama
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Julie Vogelman <[email protected]>
…rgoproj#1586)

Bumps [github.com/xanzy/go-gitlab](https://github.com/xanzy/go-gitlab) from 0.50.2 to 0.54.4.
- [Release notes](https://github.com/xanzy/go-gitlab/releases)
- [Changelog](https://github.com/xanzy/go-gitlab/blob/master/releases_test.go)
- [Commits](xanzy/go-gitlab@v0.50.2...v0.54.4)

---
updated-dependencies:
- dependency-name: github.com/xanzy/go-gitlab
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Julie Vogelman <[email protected]>
Signed-off-by: Henrik Blixt <[email protected]>
Signed-off-by: Julie Vogelman <[email protected]>
…e set asyncronously through ConditionReset code

Signed-off-by: Julie Vogelman <[email protected]>
mh.lock.Lock() // since this can be called asyncronously as part of a ConditionReset, we neeed to lock this code
defer mh.lock.Unlock()
mh.lastResetTime = t
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we share the same lock and simply call setLastResetTime(0) instead of setRestTimeout(0)? It should be safe to set lastResetTime to 0, right?

mh.lock.Lock()
defer mh.lock.Unlock()
mh.lastResetTime = t
if t == 0 {
  mh.resetTime = 0
} else {
  mh.resetTime = t + 60
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, one of the changes I made as part of this whole thing is that I never set lastResetTime to 0. lastResetTime now always represents an actual time. This ended up being necessary in a certain scenario.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certain scenario such as?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say you have condition A && B && C,
Say A and B both occur
Say the pod goes down while the condition reset time occurs
So, we want to be able to ack both A and B when they arrive but we don't: this is because if A arrives, we call reset(), which ends up doing this:
if mh.isCleanedUp() { mh.setLastResetTime(0) }
That is because the pod went down and B is no longer stored in the eventSourceMessageHolder's memory.
Then when B arrives we fail to ack it, because we would skip the logic which checks lastResetTime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good Point!

Copy link
Member

@whynowy whynowy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work!

@whynowy whynowy merged commit 9606aed into argoproj:master Feb 3, 2022
@juliev0 juliev0 deleted the issue_1543 branch February 3, 2022 22:48
aweis89 pushed a commit to aweis89/argo-events that referenced this pull request Feb 15, 2022
…ggering time (argoproj#1585)

* incorporating new struct TimeBasedReset to encapsulate reset state/spec

Signed-off-by: Julie Vogelman <[email protected]>
Signed-off-by: Aaron Weisberg <[email protected]>
BulkBeing pushed a commit to BulkBeing/argo-events that referenced this pull request Mar 7, 2022
…ggering time (argoproj#1585)

* incorporating new struct TimeBasedReset to encapsulate reset state/spec

Signed-off-by: Julie Vogelman <[email protected]>
juliev0 added a commit to juliev0/argo-events that referenced this pull request Mar 29, 2022
…ggering time (argoproj#1585)

* incorporating new struct TimeBasedReset to encapsulate reset state/spec

Signed-off-by: Julie Vogelman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants