Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

state: remove TimeTable and rely on objects' modify times instead #24112

Merged
merged 40 commits into from
Nov 1, 2024

Conversation

pkazmierczak
Copy link
Contributor

@pkazmierczak pkazmierczak commented Oct 2, 2024

Core scheduler relies on a special table in the state store—the TimeTable—to figure out which objects can be GC'd. The TimeTable correlates Raft indices with objects insertion time, a solution we used before most of the objects we store in the state contained timestamps. This introduced a bit of a memory overhead and complexity, but most importantly meant that any GC threshold users set greater than timeTableLimit = 72 * time.Hour was ignored. This PR removes the TimeTable and relies on object timestamps to determine whether they could be GCd or not.

Fixes #16359
Resolves #17233

internal ref: https://hashicorp.atlassian.net/browse/NET-10269

Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will terminal objects that didn't have a timestamp before be GC'd on their next gc interval? That seems fine.

I think we should make sure to really call this out in Version Specific Upgrade notes. While the vast majority of users shouldn't notice anything, I think there are 2 classes of users who will:

  1. Users who either intentionally or unintentionally used gc thresholds > 72h are in for a surprise.
  2. Users, especially those with custom high gc thresholds, with lots of terminal objects that suddenly get GC'd "early" post-upgrade (as long as my understanding of the upgrade path is accurate).

This is a pretty astounding PR from an archeological standpoint: lots of 2015 Armon code getting axed! The fact that it Just Worked for most use cases for almost a decade is quite an achievement! 🎉

nomad/structs/structs.go Show resolved Hide resolved
nomad/core_sched_test.go Show resolved Hide resolved
Comment on lines +37 to +38
eval.CreateTime = time.Now().Add(-6 * time.Hour).UnixNano() // make sure objects we insert are older than GC thresholds
eval.ModifyTime = time.Now().Add(-5 * time.Hour).UnixNano()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is fine in this test, but in general I prefer using hardcoded times to avoid flakes. For example if you run a test that relies on Now().Add(-6) being 1 hour before Add(-5) at just the wrong time it will fail because -6 and -5 is the same time: https://go.dev/play/p/JpWEEWUO7i4

Using Now().UTC() would also fix this since UTC is blessedly free from daylight saving, but I think there's other fun time hijinks that could occur (leap seconds?).

Again: this test is fine, but in the future lets hardcode a time.Date(...).

Comment on lines +2100 to +2101
ModifyTime: now.UnixNano(),
CutoffTime: now.Add(-1 * time.Hour),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh hey! I think this is susceptible to daylight savings time bugs! Don't change it to half an hour either as that could still break if @philrenaud runs the tests at just the wrong time while visiting Newfoundland: https://en.wikipedia.org/wiki/Newfoundland_Time_Zone (I may enjoy timezone hijinks too much.)

nomad/fsm.go Show resolved Hide resolved
nomad/structs/structs.go Outdated Show resolved Hide resolved
Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once comments are resolved.

We should also include both a changelog entry and an upgrade guide note for 1.9.2. One thing in particular that stands out is that because some objects like Deployments are created in the scheduler (which could be on a follower), this requires that servers have at least roughly-sync'd clocks for correct GC. We probably already do implicitly require this somewhere in the code base but it'd be good to have a note about it just in case someone is doing something weird and has been getting away with it for a while.

api/csi.go Outdated Show resolved Hide resolved
nomad/structs/csi.go Outdated Show resolved Hide resolved
@schmichael
Copy link
Member

One thing in particular that stands out is that because some objects like Deployments are created in the scheduler (which could be on a follower), this requires that servers have at least roughly-sync'd clocks for correct GC.

Timetable also had the problem of using followers' clocks. I have typed and deleted a lot of conjectures around local skew vs remote skew and what the optimal way to handle time in a distributed system is....... but I don't think any of it is worth worrying about. Our time-based GC can't escape clock skew problems of some kind. Even if we always used the leader's time for insertion and sent the leader's time with gc evals... leadership can change at any time! Clock skew is inescapable, and hopefully our users understand any time-based parameters are only as accurate as their clocks.

@pkazmierczak pkazmierczak merged commit f7847c6 into main Nov 1, 2024
28 checks passed
@pkazmierczak pkazmierczak deleted the f-gc-limits-3-days branch November 1, 2024 18:38
pkazmierczak added a commit that referenced this pull request Nov 7, 2024
Whenever setting objects creation/modify time, we should always use UTC. #24112
introduced some inconsistencies in this area, and this PR fixes it.
tgross added a commit that referenced this pull request Nov 9, 2024
When we removed the time table in #24112 we introduced a bug where if a previous
version of Nomad had written a time table entry, we'd return from the restore
loop early and never load the rest of the FSM. This will result in a mostly or
partially wiped state for that Nomad node, which would then be out of sync with
its peers (which would also have the same problem on upgrade).

The bug only occurs when the FSM is being restored from snapshot, which isn't
the case if you test with a server that's only written Raft logs and not
snapshotted them.

While fixing this bug, we still need to ensure we're reading the time table
entries even if we're throwing them away, so that we move the snapshot reader
along to the next full entry.

Fixes: #24411
tgross added a commit that referenced this pull request Nov 11, 2024
When we removed the time table in #24112 we introduced a bug where if a previous
version of Nomad had written a time table entry, we'd return from the restore
loop early and never load the rest of the FSM. This will result in a mostly or
partially wiped state for that Nomad node, which would then be out of sync with
its peers (which would also have the same problem on upgrade).

The bug only occurs when the FSM is being restored from snapshot, which isn't
the case if you test with a server that's only written Raft logs and not
snapshotted them.

While fixing this bug, we still need to ensure we're reading the time table
entries even if we're throwing them away, so that we move the snapshot reader
along to the next full entry.

Fixes: #24411
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.9.x backport to 1.9.x release line
Projects
None yet
3 participants