-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Family::get_or_create #102
Conversation
d9ec6e0
to
d777de3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh silly me.
Great catch. Thanks for the patch.
Can you include a changelog entry?
Out of curiosity, did you hit this race condition, or found it while reading?
We discovered it the hard way with gauges reporting increments of 18.4 quintillions after making the switch in 3 repos. |
I'll add a changelog entry. Can you release 0.18.1 with that fix? |
d777de3
to
fe41d10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, did you hit this race condition, or found it while reading?
We discovered it the hard way with gauges reporting increments of 18.4 quintillions after making the switch in 3 repos.
Sorry to hear that. My bad. Thanks for debugging.
Do you have some time to tackle the second race condition as well? If not, I will give it a shot tomorrow.
I'll add a changelog entry. Can you release 0.18.1 with that fix?
Yes, I will do that as soon as we have both race conditions fixed. See below.
src/metrics/family.rs
Outdated
self.metrics | ||
.write() | ||
.entry(label_set.clone()) | ||
.or_insert_with(|| self.constructor.new_metric()); | ||
|
||
RwLockReadGuard::map(self.metrics.read(), |metrics| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With Family::remove
there is actually a second race condition.
In the scenarios where we have two threads:
- A: Call
Family::get_or_create
with a new label set.
1. Acquiring the read lock. Nothing there.
2. Acquiring the write lock. Inserting the new metric.
3. Being preempted by the OS. - B: Call
Family::remove
with the above label set. - A: Acquiring the read lock for the second (last) time,
.expect()
ing the previously inserted metric to be there.
This can be prevented by leveraging RwLockWriteGuard::downgrade
of the previously acquired write lock instead of doing acquiring another read lock via self.metrics.read
.
Does the above make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense yes. The patch now uses RwLockWriteGuard::downgrade
.
fe41d10
to
e08410a
Compare
The method should not overwrite an existing key after it obtains the write lock. Consider the following scenario with `Family<LabelSet, Gauge>` used by threads A and B: 1. A and B both call `family.get_or_insert(&label_set)` 2. A and B both acquire read lock and finds no key 3. A gets write lock and inserts a new gauge with value 0 4. A increments returned gauge to 1 5. B gets write lock and inserts a new gauge with value 0 6. B increments returned gauge to 1 7. A decrements gauge back to 0 8. B decrements gauge which now overflows to `u64::MAX` Signed-off-by: Anthony Ramine <[email protected]>
e08410a
to
290369f
Compare
The method should not overwrite an existing key after it obtains the write lock.
Consider the following scenario with
Family<LabelSet, Gauge>
used by threads A and B:family.get_or_insert(&label_set)
u64::MAX