-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1733757 - Store created label metrics to avoid rapidly reconstructing them all … #1823
Conversation
0079897
to
03b6378
Compare
Question regarding this:
Isn't that impossible given that we limit the number of unknown labels and then move to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r+wc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing blocking, just a question about using bit shifting mostly. LGTM!
glean-core/android/src/test/java/mozilla/telemetry/glean/private/LabeledMetricTypeTest.kt
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the change requests from Mike and Travis addressed. Great job.
…the time This fixes a crashing bug, that probably started being an issue since March 2020 / Glean v26, when we removed finalizers from the Kotlin library[1]. Depending on how GC runs it was theoretically possible to hit that even with finalizers in the SDK. Contrary to the commit message in [1] we do have one more place where we dynamically construct metrics: Labeled Metrics! It used to be the case that whenever you access the submetric of a labeled metric, that is `Metrics.Category.Name["label"]`, we created a new Rust object, put that into the internal ConcurrentHandleMap and returned a handle to it. A ConcurrentHandleMap does a lot of work to ensure that you can't misuse handles against the wrong map, when the item was already replaced, etc. It also sets an upper limit of entries it can handle[2]. That limit is high enough for the static use of Rust (you would have to use 32767+ metrics of a single type), but quickly[3] breaks down The solution is to not re-create labeled submetrics on every access. We cache those ourselves now, identified by the parent's metric handle and the label. Now accessing the same label 32768+ times is fine, you get the same Rust object every time (as long as it exists). There's one last piece: We do expose a way to destroy metrics, even if that shouldn't happen for any statically defined ones. This is not happening on Kotlin, where we removed finalizers some time ago [1]. We're now removing those finalizers on Swift and Python as well. We could try to get smart and do some double-checks that the handle is not stale when the submetric is accessed. But that breaks down quickly, because the metric could be destroyed between the access and the recording, still leading to issues. The only way to handle that would be some reference counting on the object itself, but the current implementation doens't easily allow that. So for now we just don't destroy boolean/counter/string metrics, ever. The included tests fail without the corresponding library changes, giving us a bit more confidence that the fix works. Now there's still one way to trigger a crash, in theory: Keep accessing previously-unused labels 32768+ times. If you do that you're really on your own. [1]: 25bf9ea [2]: https://github.com/mozilla/ffi-support/blob/0fdc22a8dfe3731be5fd39b311e4e4885219e26c/src/handle_map.rs#L160-L166 [3]: Well, "quickly". Reproducing that on get.webgl.org takes 8+ minutes.
…the time
This fixes a crashing bug, that probably started being an issue since
March 2020 / Glean v26, when we removed finalizers from the Kotlin
library1. Depending on how GC runs it was theoretically possible to
hit that even with finalizers in the SDK.
Contrary to the commit message in 1 we do have one more place where we
dynamically construct metrics: Labeled Metrics!
It used to be the case that whenever you access the submetric of a
labeled metric, that is
Metrics.Category.Name["label"]
, we created anew Rust object, put that into the internal ConcurrentHandleMap and
returned a handle to it.
A ConcurrentHandleMap does a lot of work to ensure that you can't misuse
handles against the wrong map, when the item was already replaced, etc.
It also sets an upper limit of entries it can handle2.
That limit is high enough for the static use of Rust (you would have to
use 32767+ metrics of a single type), but quickly[3] breaks down
The solution is to not re-create labeled submetrics on every access.
We cache those ourselves now, identified by the parent's metric handle
and the label.
Now accessing the same label 32768+ times is fine, you get the same
Rust object every time (as long as it exists).
There's one last piece: We do expose a way to destroy metrics, even if
that shouldn't happen for any statically defined ones.
This is not happening on Kotlin, where we removed finalizers some time
ago 1.
We're now removing those finalizers on Swift and Python as well.
We could try to get smart and do some double-checks that the handle is
not stale when the submetric is accessed.
But that breaks down quickly, because the metric could be destroyed
between the access and the recording, still leading to issues.
The only way to handle that would be some reference counting on the
object itself, but the current implementation doens't easily allow that.
So for now we just don't destroy boolean/counter/string metrics, ever.
The included tests fail without the corresponding library changes,
giving us a bit more confidence that the fix works.
Now there's still one way to trigger a crash, in theory:
Keep accessing previously-unused labels 32768+ times.
If you do that you're really on your own.
[3]: Well, "quickly". Reproducing that on get.webgl.org takes 8+ minutes.