-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stable metrics sdk #412
Stable metrics sdk #412
Conversation
Codecov Report
@@ Coverage Diff @@
## main #412 +/- ##
==========================================
- Coverage 73.80% 73.76% -0.05%
==========================================
Files 53 53
Lines 1680 1681 +1
==========================================
Hits 1240 1240
- Misses 440 441 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
3c9c024
to
b7328db
Compare
Eh, marking as ready for review. I don't think anything major needs to be changed before merging since it is all in experimental. So I'd like to get reviews and merge this so I can be doing smaller future PRs that are easier to review. |
b16baec
to
c131250
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Started with a review of the API, I'll try to do the internals a bit later this week.
kind :: otel_instrument:kind(), | ||
value_type :: otel_instrument:value_type(), | ||
unit :: otel_instrument:unit() | undefined, | ||
callback :: otel_instrument:callback() | undefined}). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this record meant to be publicly available outside the app? It is given its own type in otel_instrument
so I'm assuming the record ought to be private (eg src/otel_metrics_rec.hrl
which could include this file to keep the macro definitions public)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue is it is needed in the SDK app. So I'm not sure what is best to do since I'd rather not go the route of putting it all by a module for accessing each field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh well we could cheat a bit I guess. Have the content is wrapped in a sort of:
-ifdef(OTEL_INTERNAL_MACRO).
-record(instrument, ...)
-endif.
and then you could include the same file but running a -define(OTEL_INTERNAL_MACRO, true)
prior to inclusion gives you visibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if that's overkill though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, might be overkill. But might be worth it if we don't think of anything else.
For now though since this is in experimental I think we should move forward as is and can freely make the record private after this as long as its before it goes into the stable SDK.
min=infinity, %% works because any atom is > any integer | ||
max=neg_infinity, %% requires an explicit check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so I'm understanding right, those min/max values are the default until any data is encountered, right? So it's normal for min
to be bigger than max
? I think semantically undefined
would make most sense because it wouldn't be a legitimate in-band value (though given Erlang doesn't support infinite values in floats this is sort of a moot point)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, these are the defaults until any data is encountered.
Going with undefined
probably makes the most sense. I had hopes that there would be something I'd find that could be used for max
that would always be less than any number, but since there isn't... wait, just realized that even though Erlang has no number size limit the Otel protocol does, so technically we could use -2^64
here. hmm
The reason I wanted a term that I could compare with <
is so it could be done in a matchspec -- and it means only one comparison needed, not one to check if its undefined
and then to check if its <
.
apps/opentelemetry_experimental/src/otel_aggregation_histogram_explicit.erl
Show resolved
Hide resolved
%% unless this is changed this needs to be switched to a link | ||
%% also need to be able to recover who the readers are when the meter server restarts | ||
%% discovery by looking at a supervision tree could be a good idea as it would allow for added | ||
%% readers to be returned to the server after a crash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
supervision tree would work though it can fail to restore fine if the server's failure lines up with a reader's failure; another idea is to have a parent-owned table recording which readers should exist, so in case of a restart the new server can restore state to where it should be?
If does rely on having stable predictable names though, and being able to map these to either processes, or supervisor entries.
apps/opentelemetry_experimental/src/otel_metric_exporter_pid.erl
Outdated
Show resolved
Hide resolved
12231e7
to
1baebe3
Compare
bucket_counts=zero_buckets(length(Boundaries)), | ||
record_min_max=RecordMinMax, | ||
min=infinity, %% works because any atom is > any integer | ||
max=?MIN_DOUBLE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that the data model marks both the min and max value as optional does seem to mean we have to be careful not to use the MIN value unless you want it to be impossible to distinguish between a legit MIN and a placeholder (https://github.com/open-telemetry/opentelemetry-proto/blob/c5c8b28012583fda55b0cb16f73a820722171d49/opentelemetry/proto/metrics/v1/metrics.proto#L496-L501)
Is this logic referred to the collector or the exporter at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to distinguish because this record is only created when a measurement is taken. If minmax are to be recorded it will be set at that time.
Which reminds me, explicit_histogram_checkpoint
should only be number()
.
The only time this may be an issue is a delta temporality, which will reset to the default when taking the checkpoint. That isn't supported yet.
1baebe3
to
da6a480
Compare
Co-authored-by: Fred Hebert <[email protected]>
379b27d
to
0ba607d
Compare
0153159
to
4a29466
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved on my end, though I'd like to see what @GregMefford has to say about it
0d8865e
to
6b8885b
Compare
Going to go ahead and merge. But it would be great if @GregMefford gets some time. If @GregMefford and/or @bryannaegele are able to make the SIG meeting tomorrow we could also do a code walk through at that time. Future PRs will be smaller. |
Opening as a draft because I'm not 100% on the direction of this but I think it is close enough to start getting feedback.
There will be a number of tasks after this PR is merged, like supporting floats in the aggregations, filtering attributes in views and adding the async/observable metrics, in addition to many clean up tasks.
Important things to review in this PR are design decisions. One being that each measurement is a message. I do not like, and don't think we can keep, the message going to the
otel_meter_server
because that also handles other messages and should remain responsive for those. I will likely create a process, with the possibility in the future to be a pool of processes assuming aggregations remain safe to act concurrently.The Reader process isn't properly handled yet either because it is monitored by the otel_meter_server and not linked but the ets tables it uses are owned by the server. Either the reader needs to be able to run on its own or it should be linked. I'd prefer the former.
Also getting an Instruments no longer requires a call to the server process. Initially I thought it best if it did so that it could be matched to views at that point and not on each measurement but this is likely not useful, and certainly not useful compared to not having to make a call to get an Instrument. Note that future view matching could include attributes in the criteria, so then setting up matches based only on the Instrument isn't even an option.
An issue is that views can be added at any time and after an instrument is matched once (also an issue if we were to support criteria that matches on attributes) it won't be matched again, meaning views added after the first time an instrument is used will never be matched.
Note that filtering attributes is different from using them as criteria. Filtering still needs to be added but needs to be there before this moves out of experimental.
Not sure if the matchspec in
checkpoint
is efficient but I think checkpoint should be working against all metrics at once instead of one at a time, so it likely will be removed anyway.