Improve memory ownership at pprof parsing #613
if err != nil {
    return nil, err
}
I'm not sure we need to break the loop on the first error encountered. If we do, then I guess we should bump the DiscardedProfiles metric by the number of profiles we actually dropped, i.e. len(series.Samples)-i. However, that does not work for DiscardedBytes.
We shouldn't; you are correct.
Yes, I fully agree: we should ingest all the profiles that are valid (== soft error), and then return the error appropriate for the offending sample. Otherwise we would lose perfectly valid samples.
This is how mimir is doing it: https://github.com/grafana/mimir/blob/main/pkg/ingester/ingester.go#L826-L828
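A minimal sketch of the soft-error approach described above, for illustration only. The names `sample`, `ingestOne`, and `ingestAll` are hypothetical and not taken from this PR, and reporting the first error (rather than the last, or all of them) is an assumption:

```go
package main

import "fmt"

// sample is a hypothetical stand-in for one entry of series.Samples.
type sample struct {
	id    string
	valid bool
}

// ingestOne is a hypothetical per-sample ingest step that may fail.
func ingestOne(s sample) error {
	if !s.valid {
		return fmt.Errorf("sample %s rejected", s.id)
	}
	return nil
}

// ingestAll keeps iterating after a failure so that valid samples are not lost.
// It returns the number of samples ingested and the first error seen, if any.
func ingestAll(samples []sample) (int, error) {
	var firstErr error
	ingested := 0
	for _, s := range samples {
		if err := ingestOne(s); err != nil {
			if firstErr == nil {
				firstErr = err // remember the offending sample's error
			}
			continue // soft error: skip this sample, keep the rest
		}
		ingested++
	}
	return ingested, firstErr
}

func main() {
	n, err := ingestAll([]sample{{"a", true}, {"b", false}, {"c", true}})
	fmt.Println(n, err)
}
```

With this shape the caller still gets an error to report, but samples "a" and "c" are ingested even though "b" fails.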
Sounds good. If you don't mind, I'll implement this in a separate PR
I agree the scope of that is beyond this PR.
cca1083 to e41115b
LGTM
pkg/ingester/ingester.go
Outdated
case validation.OutOfOrder:
    return nil, connect.NewError(connect.CodeInvalidArgument, err)
    return connect.NewError(connect.CodeInvalidArgument, err)
case validation.SeriesLimit:
    return nil, connect.NewError(connect.CodeResourceExhausted, err)
    return connect.NewError(connect.CodeResourceExhausted, err)
default:
    validation.DiscardedProfiles.WithLabelValues(string(reason), instance.tenantID).Add(float64(1))
    validation.DiscardedBytes.WithLabelValues(string(reason), instance.tenantID).Add(float64(size))
    return err
I think this means we will no longer see OutOfOrder and SeriesLimit in the metrics. I am not too sure this is what we want.
I think after an OutOfOrder, there might be a sample that is perfectly in order again. At the same time, SeriesLimit is likely to affect only other iterations of the series.Samples loop; if we have another series, it might be perfectly fine, as that series already exists.
Oh right, I reverted the logic. BTW, shouldn't we account for any discarded profiles, including those with an "unknown" reason?
I am not too sure what currently would be an unknown reason, but I think that makes sense and we would know we need to add a new reason.
LGTM
…-pprof-parsing-pool-usage Improve memory ownership at pprof parsing
In some cases an object allocated from a pool is not returned to it. This change aims to improve ownership of pool-allocated objects during pprof parsing specifically, which affects both the distributor and ingester services.
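As a generic illustration of the ownership problem described above (not the code from this PR), the sketch below borrows a buffer from a `sync.Pool` and uses `defer` so the buffer goes back to the pool on every exit path, including the early error return. The `bufPool` variable and `parse` function are hypothetical names:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"sync"
)

// bufPool is a hypothetical pool of reusable buffers.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// parse borrows a buffer for the duration of the call. The function that
// takes the buffer from the pool is also the one that returns it, so the
// buffer cannot leak on an error path.
func parse(data []byte) (int, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // drop contents before handing the buffer back
		bufPool.Put(buf)
	}()

	if len(data) == 0 {
		return 0, errors.New("empty profile") // buffer is still returned
	}
	buf.Write(data)
	return buf.Len(), nil
}

func main() {
	n, err := parse([]byte("profile bytes"))
	fmt.Println(n, err)
	_, err = parse(nil)
	fmt.Println(err)
}
```

Keeping `Get` and `Put` in the same function is one simple way to make ownership explicit; when a pooled object must outlive the call, the caller needs an explicit release step instead.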