Skip to content
This repository has been archived by the owner on Jul 19, 2023. It is now read-only.

Improve memory ownership at pprof parsing #613

Merged

Conversation

kolesnikovae
Copy link
Contributor

@kolesnikovae kolesnikovae commented Apr 4, 2023

In some cases an object allocated from the pool is not returned back to the pool. The change is aimed at improving ownership of the objects allocated in pools at pprof parsing specifically, which affects both distributor and ingester services

@CLAassistant
Copy link

CLAassistant commented Apr 4, 2023

CLA assistant check
All committers have signed the CLA.

Comment on lines +211 to 213
if err != nil {
return nil, err
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we need to break the loop on the first error encountered. If we do need, then I guess we should bump the DiscardedProfiles metrics by the number of profiles we actually dropped, like len(series.Samples)-i. However, it does not work for DiscardedBytes

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't you are correct.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I fully agree, we should ingest all the profiles that are valid (== soft error), and then returning the error appropriate for the offending sample. Otherwise we would loose perfectly valid samples.

This is how mimir is doing it: https://github.com/grafana/mimir/blob/main/pkg/ingester/ingester.go#L826-L828

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. If you don't mind, I'll implement this in a separate PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the scope of that is beyond this PR.

@kolesnikovae kolesnikovae force-pushed the chore/improve-pprof-parsing-pool-usage branch from cca1083 to e41115b Compare April 4, 2023 17:25
@kolesnikovae kolesnikovae changed the title chore: improve memory ownership at pprof parsing Improve memory ownership at pprof parsing Apr 4, 2023
@kolesnikovae kolesnikovae marked this pull request as ready for review April 7, 2023 09:10
@kolesnikovae kolesnikovae requested a review from a team April 7, 2023 14:56
@cyriltovena cyriltovena self-assigned this Apr 9, 2023
Copy link
Collaborator

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +211 to 213
if err != nil {
return nil, err
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I fully agree, we should ingest all the profiles that are valid (== soft error), and then returning the error appropriate for the offending sample. Otherwise we would loose perfectly valid samples.

This is how mimir is doing it: https://github.com/grafana/mimir/blob/main/pkg/ingester/ingester.go#L826-L828

Comment on lines 199 to 206
case validation.OutOfOrder:
return nil, connect.NewError(connect.CodeInvalidArgument, err)
return connect.NewError(connect.CodeInvalidArgument, err)
case validation.SeriesLimit:
return nil, connect.NewError(connect.CodeResourceExhausted, err)
return connect.NewError(connect.CodeResourceExhausted, err)
default:
validation.DiscardedProfiles.WithLabelValues(string(reason), instance.tenantID).Add(float64(1))
validation.DiscardedBytes.WithLabelValues(string(reason), instance.tenantID).Add(float64(size))
return err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will mean we will no longer see OutOfOrder and SeriesLimits in the metrics. I am not too sure if this is what we want.

I think after an OutOfOrder, there might be a sample that is perfectly in order again. In the same time the SeriesLimit is only likely affect other iterations of the series.Sample loop, but if we have another series it might be perfectly fine, as the series is already existing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right – I reverted the logic. BTW, shouldn't we account for any discarded profiles, including "unknown" reason?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not too sure what currently would be an unknown reason, but I think that makes sense and we would know we need to add a new reason.

Copy link
Collaborator

@simonswine simonswine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kolesnikovae kolesnikovae merged commit 224e15b into grafana:main Apr 14, 2023
simonswine pushed a commit to simonswine/pyroscope that referenced this pull request Jun 30, 2023
…-pprof-parsing-pool-usage

Improve memory ownership at pprof parsing
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants