
Optimistically return metrics from the cache (if present) #418

Open
nbrachet wants to merge 3 commits into master

Conversation

nbrachet

Optimistically return metrics from the cache (if present) before building a metric to add to the cache.

nbrachet (Author) commented Dec 4, 2020

@jupp0r any advice on how to address this jenkins error:

no coveralls.io token specified and no travis job id found

It doesn't seem related to this PR?

jupp0r (Owner) commented Dec 4, 2020

> @jupp0r any advice on how to address this jenkins error

Ignore it, it's a known issue that I thought I had fixed.

gjasny (Collaborator) commented Dec 4, 2020

Hi,

Could you please add a test for the expected behavior?

Thanks,
Gregor

nbrachet (Author) commented Dec 4, 2020

I've been thinking about adding a test, but I'm struggling to come up with anything.
This is a transparent change, and I can't come up with a way to externally check whether a new unique_ptr was created or not.

PS: I'm not super happy about the code duplication either, but again I couldn't come up with a better approach.

danevandyck

Is the idea that hashing the labels is less expensive than the call to make_unique? That seems doubtful, but I'm not sure. Regardless, it seems like the labels should only be hashed once if the metric is missing.

nbrachet (Author) commented Dec 8, 2020

The idea is that the vast majority of the time Add() is called, it will return the metric from the cache. The first call with a new set of labels will indeed be more expensive (2 hashes + 1 make_unique), but that cost is amortized over time since subsequent calls are cheaper (1 hash, with no make_unique and no ~unique_ptr).

As an illustration, consider the following pseudo-code:

 Family<Counter> counter{"total_requests", "Count all requests"};

 void handle_request(...) {
    auto vhost = <get virtual host>;
    counter.Add({{"vhost", vhost}}).Increment();
    ...
 }

The first request for each vhost initializes the cache (i.e. Family<Counter>::metrics_, Family<Counter>::labels_, and Family<Counter>::labels_reverse_lookup_); after that, all subsequent requests simply retrieve the Counter from there, saving a call to make_unique<Counter> and the matching ~unique_ptr<Counter>.
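
To illustrate what a cache hit saves, here is a sketch of that fast path (an editorial illustration, not the PR diff: the member names are the ones listed above, detail::hash_labels is the existing label-hashing helper, and the exact bookkeeping on the miss path is an assumption):

  // Sketch only. On a cache hit, Add() returns the existing metric
  // without constructing (or later destroying) a temporary unique_ptr.
  template <typename T>
  template <typename... Args>
  T& Family<T>::Add(const std::map<std::string, std::string>& labels,
                    Args&&... args) {
    auto hash = detail::hash_labels(labels);
    std::lock_guard<std::mutex> lock{mutex_};
    auto it = metrics_.find(hash);
    if (it != metrics_.end()) return *(it->second);  // hit: no allocation
    // miss: the first call with these labels builds and caches the metric
    auto metric = detail::make_unique<T>(std::forward<Args>(args)...);
    auto& ref = *metric;
    labels_.emplace(hash, labels);
    labels_reverse_lookup_.emplace(&ref, hash);
    metrics_.emplace(hash, std::move(metric));
    return ref;
  }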

danevandyck

I get it. How about implementing a helper that accepts a pre-computed hash value, thereby avoiding the new first-lookup penalty and the duplication mentioned earlier?

nbrachet (Author) commented Dec 9, 2020

How about this?

Comment on lines +112 to +114
metrics_iterator iter = FindMetric(labels);
if (iter->second) return *(iter->second);
return Add(iter, detail::make_unique<T>(args...));


You're acquiring the lock twice here. The helpers (e.g. FindMetric and the Add overload) are private and should assume that the lock is held.

nbrachet (Author):

Same comment as before: the occasional double-locking is nothing in the long run, whereas grabbing the lock before computing the hash is always detrimental.
But that really opens the door to the fix I prefer: move the implementation of Add to the header file. Much simpler, but I figure less acceptable.


I don't think consolidating lock acquisition is incompatible with hashing outside the lock. As long as the private methods remain in the source file, the goal of preventing instantiation outside of Counter/Gauge/... is still achieved. So:

  T& Add(const std::map<std::string, std::string>& labels, Args&&... args) {
    auto hash = detail::hash_labels(labels);
    std::lock_guard<std::mutex> lock{mutex_};
    metrics_iterator iter = FindMetric(hash);
    if (iter->second) return *(iter->second);
    return Add(iter, detail::make_unique<T>(std::forward<Args>(args)...));
  }

where both private methods (FindMetric and Add) use a pre-computed hash.

gjasny (Collaborator):

Taking the lock twice leads to a race condition where the unique_ptr is null in between.

My goal when hiding the implementation was to avoid circular includes, as well as to limit the exposed symbols so that we have at least a chance to provide patches while maintaining the SONAME.
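
To make that window concrete, here is a sketch of the two-critical-section shape being discussed (an editorial illustration of the hazard, not the PR's exact code; mutex_, FindMetric, and the private Add overload follow the diff above):

  T& Add(const std::map<std::string, std::string>& labels, Args&&... args) {
    metrics_iterator iter;
    {
      std::lock_guard<std::mutex> lock{mutex_};
      iter = FindMetric(labels);  // may insert a {hash, nullptr} entry
    }
    // <-- window: another thread calling Add() with the same labels can
    //     observe the null unique_ptr here and construct a second T
    if (iter->second) return *(iter->second);
    std::lock_guard<std::mutex> lock{mutex_};
    return Add(iter, detail::make_unique<T>(std::forward<Args>(args)...));
  }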

nbrachet (Author):

> Taking the lock twice leads to a race condition where the unique_ptr is null in between.

If by "race condition" you mean "there is a possibility that 2 (or more) unique_ptrs are created", that is correct.

> My goal when hiding the implementation was to avoid circular includes, as well as to limit the exposed symbols so that we have at least a chance to provide patches while maintaining the SONAME.

Right. This is why the patch is structured this way: to maintain the same isolation of the implementation. The runtime cost is what we've discussed here: double-hashing and double-locking until the cache is populated. After that there is no more double-hashing or double-locking, and no more extraneous new/delete (the original intent of this patch).
I think it's a worthy trade-off.

gjasny (Collaborator):

On my forgeAI-nick/optimistic-family-add branch I moved the lock into the inlined template. That way we don't double-lock.
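
Roughly, the shape described there (a sketch under the assumption that the private helpers stay in family.cc and expect the mutex to be held; not the verbatim branch contents):

  // family.h -- the inlined template member hashes outside the lock,
  // then holds mutex_ exactly once across the whole lookup-or-insert:
  template <typename T>
  template <typename... Args>
  T& Family<T>::Add(const std::map<std::string, std::string>& labels,
                    Args&&... args) {
    auto hash = detail::hash_labels(labels);
    std::lock_guard<std::mutex> lock{mutex_};
    auto iter = FindMetric(hash);
    if (iter->second) return *(iter->second);
    return Add(iter, detail::make_unique<T>(std::forward<Args>(args)...));
  }

  // family.cc -- the private helpers do no locking of their own and
  // expect mutex_ to be held by the caller:
  //   metrics_iterator FindMetric(std::size_t hash);
  //   T& Add(metrics_iterator iter, std::unique_ptr<T> metric);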

gjasny (Collaborator) commented Dec 26, 2020

Could you please reset your branch to "our" forgeAI-nick/optimistic-family-add?

gjasny requested a review from jupp0r on December 26, 2020 at 17:04