Ingester lock tweaks to minimize headblock locking during search #3328
Putting back to draft because allocating the channel size based on the limit doesn't work for cases where the limit is large (1,000,000+).
@@ -699,6 +699,7 @@ func (b *walBlock) FetchTagValues(ctx context.Context, req traceql.AutocompleteR
```go
if err != nil {
	return fmt.Errorf("error opening file %s: %w", page.path, err)
}
defer file.Close()
```
Is the defer inside the loop enough, or do we want to close the file on each iteration through blockFlushes?
That's correct: the defer is inside the loop, so every file stays open until the function returns, which is not optimal. This func and several others follow the same pattern, and I was aiming to keep the changes to the minimum necessary. But I'm happy to make larger changes and close each file per iteration as expected if we want.
lgtm 👍
What this PR does:
We have seen occasional high read latencies on ingesters, where tag lookups and searches take 20+ seconds instead of the typical few seconds. Investigation showed that the cause is contention on the block mutexes, specifically due to the design of instance.Search and SearchResults. Instead of searching the headblock and releasing its lock as soon as possible, the lock was held until after also acquiring a lock on the rest of the blocks and beginning to read from SearchResults. High latency occurs when read and write requests on the mutexes stack up: a search might have to wait for a current search to finish (releasing its read lock), then for a block flush (write lock), and only then acquire the read lock again.

This moves away from SearchResults and the channel-based flow to a simpler setup like the other endpoints. Since this was the only usage of SearchResults, this seems better than trying to patch it (again).

Which issue(s) this PR fixes:
Fixes #
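The stacking described in the PR description follows from how Go's sync.RWMutex behaves: once a writer is blocked in Lock, new RLock calls also block until that writer has acquired and released the lock. Holding the headblock read lock for longer therefore delays not just flushes but every search queued behind a pending flush. The sketch below shows the "search the headblock, release, then lock the other blocks" shape under stated assumptions. The instance, block, search, and cutHeadBlock names and fields are hypothetical simplifications, not Tempo's actual types, and substring matching on trace IDs stands in for real search.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// block is a hypothetical stand-in for an ingester block.
type block struct {
	traceIDs []string
}

// search returns the IDs in the block that contain query.
func (b *block) search(query string) []string {
	var out []string
	for _, id := range b.traceIDs {
		if strings.Contains(id, query) {
			out = append(out, id)
		}
	}
	return out
}

// instance has separate (hypothetical) mutexes for the head block and the
// remaining blocks.
type instance struct {
	headBlockMtx sync.RWMutex
	headBlock    *block

	blocksMtx sync.RWMutex
	blocks    []*block
}

// search reads the head block under its read lock and releases it before
// touching any other block, so a pending flush is not held up by the rest
// of the search.
func (i *instance) search(query string) []string {
	var results []string

	i.headBlockMtx.RLock()
	results = append(results, i.headBlock.search(query)...)
	i.headBlockMtx.RUnlock()

	i.blocksMtx.RLock()
	defer i.blocksMtx.RUnlock()
	for _, b := range i.blocks {
		results = append(results, b.search(query)...)
	}
	return results
}

// cutHeadBlock is a hypothetical flush step: under both write locks it moves
// the head block into blocks and installs an empty head block.
func (i *instance) cutHeadBlock() {
	i.headBlockMtx.Lock()
	defer i.headBlockMtx.Unlock()
	i.blocksMtx.Lock()
	defer i.blocksMtx.Unlock()
	i.blocks = append(i.blocks, i.headBlock)
	i.headBlock = &block{}
}

func main() {
	inst := &instance{
		headBlock: &block{traceIDs: []string{"abc1", "xyz2"}},
		blocks: []*block{
			{traceIDs: []string{"abc3"}},
			{traceIDs: []string{"def4", "abc5"}},
		},
	}

	// Concurrent searches racing with one simulated flush.
	var wg sync.WaitGroup
	for n := 0; n < 4; n++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = inst.search("abc")
		}()
	}
	wg.Add(1)
	go func() {
		defer wg.Done()
		inst.cutHeadBlock()
	}()
	wg.Wait()

	fmt.Println(inst.search("abc")) // prints [abc3 abc5 abc1]
}
```

Both functions take the head block mutex before the blocks mutex, so the ordering stays consistent and cannot deadlock. The search simply no longer holds the first lock while it waits on the second.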
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]