Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ingester lock tweaks to minimize headblock locking during search #3328

Merged
merged 9 commits into from
Jan 31, 2024

Conversation

mdisibio
Copy link
Contributor

@mdisibio mdisibio commented Jan 24, 2024

What this PR does:
We have seen occasional high read latencies on ingesters where tag lookups and searches take 20+ seconds instead of the typical few. Investigation showed that it is contention on the block mutexes, and specifically due to the design of instance.Search and SearchResults. Instead of searching and releasing the headblock asap, the lock was held until after also acquiring a lock on the rest of the blocks and beginning to read from SearchResults. High latency occurs when read/write access to the mutexes stack up, such that it might have to wait for a current search to finish (releasing read lock), then a block flush (write lock), then acquiring read lock again.

This moves away from SearchResults and the channel-based flow to a simpler setup like the other endpoints. Since this was the only usage of SearchResults this seems better than trying to patch it (again).

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@mdisibio mdisibio marked this pull request as ready for review January 26, 2024 15:07
@mdisibio mdisibio marked this pull request as draft January 26, 2024 17:45
@mdisibio
Copy link
Contributor Author

Putting back to draft because allocating the channel size based on the limit doesn't work for cases where the limit is large (1,000,000+)

@@ -699,6 +699,7 @@ func (b *walBlock) FetchTagValues(ctx context.Context, req traceql.AutocompleteR
if err != nil {
return fmt.Errorf("error opening file %s: %w", page.path, err)
}
defer file.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the defer inside the loop enough, or do we want to close it out each iteration through blockFlushes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, it's inside the loop and all of them get closed at the end, which is not optimal. This func and several others are all following the same pattern, and I was aiming to have minimum changes necessary. But happy to make larger changes and defer as expected if we want.

modules/ingester/instance_search.go Show resolved Hide resolved
modules/ingester/instance_search.go Outdated Show resolved Hide resolved
modules/ingester/instance_search.go Show resolved Hide resolved
Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 👍

@mdisibio mdisibio marked this pull request as ready for review January 30, 2024 12:43
@mdisibio mdisibio changed the title Ingester lock tweaks Ingester lock tweaks to minimize headblock locking during search Jan 30, 2024
@mdisibio mdisibio merged commit 642c0a1 into grafana:main Jan 31, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants