Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use sync.Map to prevent race on map access in osquery-perf #24501

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

dantecatalfamo
Copy link
Member

@dantecatalfamo dantecatalfamo commented Dec 6, 2024

#24381

Do we add changes/ for osquery-perf?

  • Manual QA for all new/changed functionality

Copy link

codecov bot commented Dec 6, 2024

Codecov Report

Attention: Patch coverage is 0% with 48 lines in your changes missing coverage. Please review.

Project coverage is 63.49%. Comparing base (d8dafd6) to head (38dd292).
Report is 124 commits behind head on main.

Files with missing lines Patch % Lines
cmd/osquery-perf/agent.go 0.00% 48 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #24501      +/-   ##
==========================================
+ Coverage   63.43%   63.49%   +0.05%     
==========================================
  Files        1586     1597      +11     
  Lines      150594   151450     +856     
  Branches     3867     3867              
==========================================
+ Hits        95534    96156     +622     
- Misses      47437    47635     +198     
- Partials     7623     7659      +36     
Flag Coverage Δ
backend 64.36% <0.00%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dantecatalfamo dantecatalfamo marked this pull request as ready for review December 9, 2024 17:10
@dantecatalfamo dantecatalfamo requested a review from a team as a code owner December 9, 2024 17:10
@@ -2401,16 +2411,27 @@ func (a *agent) submitLogs(results []resultLog) error {
return fmt.Errorf("/version check failed: %w", err)
}

var resultLogs []byte
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an attempt to reduce memory/CPU footprint on a previous load test where we had to simulate Fleet server being down and a lot of logs buffered in osquery perf. Is this causing any issues?

cmd/osquery-perf/agent.go Outdated Show resolved Hide resolved
Copy link
Member

@lucasmrod lucasmrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where was the race exactly?

And no need for changes/ file for osquery-perf changes.

@dantecatalfamo
Copy link
Member Author

dantecatalfamo commented Dec 9, 2024

@lucasmrod There's a screenshot in the linked issue. Essentially there are two goroutines on separate timers that read from the same map concurrently, and if things line up just right it can cause a panic

@dantecatalfamo
Copy link
Member Author

I've actually never run into it myself, but victor was running a load test and hit it. I suspect it's super rare

@lukeheath
Copy link
Member

Do we add changes/ for osquery-perf?

@dantecatalfamo No need to include a change file unless it's a product change.

@@ -777,28 +776,32 @@ func (a *agent) runLoop(i int, onlyAlreadyEnrolled bool) {
// check if we have any scheduled queries that should be returning results
var results []resultLog
now := time.Now().Unix()
a.scheduledQueriesMu.Lock()
// a.scheduledQueriesMu.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this comment.

}
scheduledQueryData[scheduledQueryName] = q
a.scheduledQueryData.Store(scheduledQueryName, q)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger goroutine could send an incomplete number of results (if it runs after a.scheduledQueryData.Clear), right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replaced the sync.Map with a pointer

Copy link
Member

@lucasmrod lucasmrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of comments/questions.

v.lastRun = now
a.scheduledQueryData[queryName] = v
query.lastRun = now
a.scheduledQueryData.Store(queryName, query)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still might end up with a map with inconsitent values, right? (if this gets executed at same time as the assignment to the map on the other goroutine).

Could we have a mutex to do mutual exclusion between the "logger" and "config" goroutines? (to simplify and reduce risk of races/invalid-data)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the reference to the scheduledQueryData from the agent before iterating over it. If you still think I should add a muted after I can

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants