fix(k8sprocessor): race condition when getting Pod data #938
func (c *WatchClient) GetPodAttributes(identifier PodIdentifier) (map[string]string, bool) {
	pod, ok := c.getPod(identifier)
	if !ok {
		return nil, false
	}
	attributes := make(map[string]string, len(pod.Attributes))
	for key, value := range pod.Attributes {
		attributes[key] = value
	}
	for key, value := range c.getPodOwnerMetadataAttributes(pod) {
		attributes[key] = value
	}
	observability.RecordIPLookupMiss()
	return nil, false
	return attributes, ok
Could we pass the target map to this function so we avoid copying twice?
I'd rather not unless we can show a performance increase. Doing it this way is easier to understand, and I'd be surprised if copying around 16 strings twice made any significant difference.
Either way, I'd rather keep things simple for this fix. We can revisit this later.
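For reference, the two shapes being discussed could look roughly like the sketch below. It is an illustration only: `ownerAttributes`, `addOwnerAttributes`, and the `k8s.deployment.name` key are hypothetical stand-ins, not the processor's real helpers or attribute set.

```go
package main

import "fmt"

// ownerAttributes (hypothetical) allocates its own map, which the caller
// then has to copy into the result: the "double copy" variant.
func ownerAttributes(ownerName string) map[string]string {
	return map[string]string{"k8s.deployment.name": ownerName}
}

// addOwnerAttributes (hypothetical) writes straight into a map supplied by
// the caller, which avoids the intermediate allocation.
func addOwnerAttributes(target map[string]string, ownerName string) {
	target["k8s.deployment.name"] = ownerName
}

// mergeByCopy builds the result by copying both source maps.
func mergeByCopy(podAttributes map[string]string, ownerName string) map[string]string {
	merged := make(map[string]string, len(podAttributes))
	for key, value := range podAttributes {
		merged[key] = value
	}
	for key, value := range ownerAttributes(ownerName) {
		merged[key] = value
	}
	return merged
}

// mergeIntoTarget copies the Pod attributes once and lets the helper fill
// in the owner attributes directly.
func mergeIntoTarget(podAttributes map[string]string, ownerName string) map[string]string {
	merged := make(map[string]string, len(podAttributes)+1)
	for key, value := range podAttributes {
		merged[key] = value
	}
	addOwnerAttributes(merged, ownerName)
	return merged
}

func main() {
	podAttributes := map[string]string{"k8s.pod.name": "example-pod"}
	byCopy := mergeByCopy(podAttributes, "example-deployment")
	intoTarget := mergeIntoTarget(podAttributes, "example-deployment")
	fmt.Println(len(byCopy), len(intoTarget)) // prints "2 2"
}
```

Both variants produce the same map. The second saves one small allocation and one loop, at the cost of a helper that mutates its argument.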
I refactored the locking a bit and I believe it's now safe. We should rethink how it's meant to work in general though. The code acts internally as if passing around a
ownerAttributes := c.getPodOwnerMetadataAttributes(pod)

// we need to take a lock here because pod.Attributes may be modified concurrently
// TODO: clean up the locking in these functions
Which functions?
This one and the ones it calls. I made that clearer in the comment.
We had a race condition where we'd update Pod metadata in GetPod while only holding a read lock. When two batches of data arrived from the same Pod in quick succession, we'd get a panic due to concurrent map writes.

I fixed this by changing the interface to return only an attributes map, and by explicitly making that map a copy of what's in the Pod attributes.
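To illustrate the shape of the fix, here is a minimal sketch and not the processor's actual code. `podCache`, `SetAttribute`, and the string identifier are hypothetical simplifications of `WatchClient` and `PodIdentifier`. The point is that the getter only reads shared state under the read lock and hands back a private copy, so callers can never write to the cached map.

```go
package main

import (
	"fmt"
	"sync"
)

// Pod is a simplified stand-in for the cached Pod record (assumed shape).
type Pod struct {
	Attributes map[string]string
}

// podCache is a hypothetical, reduced version of WatchClient.
type podCache struct {
	mu   sync.RWMutex
	pods map[string]*Pod
}

// SetAttribute mutates the cached Pod, so it takes the write lock.
func (c *podCache) SetAttribute(id, key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	pod, ok := c.pods[id]
	if !ok {
		pod = &Pod{Attributes: map[string]string{}}
		c.pods[id] = pod
	}
	pod.Attributes[key] = value
}

// GetPodAttributes only reads shared state, so the read lock is sufficient.
// It returns a copy, which the caller may modify freely.
func (c *podCache) GetPodAttributes(id string) (map[string]string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	pod, ok := c.pods[id]
	if !ok {
		return nil, false
	}
	attributes := make(map[string]string, len(pod.Attributes))
	for key, value := range pod.Attributes {
		attributes[key] = value
	}
	return attributes, true
}

func main() {
	cache := &podCache{pods: map[string]*Pod{}}
	cache.SetAttribute("10.0.0.1", "k8s.pod.name", "example-pod")

	// Several goroutines look up the same Pod and write to their results,
	// mimicking batches arriving in quick succession.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			if attrs, ok := cache.GetPodAttributes("10.0.0.1"); ok {
				attrs["worker"] = fmt.Sprint(n) // touches only the private copy
			}
		}(i)
	}
	wg.Wait()

	attrs, _ := cache.GetPodAttributes("10.0.0.1")
	fmt.Println(len(attrs)) // prints 1: the cached map was never modified by readers
}
```

Running a sketch like this with `go run -race` is a quick way to check that no writes happen under the read lock.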