
fix(k8sprocessor): keep pod's services information up to date #710

Conversation

aboguszewski-sumo (Contributor)

Information about a pod's services can arrive late, i.e. after the pod has already been marked as ready. If that happens, telemetry from that pod won't be enriched with the service information until the pod is updated again.
This PR changes that behavior: service information is no longer cached and is instead recalculated every time GetPod is called.
This is the less intrusive solution to the problem; the alternative would be to remove the whole map[string]*Pod from WatchClient and recalculate all the attributes on every call.
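Roughly, the change amounts to refreshing the service attribute during lookup instead of trusting the value cached at update time. A minimal sketch (not the exact diff; the GetPod signature and the Pods map/mutex fields are assumptions based on the snippets discussed below):

// Sketch only: return the cached pod, but recompute its service attribute on
// every lookup so that late-arriving Service information is picked up.
func (c *WatchClient) GetPod(podIP string) (*Pod, bool) {
    c.m.RLock()
    pod, ok := c.Pods[podIP]
    c.m.RUnlock()
    if ok {
        // Service info may have arrived after the pod was cached; refresh it.
        c.updatePodServices(pod)
    }
    return pod, ok
}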

cc: @swiatekm-sumo

aboguszewski-sumo requested a review from a team as a code owner on August 22, 2022 07:37
github-actions bot added the documentation and go labels on Aug 22, 2022
aboguszewski-sumo force-pushed the k8s-metadata-update-service branch from 6b2b5dd to e64808c on August 22, 2022 07:38
Comment on lines 255 to 257
} else {
c.updatePodServices(pod)
return pod, ok
Contributor:

I would put it outside of the if-else.

@@ -380,6 +377,14 @@ func (c *WatchClient) extractPodAttributes(pod *api_v1.Pod) map[string]string {
return tags
}

func (c *WatchClient) updatePodServices(pod *Pod) {
if c.Rules.ServiceName {
if services := c.op.GetServices(pod.Name); len(services) > 0 {
Contributor:

If there are no services, we should set an empty string or remove the attribute.

Contributor:

This is for the scenario when the service has been removed.

Contributor:

Out of curiosity: can we shorten it to just

pod.Attributes[c.Rules.Tags.ServiceName] = strings.Join(services, c.delimiter)

?

Contributor (author):

According to the implementation, yes: https://cs.opensource.google/go/go/+/refs/tags/go1.19:src/strings/strings.go;l=430
I did not change it because I wasn't sure whether that style is idiomatic in Go, but I also like it better.
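For reference, the whole helper would then collapse to roughly this (a sketch, not the exact diff; strings.Join returns an empty string for a nil or empty slice, so the attribute is cleared when a pod has no services):

// Sketch of the simplified helper discussed above.
func (c *WatchClient) updatePodServices(pod *Pod) {
    if c.Rules.ServiceName {
        services := c.op.GetServices(pod.Name)
        // strings.Join(nil, ...) == "", so this also covers the
        // "service has been removed" case by writing an empty value.
        pod.Attributes[c.Rules.Tags.ServiceName] = strings.Join(services, c.delimiter)
    }
}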

Contributor:

I would go with it

aboguszewski-sumo force-pushed the k8s-metadata-update-service branch 3 times, most recently from 105272f to aae8774 on August 22, 2022 11:25
swiatekm commented Aug 22, 2022

What do you think of doing this for all owner metadata?

As in, we'd skip calculating owner metadata on Pod updates, and only calculate it in GetPod on demand. We'd need to check how this affects performance, but it'd make the code simpler and use less memory.

aboguszewski-sumo (Contributor, author):
> What do you think of doing this for all owner metadata?
>
> As in, we'd skip calculating owner metadata on Pod updates, and only calculate it in GetPod on demand. We'd need to check how this affects performance, but it'd make the code simpler and use less memory.

Actually, this is a question of where we draw the line. We can do this only for service data, only for owner metadata (including services), or even for all data. However, removing the caching completely would require some more changes.

Also, are you sure you mean only owner metadata? There isn't that much owner metadata, so I don't think this would affect performance or memory usage much.

In terms of performance, are there any existing benchmarks or tools for writing them?

swiatekm:
> > What do you think of doing this for all owner metadata?
> >
> > As in, we'd skip calculating owner metadata on Pod updates, and only calculate it in GetPod on demand. We'd need to check how this affects performance, but it'd make the code simpler and use less memory.
>
> Actually, this is a question of where we draw the line. We can do this only for service data, only for owner metadata (including services), or even for all data. However, removing the caching completely would require some more changes.
>
> Also, are you sure you mean only owner metadata? There isn't that much owner metadata, so I don't think this would affect performance or memory usage much.

I'd like to do it for all owner metadata. Singling out Services is a bit arbitrary, and I think doing a write on every record, as we do in this PR, is going to be slower than calculating the tags on the fly.

I'm actually not that concerned about performance - I primarily want to do this to simplify the code. We should check what the performance impact is, but I don't expect it to be much either way.

> In terms of performance, are there any existing benchmarks or tools for writing them?

There are some commented-out benchmarks here:

//func newBenchmarkClient(b *testing.B) *WatchClient {

Generally speaking, what you'd want is a test where we seed the client with, say, 1000 Pods and then benchmark GetPod.
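A minimal sketch of such a benchmark (hypothetical; newBenchmarkClient and the pod IP scheme here are assumptions modeled on that commented-out helper):

package kube

import (
    "fmt"
    "testing"
)

// Seed the client with ~1000 pods, then measure GetPod lookups.
func BenchmarkGetPod(b *testing.B) {
    c := newBenchmarkClient(b) // assumed to pre-populate the client with ~1000 pods
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        // "1.1.1.%d" stands in for whatever identifiers the seeding helper assigned.
        c.GetPod(fmt.Sprintf("1.1.1.%d", i%250))
    }
}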

aboguszewski-sumo (Contributor, author) commented Aug 23, 2022

I ran it against 1000 pods, like you suggested.

Five consecutive runs before changes (current main):

BenchmarkSomePodsPerOwner-16              167499              6629 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.485s

BenchmarkSomePodsPerOwner-16              163314              6636 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.446s

BenchmarkSomePodsPerOwner-16              175059              6681 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   3.460s

BenchmarkSomePodsPerOwner-16              181920              6979 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.648s

BenchmarkSomePodsPerOwner-16              180145              7417 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.702s

Four consecutive runs after changes (I was sure I had put 5 in here, but one got lost somehow...):

BenchmarkSomePodsPerOwner-16              164652              7840 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.712s

BenchmarkSomePodsPerOwner-16              157642              8150 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   3.785s

BenchmarkSomePodsPerOwner-16              150303              8949 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.758s

BenchmarkSomePodsPerOwner-16              148719              7408 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.487s

aboguszewski-sumo (Contributor, author):

The benchmarks were run on the newest commit (which adds all owner metadata to a pod on demand).

swiatekm:

Hm, so around 20% more CPU time per operation on average, it looks like. @SumoLogic/open-source-collection-team WDYT, is this worth the added reliability? Or should we try to keep the more efficient approach, and make it more complex to account for the Owner updates?

sumo-drosiek (Contributor):

20% is something. Will the more complex approach be much more effective?

aboguszewski-sumo (Contributor, author) commented Aug 23, 2022

I'm sorry, I made a mistake while running the benchmarks (OwnerLookupEnabled was not set to true, and I was doing other tasks in the background while measuring). New results:

Current main:

BenchmarkSomePodsPerOwner-16              146277              7815 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.594s

BenchmarkSomePodsPerOwner-16              160446              7641 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.674s

BenchmarkSomePodsPerOwner-16              145642              7674 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.563s

BenchmarkSomePodsPerOwner-16              169126              7625 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.719s

HEAD~1 of this branch (update only services in GetPod):

BenchmarkSomePodsPerOwner-16              155786              7725 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.646s

BenchmarkSomePodsPerOwner-16              158461              7674 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.668s

BenchmarkSomePodsPerOwner-16              147369              7875 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.579s

BenchmarkSomePodsPerOwner-16              154473              7779 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.625s

HEAD of this branch (update all owner metadata):

BenchmarkSomePodsPerOwner-16              136042              8100 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.546s

BenchmarkSomePodsPerOwner-16              142065              8193 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.556s

BenchmarkSomePodsPerOwner-16              149042              8599 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.709s

BenchmarkSomePodsPerOwner-16              146815              8300 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.637s

swiatekm:

So more like 5-10%, then.

aboguszewski-sumo force-pushed the k8s-metadata-update-service branch from 7b94dcf to f66084c on September 23, 2022 10:40
Info about a pod's metadata can arrive late, e.g. the service info can arrive
after the pod has been marked as ready.
This commit stops caching this information and recalculates it on every query instead.

Translate all owner metadata