
fix(k8sprocessor): keep pod's services information up to date #710

Conversation

aboguszewski-sumo (Contributor)

Information about a pod's services can arrive late, i.e. after the pod has already been marked as ready. If that happens, telemetry from that pod won't be enriched with the service information until the pod is updated again.
This PR changes that behavior: service information is no longer cached and is instead recalculated every time GetPod is called.
This is the less intrusive solution to the problem; the alternative would be to remove the whole map[string]*Pod from WatchClient and recalculate all the attributes on every call.
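Roughly, the change amounts to refreshing the service attribute during lookup instead of trusting the value cached at update time. A minimal sketch (not the exact diff; the GetPod signature and the Pods map/mutex fields are assumptions based on the snippets discussed below):

// Sketch only: return the cached pod, but recompute its service attribute on
// every lookup so that late-arriving Service information is picked up.
func (c *WatchClient) GetPod(podIP string) (*Pod, bool) {
    c.m.RLock()
    pod, ok := c.Pods[podIP]
    c.m.RUnlock()
    if ok {
        // Service info may have arrived after the pod was cached; refresh it.
        c.updatePodServices(pod)
    }
    return pod, ok
}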

cc: @swiatekm-sumo

aboguszewski-sumo requested a review from a team as a code owner on August 22, 2022 07:37
github-actions bot added the documentation and go labels on Aug 22, 2022
aboguszewski-sumo force-pushed the k8s-metadata-update-service branch from 6b2b5dd to e64808c on August 22, 2022 07:38
Comment on lines 255 to 257
} else {
c.updatePodServices(pod)
return pod, ok
Contributor:

I would put it outside of the if-else.

@@ -380,6 +377,14 @@ func (c *WatchClient) extractPodAttributes(pod *api_v1.Pod) map[string]string {
return tags
}

func (c *WatchClient) updatePodServices(pod *Pod) {
if c.Rules.ServiceName {
if services := c.op.GetServices(pod.Name); len(services) > 0 {
Contributor:

If there are no services, we should set an empty string or remove the attribute.

Contributor:

This is for the scenario when the service has been removed.

Contributor:

Out of curiosity: can we shorten it to just

pod.Attributes[c.Rules.Tags.ServiceName] = strings.Join(services, c.delimiter)

?

Contributor (author):

According to the implementation, yes: https://cs.opensource.google/go/go/+/refs/tags/go1.19:src/strings/strings.go;l=430
I did not change it because I wasn't sure whether that style is idiomatic in Go, but I also like it better.
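For reference, the whole helper would then collapse to roughly this (a sketch, not the exact diff; strings.Join returns an empty string for a nil or empty slice, so the attribute is cleared when a pod has no services):

// Sketch of the simplified helper discussed above.
func (c *WatchClient) updatePodServices(pod *Pod) {
    if c.Rules.ServiceName {
        services := c.op.GetServices(pod.Name)
        // strings.Join(nil, ...) == "", so this also covers the
        // "service has been removed" case by writing an empty value.
        pod.Attributes[c.Rules.Tags.ServiceName] = strings.Join(services, c.delimiter)
    }
}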

Contributor:

I would go with it

aboguszewski-sumo force-pushed the k8s-metadata-update-service branch 3 times, most recently from 105272f to aae8774 on August 22, 2022 11:25
swiatekm commented Aug 22, 2022

What do you think of doing this for all owner metadata?

As in, we'd skip calculating owner metadata on Pod updates, and only calculate it in GetPod on demand. We'd need to check how this affects performance, but it'd make the code simpler and use less memory.

aboguszewski-sumo (Contributor, author):
> What do you think of doing this for all owner metadata?
>
> As in, we'd skip calculating owner metadata on Pod updates, and only calculate it in GetPod on demand. We'd need to check how this affects performance, but it'd make the code simpler and use less memory.

Actually, this is a question of where we draw the line. We can do this only for service data, only for owner metadata (including services), or even for all data. However, removing the caching completely would require some more changes.

Also, are you sure you mean only owner metadata? There isn't that much owner metadata, so I don't think this would affect performance or memory usage much.

In terms of performance, are there any existing benchmarks or tools for writing them?

swiatekm:
> > What do you think of doing this for all owner metadata?
> >
> > As in, we'd skip calculating owner metadata on Pod updates, and only calculate it in GetPod on demand. We'd need to check how this affects performance, but it'd make the code simpler and use less memory.
>
> Actually, this is a question of where we draw the line. We can do this only for service data, only for owner metadata (including services), or even for all data. However, removing the caching completely would require some more changes.
>
> Also, are you sure you mean only owner metadata? There isn't that much owner metadata, so I don't think this would affect performance or memory usage much.

I'd like to do it for all owner metadata. Singling out Services is a bit arbitrary, and I think doing a write on every record, as we do in this PR, is going to be slower than calculating the tags on the fly.

I'm actually not that concerned about performance - I primarily want to do this to simplify the code. We should check what the performance impact is, but I don't expect it to be much either way.

> In terms of performance, are there any existing benchmarks or tools for writing them?

There are some commented-out benchmarks here:

//func newBenchmarkClient(b *testing.B) *WatchClient {

Generally speaking, what you'd want is a test where we seed the client with, say, 1000 Pods and then benchmark GetPod.
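A minimal sketch of such a benchmark (hypothetical; newBenchmarkClient and the pod IP scheme here are assumptions modeled on that commented-out helper):

package kube

import (
    "fmt"
    "testing"
)

// Seed the client with ~1000 pods, then measure GetPod lookups.
func BenchmarkGetPod(b *testing.B) {
    c := newBenchmarkClient(b) // assumed to pre-populate the client with ~1000 pods
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        // "1.1.1.%d" stands in for whatever identifiers the seeding helper assigned.
        c.GetPod(fmt.Sprintf("1.1.1.%d", i%250))
    }
}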

aboguszewski-sumo (Contributor, author) commented Aug 23, 2022

I ran it against 1000 pods, like you suggested.

Five consecutive runs before changes (current main):

BenchmarkSomePodsPerOwner-16              167499              6629 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.485s

BenchmarkSomePodsPerOwner-16              163314              6636 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.446s

BenchmarkSomePodsPerOwner-16              175059              6681 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   3.460s

BenchmarkSomePodsPerOwner-16              181920              6979 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.648s

BenchmarkSomePodsPerOwner-16              180145              7417 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.702s

Four consecutive runs after changes (I was sure I had put 5 in here, but one got lost somehow...):

BenchmarkSomePodsPerOwner-16              164652              7840 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.712s

BenchmarkSomePodsPerOwner-16              157642              8150 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   3.785s

BenchmarkSomePodsPerOwner-16              150303              8949 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.758s

BenchmarkSomePodsPerOwner-16              148719              7408 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.487s

aboguszewski-sumo (Contributor, author):

The benchmarks were run on the newest commit (which adds all owner metadata to a pod on demand).

swiatekm:

Hm, so around 20% more CPU time per operation on average, it looks like. @SumoLogic/open-source-collection-team WDYT, is this worth the added reliability? Or should we try to keep the more efficient approach, and make it more complex to account for the Owner updates?

sumo-drosiek (Contributor):

20% is something. Will the more complex approach be much more effective?

aboguszewski-sumo (Contributor, author) commented Aug 23, 2022

I'm sorry, I made a mistake while running the benchmarks (OwnerLookupEnabled was not set to true, and I was doing other tasks in the background while measuring). New results:

Current main:

BenchmarkSomePodsPerOwner-16              146277              7815 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.594s

BenchmarkSomePodsPerOwner-16              160446              7641 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.674s

BenchmarkSomePodsPerOwner-16              145642              7674 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.563s

BenchmarkSomePodsPerOwner-16              169126              7625 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.719s

HEAD~1 of this branch (update only services in GetPod):

BenchmarkSomePodsPerOwner-16              155786              7725 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.646s

BenchmarkSomePodsPerOwner-16              158461              7674 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.668s

BenchmarkSomePodsPerOwner-16              147369              7875 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.579s

BenchmarkSomePodsPerOwner-16              154473              7779 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.625s

HEAD of this branch (update all owner metadata):

BenchmarkSomePodsPerOwner-16              136042              8100 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.546s

BenchmarkSomePodsPerOwner-16              142065              8193 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.556s

BenchmarkSomePodsPerOwner-16              149042              8599 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.709s

BenchmarkSomePodsPerOwner-16              146815              8300 ns/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube   2.637s

swiatekm:

So more like 5-10%, then.

aboguszewski-sumo force-pushed the k8s-metadata-update-service branch from 7b94dcf to f66084c on September 23, 2022 10:40
Info about a pod's metadata can arrive late, e.g. the service info can arrive
after the pod has been marked as ready.
This commit stops caching this information and recalculates it on every query instead.

Translate all owner metadata