Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More efficient cachedPartitions ideas #286

Closed
cep21 opened this issue Feb 20, 2015 · 4 comments
Closed

More efficient cachedPartitions ideas #286

cep21 opened this issue Feb 20, 2015 · 4 comments

Comments

@cep21
Copy link

cep21 commented Feb 20, 2015

Hello,

I've written a golang kafka client using sarama (it sends a lot of kafka messages) and when I use pprof the following information shows up

(pprof) top100
670ms of 670ms total (  100%)
      flat  flat%   sum%        cum   cum%
     220ms 32.84% 32.84%      220ms 32.84%  github.com/Shopify/sarama.(*int32Slice).Less
      90ms 13.43% 46.27%      370ms 55.22%  sort.doPivot
      50ms  7.46% 53.73%       50ms  7.46%  github.com/Shopify/sarama.(*int32Slice).Swap
      40ms  5.97% 59.70%      520ms 77.61%  github.com/Shopify/sarama.(*Client).cachedPartitions
      40ms  5.97% 65.67%       60ms  8.96%  runtime.mapiternext
      40ms  5.97% 71.64%       50ms  7.46%  sort.insertionSort
      30ms  4.48% 76.12%       30ms  4.48%  runtime.writebarrierptr
      20ms  2.99% 79.10%       20ms  2.99%  runtime.MSpan_Sweep
      20ms  2.99% 82.09%       20ms  2.99%  runtime.cas
      10ms  1.49% 83.58%       10ms  1.49%  MCentral_Grow
      10ms  1.49% 85.07%       10ms  1.49%  encoding/json.cachedTypeFields
...
...
...

I appear to spend a lot of time sorting the partitions inside cachedPartitions() Does this need to happen so frequently? Can this be cached per topic/partitionSet combination? I'm not super familiar with kafka's protocol so I'm not sure if the solution is so simple.

Thoughts?

@eapache
Copy link
Contributor

eapache commented Feb 20, 2015

Yes, this can be cached, it's just a little tricky and there are other performance bottlenecks that are more broadly applicable (e.g. #255). You must have an enormous number of partitions for this to be showing up so heavily.

@cep21
Copy link
Author

cep21 commented Feb 20, 2015

1024 partitions

@eapache
Copy link
Contributor

eapache commented Feb 20, 2015

That's a lot :)

Patches welcome, if you want to take a crack. In short, you'll have to:

  • add an additional map to use as a cache that works in tandem with metadata at client.go:67
  • use and/or lazily fill the cache in Client.cachedPartitions() being careful to respect the partitionSet parameter
  • clear the cache around client.go:586

Otherwise one of us will probably pick this up eventually, but it's not a high priority at the moment.

@cep21
Copy link
Author

cep21 commented Feb 20, 2015

Thank you for the comments. I've created pull request #287 using your idea.

@cep21 cep21 closed this as completed Feb 23, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants