Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Amortize allocations in snappyDecode #446

Merged
merged 1 commit into from
May 8, 2015

Conversation

potocnyj
Copy link

@potocnyj potocnyj commented May 8, 2015

Profiling the Kafka consumers at VividCortex showed me that a large portion of the allocated memory in our code comes from sarama.snappyDecode. This change reduces the average number of allocated bytes by pre-allocating the destination slice, and by re-using the chunk slice for cases that make multiple calls to snappy.Decode. Doing so should reduce GC pressure by a modest amount.

Profiling the Kafka consumers at VividCortex showed me that a large portion of the allocated memory in our code comes from sarama.snappyDecode. This change reduces the average number of allocated bytes by pre-allocating the destination slice, and by re-using the chunk slice for cases that make multiple calls to snappy.Decode. Doing so reduces GC pressure by a modest amount.
@eapache
Copy link
Contributor

eapache commented May 8, 2015

Makes sense to me. I'd be curious to see the actual before/after benchmark results if you can share those?

@potocnyj
Copy link
Author

potocnyj commented May 8, 2015

The numbers I have so far from local testing are fairly noisy, I'd be happy to collect some numbers from our production env and report back later though

@potocnyj
Copy link
Author

potocnyj commented May 8, 2015

Right, so here are some numbers on patched vs. un-patched on one of our production consumers:

  • snappyDecode went from being 43.9% of total alloc'd_space to 33.9%, with the underlying snappy.Decode call dropping from 6.2% to 0.7%
  • On the cpu side, snappyDecode went from 6.6% of cpu to 4.8%. runtime.Growslice and runtime.Makeslice have decreases that appear to make up this difference.
  • bgsweep dropped from using 7.8% to 6.7% of cpu, and runtime.sweepone went from 9.7% to 7.8% of total cpu

@eapache
Copy link
Contributor

eapache commented May 8, 2015

👍

eapache added a commit that referenced this pull request May 8, 2015
@eapache eapache merged commit 440808c into IBM:master May 8, 2015
@wvanbergen
Copy link
Contributor

Awesome improvement!!

On Friday, May 8, 2015, Evan Huus [email protected] wrote:

Merged #446 #446.


Reply to this email directly or view it on GitHub
#446 (comment).

@eapache
Copy link
Contributor

eapache commented May 29, 2015

Definitely awesome! After rolling this out to our cluster a little while ago, we've seen a distinct ~15% reduction in memory usage across our producers. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants