-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not assume LZ4 worst (best?) case 255x compression #1446
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff, needs some tweaking.
src/rdkafka_lz4.c
Outdated
* More info on max size: http://stackoverflow.com/a/25751871/1821055 */ | ||
if (fi.contentSize == 0 || fi.contentSize > inlen * 255) | ||
estimated_uncompressed_size = inlen * 255; | ||
estimated_uncompressed_size = inlen * 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit conservative.
Maybe we should make use of of message.max.bytes
as a starting point?
estimated_uncompressed_size = RD_MIN(inlen * 255, RD_MAX(inlen * 2, rkb->rkb_rk->rk_conf.max_msg_size));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
src/rdkafka_lz4.c
Outdated
char *tmp; | ||
size_t extra = (r > 1024 ? r : 1024) * 2; | ||
// Grow exponentially with some factor > 1 (using 1.75) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C-style comments `/* .. */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will change.
src/rdkafka_lz4.c
Outdated
size_t extra = (r > 1024 ? r : 1024) * 2; | ||
// Grow exponentially with some factor > 1 (using 1.75) | ||
// for ammortized O(1) copying. | ||
size_t extra = outlen * 3 / 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LZ4F_decompress()
returns the number of bytes written for the next call, so we should still use r
here.
Please retain the original factor of 2 and the min 1024
check unless there is clear motivation why it needs to be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it returns the number of compressed bytes it expects for the next call, not the number of decompressed bytes it guesses will return on the next call. See lz4frame.h
:
* @return is an hint of how many `srcSize` bytes LZ4F_decompress() expects for next call.
* Schematically, it's the size of the current (or remaining) compressed block + header of next block.
* Respecting the hint provides some small speed benefit, because it skips intermediate buffers.
* This is just a hint though, it's always possible to provide any srcSize.
* When a frame is fully decoded, @return will be 0 (no more data expected).
* If decompression failed, @return is an error code, which can be tested using LZ4F_isError().
Please retain the original factor of 2
I'm not a huge fan of a factor of 2 for the reasons described here: https://github.com/facebook/folly/blob/master/folly/docs/FBVector.md
Chiefly, any factor > 1 will ensure amortized O(1) copying behavior, but a factor of 2 is worst for memory fragmentation, because the memory allocator has no chance at reusing the previously allocated blocks.
and the min
1024
Sure, my mistake. I'd misread that as a a max of 1024 when I first read the code.
|
||
rd_atomic64_add(&rkb->rkb_c.zbuf_grow, 1); | ||
|
||
if ((tmp = rd_realloc(outbuf, outlen + extra))) { | ||
if (!(tmp = rd_realloc(out, outlen + extra))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
src/rdkafka_lz4.c
Outdated
inlen * 255, | ||
RD_MAX(inlen * 2, | ||
(size_t)(rkb->rkb_rk->rk_conf.max_msg_size))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D'oh. Fixing now.
Thanks alot for this! |
Fix #1445
Self-explanatory, see the issue description for details.
cc: @igorcanadi