Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not assume LZ4 worst (best?) case 255x compression #1446

Merged
merged 4 commits into from
Oct 11, 2017

Conversation

tudor
Copy link
Contributor

@tudor tudor commented Oct 8, 2017

Fix #1445

Self-explanatory, see the issue description for details.

cc: @igorcanadi

Copy link
Contributor

@edenhill edenhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff, needs some tweaking.

* More info on max size: http://stackoverflow.com/a/25751871/1821055 */
if (fi.contentSize == 0 || fi.contentSize > inlen * 255)
estimated_uncompressed_size = inlen * 255;
estimated_uncompressed_size = inlen * 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit conservative.

Maybe we should make use of of message.max.bytes as a starting point?
estimated_uncompressed_size = RD_MIN(inlen * 255, RD_MAX(inlen * 2, rkb->rkb_rk->rk_conf.max_msg_size));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

char *tmp;
size_t extra = (r > 1024 ? r : 1024) * 2;
// Grow exponentially with some factor > 1 (using 1.75)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C-style comments `/* .. */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will change.

size_t extra = (r > 1024 ? r : 1024) * 2;
// Grow exponentially with some factor > 1 (using 1.75)
// for ammortized O(1) copying.
size_t extra = outlen * 3 / 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LZ4F_decompress() returns the number of bytes written for the next call, so we should still use r here.

Please retain the original factor of 2 and the min 1024 check unless there is clear motivation why it needs to be changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it returns the number of compressed bytes it expects for the next call, not the number of decompressed bytes it guesses will return on the next call. See lz4frame.h:

 * @return is an hint of how many `srcSize` bytes LZ4F_decompress() expects for next call.
 * Schematically, it's the size of the current (or remaining) compressed block + header of next block.
 * Respecting the hint provides some small speed benefit, because it skips intermediate buffers.
 * This is just a hint though, it's always possible to provide any srcSize.
 * When a frame is fully decoded, @return will be 0 (no more data expected).
 * If decompression failed, @return is an error code, which can be tested using LZ4F_isError().

Please retain the original factor of 2

I'm not a huge fan of a factor of 2 for the reasons described here: https://github.com/facebook/folly/blob/master/folly/docs/FBVector.md

Chiefly, any factor > 1 will ensure amortized O(1) copying behavior, but a factor of 2 is worst for memory fragmentation, because the memory allocator has no chance at reusing the previously allocated blocks.

and the min 1024

Sure, my mistake. I'd misread that as a a max of 1024 when I first read the code.


rd_atomic64_add(&rkb->rkb_c.zbuf_grow, 1);

if ((tmp = rd_realloc(outbuf, outlen + extra))) {
if (!(tmp = rd_realloc(out, outlen + extra))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

inlen * 255,
RD_MAX(inlen * 2,
(size_t)(rkb->rkb_rk->rk_conf.max_msg_size)));
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'oh. Fixing now.

@edenhill edenhill merged commit 28962b7 into confluentinc:master Oct 11, 2017
@edenhill
Copy link
Contributor

Thanks alot for this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants