Add stateless dictionary support #216
Conversation
Enable up to 8K dictionary for stateless compression.
I can't use this in WebSockets due to the 8 KB limit, unfortunately, since I need to support at least the default 32 KB rolling window. I suppose I could fall back. 🤔
Why? 8K should work just as well as 32K.
Not all clients support configurable window sizes. E.g. every other Go WebSocket library does not, since the standard library does not allow configuring it.
It's possible I'm confused, but as far as I understand, the window size must be the same on both client and server, so the truncation that would be performed here wouldn't work.
I think you may be misunderstanding what the dictionary is for. It is purely there for efficiency, and the compressor can have any history size; it only affects how well the data compresses. You can use the current stateless implementation and it will work fine with any client. It will just compress slightly worse.
https://godoc.org/compress/flate#NewWriterDict
If the dictionary is purely for efficiency, why do the docs state that the decompressor must be initialized with the same dictionary?
That is for a pre-shared dictionary, meaning something that is initialized before the stream starts. The decoder must have the dictionary, since the encoder may reference it. But the encoder does not have to reference the dictionary at all, or only parts of it, which is why an 8K dictionary is fine even if the decoder buffers up to 32K. For blocks, the entire 32K will be referenced, because the dictionary is a sliding window from the current decompression point. You are just limiting how far back the encoder can reference data. The decoder must be able to handle references 32K back, but from the point of view of the compressor, how far back to reference is just a matter of efficiency.
In terms of the documentation, what it means is that the decoder must have the dictionary to decompress the data correctly.
Ahh makes perfect sense. Will test in my lib this weekend. |
@nhooyr Have you had a chance to test it? |
Thanks for reminding me. I just finished the rewrite with compression. Here's the commit in which I add your library. All my tests pass, including the entire autobahn test suite with race detection enabled, so it definitely works well. I can't comment on performance yet, as I haven't written benchmarks. As soon as I do, I'll let you know.
The one thing I was curious about is why it doesn't return the number of bytes written. I need this information to accurately implement io.Writer: if an error occurs, it's possible not all of the bytes were written.
Updated commit: coder/websocket@7dbe93c |
@nhooyr Cool. I will take a peek :)
It doesn't really make much sense to keep track of that, since there isn't a 1:1 mapping between input and output. x bytes in will map to y bytes out, and a partial write will result in garbage data. Most writers consider a failed write unrecoverable, including the stdlib, which persists the error. If you want to keep track of it, you can buffer the output so writes cannot fail. But in general I find that trying to recover a broken write is unreliable at best. I'll leave that to you.
Fair. I noticed the stdlib writer just returns n = 0 if any error occurs.
Why don't we just encapsulate that into StatelessDeflate and have it return
I'm not sure what problem you are trying to solve, but there is also NewStatelessWriter if you want regular Write calls.
See #222 |
My benchmarks indicate your library is 2x faster than the stdlib, which is sweet. For non-stateless compression, it seems to allocate 2x as much, though. See #107 (comment). Regarding stateless compression, I'm seeing 19301 B allocated per op in my benchmark for echoing a 512-byte random message with an 8 KB dictionary. With no dictionary, it's around 100 B per op. I'm not sure how much memory it uses internally between calls; there doesn't seem to be an easy way to measure, as it uses a pool to keep memory usage down.
I disabled the bitWriterPool pool and I'm seeing 36931 B/op, which seems really good, too good to be true. Is there any other pool I need to disable?
@nhooyr The actual allocations depend on whether the content is expected to compress. You are probably hitting this: Line 108 in b7ccab8
This is an early bailout: if the content cannot be compressed, no Huffman tables are generated and the content is just written uncompressed.
So I changed my benchmark to use
Although no dictionary now allocates 15 KB/op. |
Nvm, ignore my last comment; it was due to the bit writer pool being disabled.
On https://github.com/nhooyr/websocket/tree/compress and with the bit writer pool enabled in klauspost/compress.
It's also definitely compressing: I'm seeing 16 total bytes written (including the websocket header) for the 512-byte message.
@nhooyr Did you ever doubt that :D |
Honestly, I did; the amount of memory being allocated is impressively low.
It can use the stack since it is stateless, plus other tricks, since it is only really useful for smaller blocks. It is (of course) quite inefficient, but something is better than nothing, and the speed is decent. Most important, of course, is that it doesn't keep memory between calls.