Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deframer implementation resizes buffer when not necessary #211

Closed
bugreport1234 opened this issue Jan 8, 2019 · 2 comments
Closed

Deframer implementation resizes buffer when not necessary #211

bugreport1234 opened this issue Jan 8, 2019 · 2 comments

Comments

@bugreport1234
Copy link

Hello,
I'm using this library in a project. First of all, thanks, this library has made working with TLS very straightforward for me.

I'm using this library in the context of non-blocking I/O. Due to the specifics of my project I'm unable to use an OS polling library like mio, and instead manually check sockets for readiness in a loop, sleeping for a short period between iterations. This means that the vast majority of read calls return WouldBlock. This has worked well with unencrypted TcpStreams, however once I implemented an encrypted version with this library I started seeing unreasonable CPU usage even though the connection was idle.

I've been able to modify the simpleclient.rs example to serve as a minimal testcase: https://pastebin.com/S0SPraDN. Executing this code results in consistent CPU usage of ~2% for me.

Upon investigation, it appears the code in deframer.rs#L45 first resizes the deframer buffer to maximum length, attempts a read, and truncates the buffer back to its original size if an error occurred. It seems these operations were the cause of the CPU usage I was seeing, possibly because of the allocation size of >18KB. Unfortunately these operations execute in almost every iteration of my loop, due to the returned WouldBlock from the underlying socket.

I've been able to confirm that these operations are indeed the cause of the CPU usage through a workaround. If you insert the following at the start of the read function the CPU usage drops back to 0%:

let mut empty_buf = [0; 0];
rd.read(&mut empty_buf)?;

The code attempts to read to an empty buffer which means it shouldn't affect the functionality of the success case. In the error case, the function immediately returns without getting to the resizing part, thus avoiding the costly operations.

However I feel like it should be possible to solve this without a workaround like this one. From PR #9 and PR #11 I gather that the current behavior and the large buffer size were introduced to avoid too many read calls to make reading more efficient. Perhaps std::io::BufReader could be used here, which seems to have been made to solve this problem, and appears not to be using resize internally.
However, I have no experience with either TLS nor BufReader internals, so I apologize if this would not be a workable solution.

@ctz
Copy link
Member

ctz commented Apr 21, 2019

Thanks for the report and repro program. I think BufReader would work here, but only once rust-lang/rust/pull/49139 is landed. In the mean time, I'll keep the vec max length and track the valid prefix here.

@ctz
Copy link
Member

ctz commented Apr 22, 2019

before:

read_tls with EWOULDBLOCK                                                                             
                        time:   [1.0132 us 1.0180 us 1.0228 us]

after:

read_tls with EWOULDBLOCK                                                                             
                        time:   [10.609 ns 10.668 ns 10.729 ns]
                        change: [-98.966% -98.955% -98.943%] (p = 0.00 < 0.05)
                        Performance has improved.

@ctz ctz closed this as completed Apr 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants