
crypto: add HMAC to crypto.timingSafeEqual() #38488

Closed
wants to merge 1 commit into from

Conversation

Trott
Member

@Trott Trott commented Apr 30, 2021

Add HMAC to crypto.timingSafeEqual(). This makes things slower but also
makes them far more timing safe.

Refs: #38226 (comment)
Fixes: #38226
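
In rough terms, the change amounts to something like the following sketch, assuming OpenSSL's one-shot HMAC(), RAND_bytes(), and CRYPTO_memcmp(); the function name and key size here are illustrative, not the PR's exact code:

#include <openssl/crypto.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <openssl/rand.h>

// Hypothetical helper: HMAC both inputs with a fresh random key, then compare
// the MACs with CRYPTO_memcmp(). Any data-dependent timing in the comparison
// then leaks information about the MACs, not about the original buffers.
bool TimingSafeEqualSketch(const unsigned char* a,
                           const unsigned char* b,
                           size_t len) {
  unsigned char key[32];
  if (RAND_bytes(key, sizeof(key)) != 1) return false;

  unsigned char mac_a[EVP_MAX_MD_SIZE];
  unsigned char mac_b[EVP_MAX_MD_SIZE];
  unsigned int mac_a_len = 0;
  unsigned int mac_b_len = 0;
  if (HMAC(EVP_sha256(), key, sizeof(key), a, len, mac_a, &mac_a_len) == nullptr ||
      HMAC(EVP_sha256(), key, sizeof(key), b, len, mac_b, &mac_b_len) == nullptr) {
    return false;
  }

  // Both MACs have the same, fixed length for a given digest.
  return mac_a_len == mac_b_len &&
         CRYPTO_memcmp(mac_a, mac_b, mac_a_len) == 0;
}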

@Trott Trott added the wip Issues and PRs that are still a work in progress. label Apr 30, 2021
@github-actions github-actions bot added c++ Issues and PRs that require attention from people who are familiar with C++. crypto Issues and PRs related to the crypto subsystem. needs-ci PRs that need a full CI run. labels Apr 30, 2021
@Trott Trott force-pushed the hmacify branch 2 times, most recently from f2a0d3a to a080a53 Compare April 30, 2021 19:59

@Trott
Member Author

Trott commented Apr 30, 2021

Here's a stress test of this PR: https://ci.nodejs.org/job/node-stress-single-test/294/

And here's one for current master branch: https://ci.nodejs.org/job/node-stress-single-test/295/

@Trott Trott force-pushed the hmacify branch 2 times, most recently from feede9c to b3c3a0c Compare April 30, 2021 20:34
@Trott Trott removed the wip Issues and PRs that are still a work in progress. label Apr 30, 2021
@Trott Trott changed the title wip: add HMAC to crypto.timingSafeEqual() src: add HMAC to crypto.timingSafeEqual() Apr 30, 2021
@Trott Trott changed the title src: add HMAC to crypto.timingSafeEqual() crypto: add HMAC to crypto.timingSafeEqual() Apr 30, 2021
@Trott
Member Author

Trott commented Apr 30, 2021

Stress test results are enough for me to say "This is a good thing." At least, it fixes test flakiness (beyond the tiny bit of flakiness inherent in a probabilistic test).

My C++ was never very good and it's only gotten rustier over the last several years, so apologies in advance.

@nodejs/crypto @not-an-aardvark

@panva
Member

panva commented Apr 30, 2021

This makes things slower

How much slower? Is the existing implementation not timing safe enough? Should this be a semver-major? Or better yet introduced as a flag to the existing method?

Comment on lines 50 to 61
char key[kKeySize];
snprintf(key, sizeof(key), "%04x%04x%04x%04x%04x%04x%04x%04x",
bufKey[0],
bufKey[1],
bufKey[2],
bufKey[3],
bufKey[4],
bufKey[5],
bufKey[6],
bufKey[7]);
Member

Why not just use a static random key here rather than generating one each time? The key can be generated randomly when the crypto binding is loaded and just reused.
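
A minimal sketch of this suggestion, assuming OpenSSL's RAND_bytes() and std::call_once for the one-time initialization (the names are hypothetical, not actual binding code):

#include <array>
#include <mutex>
#include <openssl/rand.h>

// Hypothetical: generate the HMAC key once, the first time it is needed,
// and reuse it for every later timingSafeEqual() call.
static std::array<unsigned char, 32> g_timing_safe_key;
static std::once_flag g_timing_safe_key_once;

static const unsigned char* GetTimingSafeKey() {
  std::call_once(g_timing_safe_key_once, []() {
    RAND_bytes(g_timing_safe_key.data(),
               static_cast<int>(g_timing_safe_key.size()));
  });
  return g_timing_safe_key.data();
}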

Member Author

Why not just

Because I am not particularly fluent at C++ and copied this implementation from the uuid generation in our inspector code. Happy to take informed comments to improve it, but the answer to any "why not just" queries is going to be "Because I'm really bad at this."

Member Author

And I guess I should state, rather than imply: Thanks for the suggestion, and I'll look into doing that later. (Focused on something else at the moment, but not focused enough to ignore GitHub comments, go figure.)

Member

:-) ... ok, I just wanted to make sure there wasn't a specific technical reason. By avoiding generating the random key on every call we can save at least some of the additional performance cost introduced here.

Contributor

I think the main advantage of generating a random key every time is that it would become much harder for someone to get the function to return true as a result of an HMAC-SHA-256 collision, in the event that SHA-256 becomes less collision-resistant in the future.

As a secondary, smaller benefit, it also prevents an attacker from knowing either of the inputs to CRYPTO_memcmp. More explicitly, if the computation is CRYPTO_memcmp(sha256hmac(someKey, a), sha256hmac(someKey, b)) where someKey is a static constant, then an attacker could control at least the first few bytes of sha256hmac(someKey, a) by changing a via brute force. Given that this is intended as a defense-in-depth measure in the scenario that CRYPTO_memcmp is insufficient to address side-channels, in this scenario the attacker could use a timing side-channel to discover the first few bytes of sha256hmac(someKey, b). It's not clear how this would be useful in most cases, but it would give a nonzero amount of information about b.

Comment on lines +61 to +64
std::array<unsigned char, EVP_MAX_MD_SIZE> hash1;
std::array<unsigned char, EVP_MAX_MD_SIZE> hash2;
Member

why not just unsigned char hash1[EVP_MAX_MD_SIZE] here?

Member Author

Sure, I don't know the difference, so that works for me. 😬


Afaik, a C-style array lacks the higher-level functionality of std::array; in particular, it does not track its own size, so you need to manage size information manually. So for this particular case, I think it's better to keep std::array, since it keeps track of its own size.
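
For illustration only (not code from the PR), the difference being discussed is roughly:

#include <array>
#include <openssl/evp.h>

std::array<unsigned char, EVP_MAX_MD_SIZE> hash1;  // hash1.size() is always available
unsigned char hash2[EVP_MAX_MD_SIZE];              // size is sizeof(hash2) only until the
                                                   // array decays to a pointer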

@tniessen
Member

If this really isn't timing-safe, isn't that a bug in CRYPTO_memcmp?

@Trott
Member Author

Trott commented Apr 30, 2021

This makes things slower

How much slower?

We don't have any formal benchmarks so I'd have to write one to say with certainty.

Using test/pummel/test-crypto-timing-safe-equal-benchmarks.js as a proxy for a benchmark:

  • On LinuxONE CI, master takes about 2.5 seconds per run and this PR takes about 3.8 seconds per run.
  • On my 8-year-old laptop, master takes about 5 or 6 seconds and this PR takes about 14 or 15 seconds per run.

Is the existing implementation not timing safe enough?

Short answer is "Yes, not timing safe enough."

Longer answer: Unless there's a bug in test/pummel/test-crypto-timing-safe-equal-benchmarks.js, the answer would be "The current implementation fails a lot on some specific hosts in CI, so yes, not timing safe enough." You can see in https://ci.nodejs.org/job/node-stress-single-test/294/nodes=rhel7-s390x/console that with this implementation, the test passed 1000 times out of 1000 runs on LinuxONE (which is very fast in CI; it might be our fastest CI host). In comparison, the master branch (which you can see at https://ci.nodejs.org/job/node-stress-single-test/295/nodes=rhel7-s390x/console) is still running as of this writing, but it has failed more than 50% of the 700+ runs so far.

Should this be a semver-major? Or better yet introduced as a flag to the existing method?

Good questions. I'm interested in what others think.

@Trott
Member Author

Trott commented Apr 30, 2021

If this really isn't timing-safe, isn't that a bug in CRYPTO_memcmp?

I assumed we're using the OpenSSL memcmp under the hood and didn't even consider that there might be a bug in memcmp, but now that you mention it, I suppose this all warrants more investigation.

@jasnell
Member

jasnell commented Apr 30, 2021

Unless there's a bug in test/pummel/test-crypto-timing-safe-equal-benchmarks.js

Bug.... no.... but there could be something at play here with the use of process.hrtime(). The LinuxOne machine is very fast, and I do wonder if there's some precision that's being lost here in the timing logic in the benchmark.

@Trott
Member Author

Trott commented Apr 30, 2021

The LinuxOne machine is very fast, and I do wonder if there's some precision that's being lost here in the timing logic in the benchmark.

I could be wrong, but I would think lost precision should make the test more robust.

@Trott
Member Author

Trott commented Apr 30, 2021

If this really isn't timing-safe, isn't that a bug in CRYPTO_memcmp?

There are also several lines of code inside TimingSafe() before it gets to the memcmp(), so maybe that's where the problem lies? Maybe there are optimizations that occur when the two buffers have the same contents that result in a timing difference that is still measurable, especially since the test is testing with one-character buffers?

@Trott
Member Author

Trott commented May 1, 2021

Here's a stress test on a branch where I removed memcmp() and always return true instead. (I had to alter a comparison in the test too.)

Stress test: https://ci.nodejs.org/job/node-stress-single-test/296/nodes=rhel7-s390x/
Branch diff vs. master: https://github.com/nodejs/node/compare/master...Trott:no-memcmp?expand=1

If this fails a lot, then the timing issue doesn't involve memcmp().

If it succeeds 100%, then that strongly suggests that the issue is with memcmp(). 😱

@Trott
Member Author

Trott commented May 1, 2021

Here's a stress test on a branch where I removed memcmp() and always return true instead. (I had to alter a comparison in the test too.)

Stress test: https://ci.nodejs.org/job/node-stress-single-test/296/nodes=rhel7-s390x/
Branch diff vs. master: https://github.com/nodejs/node/compare/master...Trott:no-memcmp?expand=1

If this fails a lot, then the timing issue doesn't involve memcmp().

If it succeeds 100%, then that strongly suggests that the issue is with memcmp(). 😱

So far, no failures in 50 runs. So that strongly suggests (to me at least) that the problem is indeed memcmp(). Do you agree, @tniessen? What do we do next?

@Trott
Member Author

Trott commented May 1, 2021

So far, no failures in 50 runs. So that strongly suggests (to me at least) that the problem is indeed memcmp(). Do you agree, @tniessen? What do we do next?

Looks like we're not even using a wrapper around OpenSSL's CRYPTO_memcmp(). We are calling it directly.

@Trott
Member Author

Trott commented May 1, 2021

So far, no failures in 50 runs. So that strongly suggests (to me at least) that the problem is indeed memcmp(). Do you agree, @tniessen? What do we do next?

Looks like we're not even using a wrapper around OpenSSL's CRYPTO_memcmp(). We are calling it directly.

Maybe the way I modified the C++ function allowed the optimizer to remove some of the other code. So maybe the conclusion I draw here isn't correct. I should have thought of that before...

@not-an-aardvark
Contributor

Re. "Is the existing implementation not timing safe enough?", I think:

  • There are a lot of layers of abstraction in the benchmark test, including crypto.timingSafeEqual. One of them is causing the test to fail.

  • At this point, my best guess is that the culprit is branch prediction within the benchmark test itself. V8 compiles the benchmark in some way involving lots of fancy inlining and complicated runtime heuristics. Maybe this results in a faster compilation of the "run a benchmark with hardcoded equal inputs" code than the "run a benchmark with hardcoded unequal inputs" code, despite the use of things like eval and randomized ordering to throw off the optimizer. If this is the case, then the test is just flaky and crypto.timingSafeEqual is fine.

    (As for why the breakage just started recently: I doubt it's substantively related to Trott@845adb7, but maybe that pushed something like memory usage over a particular threshold that caused things to be compiled/arranged differently.)

  • Another possibility, which is plausible albeit maybe far-fetched, is that crypto.timingSafeEqual or the benchmark code is quicker to return false than to return true based on some artifact of code generation, but takes the same amount of time to return false for unequal buffers regardless of how "similar" they are. In this case, there would technically be a bug but it would be unlikely to be exploitable as a security issue.

  • At this point we can't completely rule out the possibility that there's a problem with CRYPTO_memcmp. Since there's some uncertainty, I think it would be reasonable to add an HMAC as a defense-in-depth mechanism.

If we want to investigate further, there are a few routes we could try:

  • Update the benchmark to compare random buffers that are read from I/O somehow, rather than generating effectively-constant buffers in JS. This would more accurately simulate real-world conditions, in which a buffer might come in over the network, and would probably rule out the possibility of V8 shenanigans affecting the benchmark. (More specifically, a harness could create two files that either (a) both contain the same random buffer, or (b) both have different random buffers, and then run a Node program that uses crypto.timingSafeEqual to check whether the buffers are equal, and compare the runtime between these cases.) This test would take a lot longer to run, but seems like it would be fairly conclusive about whether the problem is in our benchmark or not.
    • A potentially-easier version of this would be to implement (part of) the test in native code, and have it call crypto.timingSafeEqual via N-API (or just call the native binding used by crypto.timingSafeEqual); a rough sketch of such a harness follows this list.
    • If that test identifies a timing issue, it would also be interesting to compare the timing between cases where (a) both files contain different random buffers, and (b) both files contain the same random buffer except that the last byte is different.
  • It could be worth examining the assembly generated by CRYPTO_memcmp in the compiled node binary for this platform, although this wouldn't rule out bugs caused by hardware-level optimizations.
  • In theory it would be possible to examine the code that V8 generates somehow, but I'm not aware of how to do this.
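
For what it's worth, a very rough sketch of the native-code variant mentioned above might look like the following. This is an assumption about the shape of such a harness, not code from the PR; it calls OpenSSL's CRYPTO_memcmp() directly instead of going through crypto.timingSafeEqual, and it would be run once with two files holding identical random buffers and once with two different random buffers, comparing the wall-clock times of the two invocations:

#include <openssl/crypto.h>
#include <chrono>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

static std::vector<unsigned char> ReadFile(const char* path) {
  std::ifstream f(path, std::ios::binary);
  return std::vector<unsigned char>(std::istreambuf_iterator<char>(f), {});
}

int main(int argc, char** argv) {
  if (argc != 3) return 1;
  std::vector<unsigned char> a = ReadFile(argv[1]);
  std::vector<unsigned char> b = ReadFile(argv[2]);
  if (a.empty() || a.size() != b.size()) return 1;

  // Time many comparisons; an outer script compares the elapsed time of the
  // "equal files" run against the "unequal files" run.
  const int kIterations = 1000000;
  int sink = 0;
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < kIterations; i++)
    sink |= CRYPTO_memcmp(a.data(), b.data(), a.size());
  auto end = std::chrono::steady_clock::now();
  std::chrono::duration<double> elapsed = end - start;
  std::printf("result=%d elapsed=%f s\n", sink, elapsed.count());
  return 0;
}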

@Trott
Member Author

Trott commented May 1, 2021

  • Update the benchmark to compare random buffers that are read from I/O somehow, rather than generating effectively-constant buffers in JS.

Perhaps using crypto.randomUUID() would be sufficient? EDIT: Er, ok, not sufficient, but maybe an incremental improvement....

Trott added a commit to Trott/io.js that referenced this pull request May 1, 2021
Avoid possible V8 optimizations that may invalidate the benchmark. This is
done by writing random UUIDs to files and reading the files again as
needed, rather than having hardcoded strings for the buffer contents.

Refs: nodejs#38488 (comment)
@Trott
Member Author

Trott commented May 1, 2021

create two files that either (a) both contain the same random buffer, or (b) both have different random buffers, and then run a Node program that uses crypto.timingSafeEqual to check whether the buffers are equal, and compare the runtime between these cases.)

I tried to do a lightweight version of this approach in #38493
