
Usage of the PRF (in the draft) and hash (in the original paper) diverges in non-trivial ways #26

Open
cipriancraciun opened this issue Nov 16, 2024 · 3 comments


@cipriancraciun

If one looks at the original paper, and how the hash function (PRF in our case) is used:

  • buf[0] = hash(cnt++, passwd, salt)
  • buf[m] = hash(cnt++, buf[m-1])
  • buf[m] = hash(cnt++, prev, buf[m])
  • int other = to_int(hash(cnt++, salt, idx_block)) mod s_cost (where block_t idx_block = ints_to_block(t, m, i))
  • buf[m] = hash(cnt++, buf[m], buf[other])

With the exception of the first usage (and perhaps partially the second), one could conclude that the signature of hash is roughly block_t hash(counter: usize, block_1: block_t, block_2: block_t), i.e. it always takes a counter and two blocks, and compresses them into a single block.

In fact, the paper states:

Since H maps blocks of 2k bits down to blocks of k bits, we sometimes refer to H as a cryptographic compression function.
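The paper's fixed-arity usage can be sketched as follows; SHA-256 is purely an assumption here (the paper leaves H abstract), and the counter/block encodings are illustrative:

```python
import hashlib
import struct

BLOCK = 32  # one "block" = one hash output; H maps 2k bits down to k bits

def h(counter: int, block_1: bytes, block_2: bytes = b"") -> bytes:
    # Fixed-arity compression as in the paper: a counter plus (up to) two
    # blocks in, one block out. SHA-256 is a stand-in for the abstract H.
    return hashlib.sha256(struct.pack("<Q", counter) + block_1 + block_2).digest()

cnt = 0
buf_0 = h(cnt, b"passwd", b"salt"); cnt += 1  # buf[0] = hash(cnt++, passwd, salt)
buf_1 = h(cnt, buf_0); cnt += 1               # buf[m] = hash(cnt++, buf[m-1])
buf_1 = h(cnt, buf_0, buf_1); cnt += 1        # buf[m] = hash(cnt++, prev, buf[m])
```

Every call after the first consumes a counter plus one or two fixed-size blocks, which is what makes the "compression function" framing apt.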

(And personally, I find the simplicity of the initial Balloon algorithm, and of its building blocks, namely a single hash function, quite appealing.)


However, the draft has the following usages of PRF:

  • key = PRF(key, password || salt || personalization || associatedData || LE32(pepper.Length) || LE32(password.Length) || LE32(salt.Length) || LE32(personalization.Length) || LE32(associatedData.Length))
  • previous = PRF(key, previous || UTF8("bkdf") || LE32(counter++))
  • pseudorandom = pseudorandom || PRF(emptyKey, LE32(VERSION) || personalization || LE32(spaceCost) || LE32(timeCost) || LE32(parallelism) || LE32(iteration) || LE64(counter++))
  • buffer[0] = PRF(key, LE32(VERSION) || LE32(spaceCost) || LE32(timeCost) || LE32(parallelism) || LE32(iteration) || LE64(counter++))
  • buffer[m] = PRF(key, buffer[m - 1] || LE64(counter++))
  • buffer[m] = PRF(key, previous || buffer[m] || buffer[other1] || buffer[other2] || buffer[other3] || LE64(counter++))

Namely:

  • it introduces the concept of key, which doesn't exist in the initial algorithm;
  • sometimes there is a counter, sometimes there isn't one;
  • sometimes the VERSION is included, sometimes not;
  • sometimes the order of included parameters is one way, sometimes it is not;
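For concreteness, two of the draft's calls above can be sketched like this, hypothetically instantiating the PRF as HMAC-SHA-256 (the key and previous-block values are placeholders):

```python
import hmac, hashlib, struct

LE32 = lambda n: struct.pack("<I", n)
LE64 = lambda n: struct.pack("<Q", n)

def prf(key: bytes, data: bytes) -> bytes:
    # The draft's PRF instantiated, as an assumption, with HMAC-SHA-256.
    return hmac.new(key, data, hashlib.sha256).digest()

key = b"\x11" * 32       # placeholder for the derived key
previous = b"\x00" * 32  # placeholder for the previous block
counter = 0

# previous = PRF(key, previous || UTF8("bkdf") || LE32(counter++))
previous = prf(key, previous + "bkdf".encode("utf-8") + LE32(counter)); counter += 1

# buffer[m] = PRF(key, buffer[m - 1] || LE64(counter++))
block = prf(key, previous + LE64(counter)); counter += 1
```

Note how the two calls already differ in both the width of the counter encoding and the presence of the "bkdf" label.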

Doesn't it seem that the draft drifts quite a bit from the original paper?

I'm not saying it's wrong to have a different take, but I do see a few problems:

  • the major one being that the paper gives proofs for what was specified in the paper, while the draft makes changes that I'm not sure are equivalent;
  • the way PRF is used (with a lot of inline canonicalization) is error-prone with respect to these types of problems;
  • the way the inputs of the PRF are computed might have performance impacts compared to the initial version (more in a later paragraph);

Many hash functions, especially "modern" ones (I don't know about BLAKE, but for example XXH3, although it is not a cryptographic hash function), have optimized assembly implementations when they work on fixed-size blocks.

Thus, from this point of view, the original paper might yield a performance boost (because it uses fixed-size inputs) that the current draft's PRF usage, which has to concatenate a lot of data, does not.

@samuel-lucas6
Owner

(And personally, I find the simplicity of the initial Balloon algorithm, and of its building blocks, namely a single hash function, quite appealing.)

There are some benefits to using an unkeyed algorithm. The reason this was changed was purely to allow HMAC for NIST compliance since that's closer to PBKDF2. By supporting a keyed algorithm, you can also support unkeyed algorithms, whereas this isn't the case the other way around. The trouble is trying to support all these different algorithms is messy because of the design differences.

sometimes there is a counter, sometimes there isn't one;

There's a counter in most places. There's no need for a counter when deriving the PRF key because it's a single output and basically HKDF-Extract. You could get rid of the counter for the final key derivation as well, but I think it's better to keep one, aligning with HKDF-Expand.

sometimes the VERSION is included, sometimes not;

Where else do you think this should be included? When deriving the PRF key or for the output?

sometimes the order of included parameters is one way, sometimes it is not;

I'm not sure what you mean by this.

the major one being that the paper gives some proofs for what was specified in the paper, meanwhile the draft makes some changes that I'm not sure they are equivalent;

The fundamental approach to memory hardness should be the same. If that's the case, you can think of the rest more as cosmetic changes that have some pros and cons.

the way PRF is used (with a lot of inline canonicalization) is error-prone with respect to these types of problems;

What type of problem? This type of canonicalization is common with KDFs/MACs. If something is fixed length, you don't need to encode the length. Encoding the length of fixed length parameters is kind of like encoding the length of the length encodings.
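The rationale for length encodings on variable-length inputs can be demonstrated concretely; HMAC-SHA-256 as the PRF is an assumption, and canon is a hypothetical helper:

```python
import hmac, hashlib, struct

LE32 = lambda n: struct.pack("<I", n)

def prf(key: bytes, data: bytes) -> bytes:
    # Hypothetical PRF instantiation (HMAC-SHA-256), for illustration only.
    return hmac.new(key, data, hashlib.sha256).digest()

key = b"\x00" * 32

# Without length encodings, distinct variable-length input pairs can collide,
# because bare concatenation is ambiguous:
assert prf(key, b"ab" + b"c") == prf(key, b"a" + b"bc")

# With trailing length encodings (the draft's style), the ambiguity is gone:
def canon(password: bytes, salt: bytes) -> bytes:
    return prf(key, password + salt + LE32(len(password)) + LE32(len(salt)))

assert canon(b"ab", b"c") != canon(b"a", b"bc")
```

Fixed-length fields, by contrast, can never shift a field boundary, which is why encoding their lengths adds nothing.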

Thus, from this point of view, the original paper might yield a performance boost (because it uses fixed-size inputs) that the current draft's PRF usage, which has to concatenate a lot of data, does not.

Some of the concatenation is a one off cost or can be cached. You can generally figure out how many blocks there are going to be rather than it being a variable number. The personalization is the main thing that's interfered with that, but it can be cached. I have debated whether to prehash everything but the counter and use that as the PRF key for the pseudorandom bytes derivation. I'm not convinced that's worth it though.

I would think more calls to the hash functions is slower, but I haven't implemented this version of the draft to do benchmarks.
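The caching point can be sketched with Python's stdlib hmac, whose state objects support copy(); the key and static prefix are placeholder values:

```python
import hmac, hashlib, struct

key = b"\x22" * 32
static_prefix = b"version-and-parameters"  # hypothetical cached concatenation

# Absorb the static part of the input once...
base = hmac.new(key, static_prefix, hashlib.sha256)

def prf_with_counter(counter: int) -> bytes:
    # ...then clone the state per call and feed only the varying counter.
    state = base.copy()
    state.update(struct.pack("<Q", counter))
    return state.digest()

# Identical to hashing the full concatenation from scratch each time:
full = hmac.new(key, static_prefix + struct.pack("<Q", 7), hashlib.sha256).digest()
assert prf_with_counter(7) == full
```

So the repeated concatenation cost can largely be amortized even without changing the draft's input layout.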

@cipriancraciun
Author

Citing all the places where PRF is used:

key = PRF(key, password || salt || personalization || associatedData || LE32(pepper.Length) || LE32(password.Length) || LE32(salt.Length) || LE32(personalization.Length) || LE32(associatedData.Length))
## uses "parameters" (i.e. salt, personalization, associated data, etc.)
## misses the VERSION, timeCost, parallelism, etc.

pseudorandom = pseudorandom || PRF(emptyKey, LE32(VERSION) || personalization || LE32(spaceCost) || LE32(timeCost) || LE32(parallelism) || LE32(iteration) || LE64(counter++))
## uses VERSION
## misses cost, parallelism, etc.

buffer[0] = PRF(key, LE32(VERSION) || LE32(spaceCost) || LE32(timeCost) || LE32(parallelism) || LE32(iteration) || LE64(counter++))
## uses VERSION, uses cost
## misses personalization

sometimes the VERSION is included, sometimes not;

sometimes the order of included parameters is one way, sometimes it is not;

Personally, I would take all possible context that is either public (personalization, version, etc.) or mostly constant (costs, parallelism, etc.) and mix it into a single value to be used wherever "parameters" (or part of them) might be needed.

For example, rewriting the above:

context = PRF(emptyKey,
    salt || LE32(salt.Length)      ## let's assume it's public
 || pepper || LE32(pepper.Length)  ## let's also assume it's public
 || associatedData || LE32(associatedData.Length)
 || personalization || LE32(personalization.Length)
 || LE32(VERSION)
 || LE32(parallelism)
 || LE32(timeCost)
 || LE32(spaceCost)
)
 

key = PRF(key, context || password)

pseudorandom = pseudorandom
         || PRF(emptyKey, context || LE32(iteration) || LE64(counter++))
buffer[0] = PRF(key     , context || LE32(iteration) || LE64(counter++))

See how "nice" the PRF usage looks?

  • there is some "key";
  • the first input is some "context";
  • then follows the data;
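The proposal above can be sketched end to end; HMAC-SHA-256 as the PRF and all parameter values are assumptions for illustration:

```python
import hmac, hashlib, struct

LE32 = lambda n: struct.pack("<I", n)
LE64 = lambda n: struct.pack("<Q", n)

def prf(key: bytes, data: bytes) -> bytes:
    # Hypothetical PRF instantiation (HMAC-SHA-256).
    return hmac.new(key, data, hashlib.sha256).digest()

# Hypothetical parameter values, purely for illustration:
VERSION, parallelism, timeCost, spaceCost = 1, 1, 3, 1024
salt, pepper, associatedData, personalization = b"salt", b"", b"", b""
emptyKey = b"\x00" * 32

# Mix all public / mostly-constant context into a single value, once:
context = prf(emptyKey,
              salt + LE32(len(salt))
              + pepper + LE32(len(pepper))
              + associatedData + LE32(len(associatedData))
              + personalization + LE32(len(personalization))
              + LE32(VERSION) + LE32(parallelism)
              + LE32(timeCost) + LE32(spaceCost))

# Every subsequent call is then key || context || data:
key = prf(emptyKey, context + b"password")
counter, iteration = 0, 0
pseudorandom = prf(emptyKey, context + LE32(iteration) + LE64(counter)); counter += 1
buffer_0 = prf(key, context + LE32(iteration) + LE64(counter)); counter += 1
```

The context is computed exactly once per derivation, so the per-call inputs become uniform in both shape and size.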

the way PRF is used (with a lot of inline canonicalization) is error-prone with respect to these types of problems;

What type of problem? This type of canonicalization is common with KDFs/MACs. If something is fixed length, you don't need to encode the length. Encoding the length of fixed length parameters is kind of like encoding the length of the length encodings.

Indeed canonicalization is common; however, the usual flavour of canonicalization puts each length before the data it delimits (i.e. length-prefixed, TLV-style fields).

Meanwhile the current draft puts the lengths at the end, thus it's error-prone to implement correctly. (Perhaps it doesn't break the cryptography, but it does offer enough opportunities to mix things up and have broken outputs that don't match the test vectors.)


I would think more calls to the hash functions is slower, but I haven't implemented this version of the draft to do benchmarks.

Looking in the Blake3 paper there is the following graph:
[Figure from the BLAKE3 paper: hashing throughput versus input length]

As you can see, for small inputs (the graph shows full hash computations, but perhaps it also extends to discrete individual update(some_data) calls), there is a small cost at 64 bytes, the cost goes up between 64 and 128 bytes, drops at 128, rises again up to 256 bytes, and then plateaus between 256 bytes and 2 KiB (perhaps some CPU pipelining or branch prediction kicks in).

Thus, assuming one uses BLAKE3 or another algorithm with similar behaviour (strangely enough, SHA-2 follows a similar pattern), if the PRF is used as I've proposed above (plus the other two usages), i.e.:

PRF(key, context || LE32(iteration) || LE64(counter++))
## let's assume we pad iteration and counter to fit in 16 bytes each
=> PRF(key, 32 bytes || 32 bytes)
=> PRF(key, 64 bytes)

buffer[m] = PRF(key, buffer[m - 1] || LE64(counter++))
## let's assume we pad counter to 16 bytes, and we imagine we have an iteration of 0
=> PRF(key, 32 bytes || 32 bytes)
=> PRF(key, 64 bytes)

buffer[m] = PRF(key, previous || buffer[m] || buffer[other1] || buffer[other2] || buffer[other3] || LE64(counter++))
## let's also pad counter to 16 bytes, plus iteration of 0
=> PRF(key, 32 bytes × 6)
=> PRF(key, 192 bytes)

We get almost the optimal behaviour for Blake3.
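The size arithmetic above, under the stated assumptions (32-byte PRF outputs, counter and iteration each padded to 16 bytes), checks out:

```python
# Input sizes under the proposal, assuming 32-byte PRF outputs and
# counter/iteration encodings each padded to 16 bytes (an assumption):
BLOCK, PAD = 32, 16

# PRF(key, context || iteration || counter)
assert BLOCK + PAD + PAD == 64

# PRF(key, buffer[m - 1] || iteration || counter), same padding
assert BLOCK + PAD + PAD == 64

# PRF(key, previous || buffer[m] || other1 || other2 || other3 || iteration || counter)
assert 5 * BLOCK + PAD + PAD == 192
```

Both 64 and 192 bytes land on the favourable region of the throughput curve.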

@samuel-lucas6
Owner

misses the VERSION, timeCost, parallelism, etc.

The parallelism loop iteration can't be included in the key derivation since the key is static.

misses cost, parallelism, etc.

Those are included when computing the pseudorandom bytes.

misses personalization

This is unnecessary if it's in the key.

Personally, I would take all possible context that is either public (personalization, version, etc.) or mostly constant (costs, parallelism, etc.) and mix it into a single value to be used wherever "parameters" (or part of them) might be needed.

It's unfortunately not that simple because you don't want things like the salt/pepper/associated data in the pseudorandom bytes derivation. There's also no need to process those multiple times.

You could do it for the VERSION, personalization, parallelism, timeCost, spaceCost, and parallelism iteration, but it's pretty ugly because you need to use an empty key. You also end up potentially hashing more data because a hash is larger than those encoded parameters.

But what you're talking about is what I was thinking about with prehashing the personalization so the pseudorandom bytes input fits into a block (that isn't guaranteed if the personalization length is variable).

Meanwhile the current draft puts the lengths at the end, thus it's error-prone to implement correctly. (Perhaps it doesn't break the cryptography, but it does offer enough opportunities to mix things up and have broken outputs that don't match the test vectors.)

It's also normal to put the lengths at the end. For example, see the ChaCha20-Poly1305 RFC. It allows you to process inputs without knowing their length in advance, which admittedly isn't relevant here. If this is done incorrectly, the test vectors shouldn't pass.
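The ChaCha20-Poly1305 precedent is easy to show: RFC 8439 places both length encodings after all the data being authenticated.

```python
import struct

def poly1305_message(aad: bytes, ciphertext: bytes) -> bytes:
    # RFC 8439 authenticates:
    #   aad || pad16(aad) || ct || pad16(ct) || LE64(len(aad)) || LE64(len(ct))
    # i.e. both lengths come at the very end, after the data.
    pad16 = lambda b: b"\x00" * (-len(b) % 16)
    return (aad + pad16(aad) + ciphertext + pad16(ciphertext)
            + struct.pack("<Q", len(aad)) + struct.pack("<Q", len(ciphertext)))

msg = poly1305_message(b"header", b"\xaa" * 20)
assert msg[-16:] == struct.pack("<Q", 6) + struct.pack("<Q", 20)
```

This only sketches the Poly1305 input layout, not the full AEAD construction.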

Looking in the Blake3 paper there is the following graph

Yes, I've seen that graph. The question is how do repeated small calls do vs one slightly longer call.

I definitely don't want to include padding because it depends on the algorithm and that gets messy.
