Usage of the PRF (in the draft) and hash (in the original paper) diverges in non-trivial ways #26
There are some benefits to using an unkeyed algorithm. The reason this was changed was purely to allow HMAC for NIST compliance, since that's closer to PBKDF2. By supporting a keyed algorithm, you can also support unkeyed algorithms, whereas this isn't the case the other way around. The trouble is that trying to support all these different algorithms is messy because of their design differences.
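As a rough illustration (Python with the standard library; the function names are mine, not the draft's), a keyed PRF interface subsumes the unkeyed case by pinning the key:

```python
import hashlib
import hmac

def prf_hmac_sha256(key: bytes, message: bytes) -> bytes:
    # Keyed PRF via HMAC-SHA-256 (the NIST-friendly route mentioned above).
    return hmac.new(key, message, hashlib.sha256).digest()

def prf_blake2b(key: bytes, message: bytes) -> bytes:
    # Keyed PRF via BLAKE2b's native keyed mode; an empty key degrades
    # gracefully to the plain, unkeyed hash.
    return hashlib.blake2b(message, key=key).digest()

# A keyed interface subsumes the unkeyed one: just pin the key.
unkeyed_digest = prf_blake2b(b"", b"input")
keyed_digest = prf_blake2b(b"\x00" * 32, b"input")
```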
There's a counter in most places. There's no need for a counter when deriving the PRF key because it's a single output and basically HKDF-Extract. You could get rid of the counter for the final key derivation as well, but I think it's better to keep one, aligning with HKDF-Expand.
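For reference, a minimal HKDF sketch (RFC 5869 semantics, assuming HMAC-SHA-256) showing why Extract needs no counter while Expand does:

```python
import hashlib
import hmac

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # One fixed-size output, so no counter is needed -- analogous to
    # deriving the PRF key from the inputs.
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    # Variable-length output: each block is numbered with a counter,
    # analogous to keeping one counter in the final key derivation.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]
```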
Where else do you think this should be included? When deriving the PRF key or for the output?
I'm not sure what you mean by this.
The fundamental approach to memory hardness should be the same. If that's the case, you can think of the rest more as cosmetic changes that have some pros and cons.
What type of problem? This type of canonicalization is common with KDFs/MACs. If something is fixed length, you don't need to encode the length. Encoding the length of fixed length parameters is kind of like encoding the length of the length encodings.
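A hypothetical encoder to make that concrete (the field list is illustrative, not the draft's exact set):

```python
import struct

def le32(n: int) -> bytes:
    return struct.pack("<I", n)

def canonicalize(password: bytes, salt: bytes, space_cost: int, time_cost: int) -> bytes:
    # Variable-length fields need their lengths encoded so that, e.g.,
    # (b"ab", b"c") and (b"a", b"bc") cannot produce the same input.
    # The LE32 cost parameters are fixed length, so encoding their
    # lengths would add nothing: the value would always be 4.
    return (password + salt
            + le32(len(password)) + le32(len(salt))
            + le32(space_cost) + le32(time_cost))
```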
Some of the concatenation is a one-off cost or can be cached. You can generally figure out how many blocks there are going to be rather than it being a variable number. I would think more calls to the hash function is slower, but I haven't implemented this version of the draft to do benchmarks.
Citing all the places where the PRF is used:
Personally, I would take all possible context that is either public (personalization, version, etc.) or mostly constant (costs, parallelism, etc.) and mix it into a single value to be used wherever "parameters" (or part of the parameters) might be needed. For example, rewriting the above:
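Something along these lines (a sketch of the idea; `mix_params`, the hash choice, and the argument list are illustrative):

```python
import hashlib
import struct

def le32(n: int) -> bytes:
    return struct.pack("<I", n)

def mix_params(version: int, personalization: bytes, space_cost: int,
               time_cost: int, parallelism: int) -> bytes:
    # Hash all the public / mostly-constant context once into a single
    # digest, then reuse that digest wherever "parameters" are needed.
    return hashlib.blake2b(
        le32(version)
        + le32(len(personalization)) + personalization
        + le32(space_cost) + le32(time_cost) + le32(parallelism)
    ).digest()

# The per-call inputs then shrink to something like:
#   buffer[0] = PRF(key, params || LE64(counter++))
params = mix_params(1, b"example.com 2024", 1024, 3, 4)
```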
See how "nice" the PRF usage looks?
Indeed, canonicalization is common; however, the usual flavour of canonicalization is either prefixing each variable-length field with its length (so each length sits right next to the data it describes) or padding fields to a fixed size so no lengths are needed.
Meanwhile, the current draft puts the lengths at the end, which makes it error-prone to implement correctly. (Perhaps it doesn't break the cryptography, but it does offer enough opportunities to mix things up and produce broken outputs that don't match the test vectors.)
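To illustrate the two layouts (hypothetical encoders, not the draft's exact wire format):

```python
import struct

def le32(n: int) -> bytes:
    return struct.pack("<I", n)

def encode_length_prefixed(fields: list[bytes]) -> bytes:
    # The usual flavour: each variable-length field carries its own
    # length immediately in front of it.
    return b"".join(le32(len(f)) + f for f in fields)

def encode_length_suffixed(fields: list[bytes]) -> bytes:
    # The draft's flavour: all the data first, all the lengths trailing.
    # Equally unambiguous, but the lengths are assembled far from the
    # fields they describe, which is where implementations tend to slip.
    return b"".join(fields) + b"".join(le32(len(f)) for f in fields)
```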
Looking in the BLAKE3 paper, there is the following graph: [Figure: BLAKE3 paper, hashing throughput versus input length.] As you can see, small inputs behave differently from large ones (the graph shows the full hash computation, but perhaps the pattern also extends to discrete individual compressions). Thus, assuming one uses BLAKE3 or another algorithm that has a similar behaviour (strangely enough, SHA-2 also follows a similar pattern), if the PRF is used as I've proposed above (plus the other two usages), we get almost the optimal behaviour for BLAKE3.
The parallelism loop iteration can't be included in the key derivation since the key is static.
Those are included when computing the pseudorandom bytes.
This is unnecessary if it's in the key.
It's unfortunately not that simple because you don't want things like the salt/pepper/associated data in the pseudorandom-bytes derivation. There's also no need to process those multiple times. You could do it for the VERSION, personalization, parallelism, timeCost, spaceCost, and the parallelism iteration, but it's pretty ugly because you need to use an empty key. You also end up potentially hashing more data because a hash output is larger than those encoded parameters. But what you're describing is close to what I was considering with prehashing the personalization so the pseudorandom-bytes input fits into a block (which isn't guaranteed if the personalization length is variable).
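A minimal sketch of that prehashing idea (assuming BLAKE2b and a 32-byte digest; names are illustrative):

```python
import hashlib

def prehash_personalization(personalization: bytes) -> bytes:
    # Collapse the variable-length personalization into one fixed-size
    # digest so the pseudorandom-bytes input always has the same,
    # predictable size regardless of what the caller passed in.
    return hashlib.blake2b(personalization, digest_size=32).digest()
```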
It's also normal to put the lengths at the end. For example, see the ChaCha20-Poly1305 RFC. It allows you to process inputs without knowing their length in advance, which admittedly isn't relevant here. If this is done incorrectly, the test vectors shouldn't pass.
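For reference, a simplified sketch of that RFC 8439 layout (section 2.8), with both lengths trailing:

```python
import struct

def pad16(data: bytes) -> bytes:
    return b"\x00" * (-len(data) % 16)

def poly1305_aead_input(aad: bytes, ciphertext: bytes) -> bytes:
    # RFC 8439, section 2.8: the MAC input is data first, with both
    # lengths encoded as LE64 at the very end.
    return (aad + pad16(aad)
            + ciphertext + pad16(ciphertext)
            + struct.pack("<Q", len(aad))
            + struct.pack("<Q", len(ciphertext)))
```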
Yes, I've seen that graph. The question is how repeated small calls perform versus one slightly longer call. I definitely don't want to include padding because it depends on the algorithm, and that gets messy.
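One way to check empirically (a rough micro-benchmark sketch; results will vary by machine, algorithm, and implementation):

```python
import hashlib
import timeit

block = b"\x00" * 64

def many_small_calls() -> bytes:
    # Five separate 64-byte hashes, like repeated small PRF invocations.
    return b"".join(hashlib.blake2b(block).digest() for _ in range(5))

def one_longer_call() -> bytes:
    # One 320-byte hash, like a single fixed concatenated input.
    return hashlib.blake2b(block * 5).digest()

print("many small:", timeit.timeit(many_small_calls, number=100_000))
print("one longer:", timeit.timeit(one_longer_call, number=100_000))
```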
If one looks at the original paper and how the `hash` function (`PRF` in our case) is used:

```
buf[0] = hash(cnt++, passwd, salt)
buf[m] = hash(cnt++, buf[m-1])
buf[m] = hash(cnt++, prev, buf[m])
int other = to_int(hash(cnt++, salt, idx_block)) mod s_cost
    // where block_t idx_block = ints_to_block(t, m, i)
buf[m] = hash(cnt++, buf[m], buf[other])
```
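In runnable form, those calls share roughly one shape (a sketch; SHA-256 and the 8-byte little-endian counter are assumptions for illustration):

```python
import hashlib
import struct

def balloon_hash(counter: int, block_1: bytes, block_2: bytes = b"") -> bytes:
    # One counter plus (up to) two blocks in, exactly one block out,
    # where the block size equals the digest size.
    return hashlib.sha256(
        struct.pack("<Q", counter) + block_1 + block_2
    ).digest()
```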
One could conclude (with the exception of the first usage, and perhaps partially the second one) that the signature of `hash` is similar to

```
block_t hash(counter: usize, block_1: block_t, block_2: block_t)
```

i.e. it always takes a counter and two blocks, and compresses them into a single block. In fact, the paper states:
(And personally, I find the simplicity of the initial Balloon algorithm, and the simplicity of its single building block, `hash`, quite appealing.)

However, the draft has the following usages of `PRF`:
```
key = PRF(key, password || salt || personalization || associatedData || LE32(pepper.Length) || LE32(password.Length) || LE32(salt.Length) || LE32(personalization.Length) || LE32(associatedData.Length))
previous = PRF(key, previous || UTF8("bkdf") || LE32(counter++))
pseudorandom = pseudorandom || PRF(emptyKey, LE32(VERSION) || personalization || LE32(spaceCost) || LE32(timeCost) || LE32(parallelism) || LE32(iteration) || LE64(counter++))
buffer[0] = PRF(key, LE32(VERSION) || LE32(spaceCost) || LE32(timeCost) || LE32(parallelism) || LE32(iteration) || LE64(counter++))
buffer[m] = PRF(key, buffer[m - 1] || LE64(counter++))
buffer[m] = PRF(key, previous || buffer[m] || buffer[other1] || buffer[other2] || buffer[other3] || LE64(counter++))
```
Namely:

- there is a `key`, which doesn't exist in the initial algorithm;
- sometimes `VERSION` is included, sometimes not.

Doesn't it seem that the draft drifts quite a bit from the original paper?
I'm not saying it's wrong to have a different take, but I do see two problems:

- the way `PRF` is used (with a lot of inline canonicalization) is error-prone to these types of problems;
- the way the inputs to `PRF` are computed might have performance impacts (as compared to the initial version; more in a different paragraph).

Many hash functions, especially the "modern" ones (I don't know about BLAKE, but for example XXH3, which admittedly is not a cryptographic hash function), might have optimized assembly implementations when they work on fixed-size blocks.
Thus, from this point of view, perhaps the original paper might yield a performance boost (because it uses fixed-size inputs) compared to the current draft's `PRF` usage, which has to concatenate a lot of data.