Discussion of Scale-out Architecture for High-volume Repositories TAP (TAP 21) #190

ergonlogic · 2024-11-25T18:59:25Z

TAP-21 originated with theupdateframework/specification/issues/309. Following discussion at the TUF community meeting on 2024-11-01, we've drafted TAP-21 and submitted a PR (#189).

In addition to broader discussions of TAP-21, we are specifically seeking feedback to:

- Validate our motivation and rationale
- Validate the calculations we undertook (see: TAP-21 metadata overhead calculator)
- Better understand how to approach the following sections (which are sparse at the moment):
  - Specification
  - Backwards Compatibility
  - Augmented Reference Implementation

We have thoughts on the sparse sections, but have not yet been able to articulate them clearly enough to include in the TAP.

jku · 2024-11-26T13:17:30Z

You mention TOFU, but it's a little unclear how the client reacts when the top-level repository changes the "initial sub-repo metadata".

Assume the client has cached earlier sub-repo metadata and the new sub-repo metadata is not compatible with the sub-repo metadata in the client cache (as it can't be guaranteed to be). How does the client react?

ergonlogic · 2024-11-26T16:56:52Z

> You mention TOFU, but it's a little unclear how the client reacts when the top-level repository changes the "initial sub-repo metadata".
> Assume the client has cached earlier sub-repo metadata and the new sub-repo metadata is not compatible with the sub-repo metadata in the client cache (as it can't be guaranteed to be). How does the client react?

By "initial sub-repo metadata", we mean only 1.root.json for each sub-repo. For each relevant sub-repo, the client must then follow the usual TUF procedure of trying to download and validate 2.root.json for the sub-repo, etc.
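
As a rough sketch of that procedure (not taken from the TAP or from any TUF implementation), the loop below walks a sub-repo's root chain starting from the trusted 1.root.json. The names `update_root_chain`, `fetch_root` and `verify_root` are hypothetical. A real client would check each new root against the signature thresholds of both the previous and the new root, as the TUF specification requires; here that check is hidden behind `verify_root`.

```python
from typing import Callable, Optional


def update_root_chain(
    trusted_version: int,
    fetch_root: Callable[[int], Optional[bytes]],
    verify_root: Callable[[int, bytes], bool],
    max_rotations: int = 256,
) -> int:
    """Return the highest root version that could be fetched and verified.

    fetch_root(n) is assumed to return the bytes of n.root.json, or None
    when that version does not exist. verify_root(n, data) is assumed to
    perform the signature checks for version n.
    """
    version = trusted_version
    for _ in range(max_rotations):  # bound the loop so a server cannot stall us forever
        candidate = fetch_root(version + 1)
        if candidate is None:
            break  # no newer root published; the chain ends here
        if not verify_root(version + 1, candidate):
            raise ValueError(f"{version + 1}.root.json failed verification")
        version += 1
    return version


# Usage: the sub-repo has published 2.root.json and 3.root.json.
available = {2: b"root v2", 3: b"root v3"}
latest = update_root_chain(1, available.get, lambda version, data: True)
```

In this usage example the verifier accepts everything, which is only suitable for illustrating the control flow.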

Under normal circumstances, key rotation for a sub-repo would involve deploying a new n.root.json to the sub-repo. If any non-root keys were rotated, we'd also need to re-sign the corresponding metadata, etc. No changes to the top-level metadata would occur in this process.

So, the scenario you describe should not occur under normal operations. If it were to occur, presumably the top-level repo has to be considered canonical, since it is the root of trust (i.e. the top-level repo's 1.root.json ships with the client).

The client behaviour should presumably be to:

1. Update top-level TUF repo metadata
2. For each sub-repo, verify that its cached sub-repo initial root metadata remains valid
   1. If valid, proceed as normal (update sub-repo metadata, etc.)
   2. If not valid, remove all of the sub-repo's TUF metadata, including initial root metadata, download the new initial root metadata for the sub-repo, validate it (against top-level TUF metadata), then proceed with normal sub-repo operations.
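
The steps above could be sketched roughly as follows. This is a minimal illustration, not the TAP's specification: `SubRepoCache`, `refresh_sub_repo`, `initial_root_is_valid` and `fetch_initial_root` are hypothetical names, and the usage example assumes the top-level metadata pins each sub-repo's initial root by SHA-256 hash, which is only one possible way to validate it.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SubRepoCache:
    initial_root: bytes  # cached 1.root.json of the sub-repo
    other_metadata: Dict[str, bytes] = field(default_factory=dict)


def refresh_sub_repo(
    name: str,
    cache: Dict[str, SubRepoCache],
    initial_root_is_valid: Callable[[str, bytes], bool],
    fetch_initial_root: Callable[[str], bytes],
) -> str:
    """Run after the top-level metadata has been updated.

    Returns "kept" if the cached initial root is still valid, or "reset"
    if the sub-repo's cached metadata was discarded and re-bootstrapped.
    """
    cached = cache.get(name)
    if cached is not None and initial_root_is_valid(name, cached.initial_root):
        return "kept"  # proceed with the normal sub-repo update
    # Cached initial root is missing or no longer endorsed by the top level:
    # drop everything cached for this sub-repo and start from the new root.
    new_root = fetch_initial_root(name)
    if not initial_root_is_valid(name, new_root):
        raise ValueError(f"initial root for {name!r} does not match top-level metadata")
    cache[name] = SubRepoCache(initial_root=new_root)
    return "reset"


# Usage (assumption: the top level lists a hash of each sub-repo's initial root).
top_level = {"sub-a": hashlib.sha256(b"new root").hexdigest()}


def is_valid(name: str, root: bytes) -> bool:
    return hashlib.sha256(root).hexdigest() == top_level[name]


cache = {"sub-a": SubRepoCache(b"old root", {"timestamp.json": b"stale"})}
outcome = refresh_sub_repo("sub-a", cache, is_valid, lambda name: b"new root")
```

After the call, `outcome` is "reset" and the stale sub-repo metadata is gone from the cache, so the normal sub-repo update starts from the newly validated initial root.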

jku · 2024-11-27T08:44:30Z

> For each sub-repo, verify that its cached sub-repo initial root metadata remains valid

This is currently not part of the spec, but I believe it would be a good addition: see the verify-from-bootstrap feature in python-tuf.

> So, the scenario you describe should not occur under normal operations.

If this scenario is not expected to happen... then I think I may not have understood who the signers/key owners of the sub-repo are. Can you talk more about that?

The reason I'm thinking I don't quite understand the setup is this:

- if the signers are project maintainers (or project-specific release automation), then signing keys will be lost and some maintainers will turn out to be malicious: in a high-volume repository the top-level will need to intervene somewhat regularly
- if the signers are repository automation (in other words the keys are KMS keys controlled by the repository) I can see how content in the top-level repo does not need to change ... but then I don't see why sub-repos exist at all -- a single signing key (delivered as an artifact in the top-level repo) would work just as well.

ergonlogic · 2024-11-27T17:24:19Z

> if the signers are project maintainers (or project-specific release automation), then signing keys will be lost and some maintainers will turn out to be malicious: In a high-volume repository the top-level will need to intervene somewhat regularly

I believe the client behaviour described above ought to handle this scenario reasonably well. But please correct me if you see a flaw in that process. We're working on incorporating it into the Specification section of the TAP.

> if the signers are repository automation (in other words the keys are KMS keys controlled by the repository) I can see how content in the top-level repo does not need to change ... but then I don't see why sub-repos exist at all -- a single signing key (delivered as an artifact in the top-level repo) would work just as well.

First off, we are primarily targeting this scenario ("signers are repository automation"). I'll update the TAP to reflect this.

That said, we considered re-using the same root metadata across all the sub-repos. However, when an online key (targets, snapshot or timestamp) is rotated, all of the metadata signed by that key needs to be re-signed. For a high-volume repository this can take a (relatively) long time. Separate root metadata per sub-repo allows for an incremental roll-out of new online keys.

Note that this still allows implementers to re-use keys across sub-repos. In fact, I think this would be the recommended approach.

jku · 2024-11-28T11:56:42Z

I meant that I don't understand what the advantage of TAP-21 in general is for the "signers are repository automation" case.

Compare to a setup where there are no sub-repos and the top-level TUF repo only contains a set of public keys as artifacts (no project indexes or packages are added to TUF): These keys are used to sign the project indexes and/or actual packages outside of TUF by the repository automation, and clients get these keys using TUF and use them to verify indexes/packages. So TUF is only used as a repository signing key rotation mechanism, not as a delegation mechanism. In this setup:

- TUF repository is very simple to maintain and very slow moving (repository only changes if signing key changes)
- client bandwidth use and number of requests is minimal
- upload latency is minimal
- the security posture seems roughly similar to me when compared to TAP-21
  - neither provides global snapshot
  - repository online key compromise is still "total" (all packages can be signed by attacker) but recoverable
  - the one clear advantage TAP-21 seems to have is the built-in timeliness check from the timestamp role: this can be mitigated by making the signatures expire with e.g. signing certs

It's entirely possible I've missed something: Apart from the theoretical ability to switch to "project signing" at a later date, do you believe TAP-21 has real security advantages over the simple setup described above?

ergonlogic · 2024-11-28T20:12:04Z

> Compare to a setup where there are no sub-repos and the top-level TUF repo only contains a set of public keys as artifacts (no project indexes or packages are added to TUF): These keys are used to sign the project indexes and/or actual packages outside of TUF by the repository automation, and clients get these keys using TUF and use them to verify indexes/packages. So TUF is only used as a repository signing key rotation mechanism, not as a delegation mechanism.

How do clients get the index & package signatures in this scenario? That's the payload of the sub-repos.

jku · 2024-11-29T08:34:48Z

> > Compare to a setup where there are no sub-repos and the top-level TUF repo only contains a set of public keys as artifacts (no project indexes or packages are added to TUF): These keys are used to sign the project indexes and/or actual packages outside of TUF by the repository automation, and clients get these keys using TUF and use them to verify indexes/packages. So TUF is only used as a repository signing key rotation mechanism, not as a delegation mechanism.
>
> How do clients get the index & package signatures in this scenario? That's the payload of the sub-repos.

I suppose there are many ways to handle that, but for the sake of discussion let's say well-known URLs based on the artifact URL: if the client wants to download "$PACKAGE_URL", the signature is at "${PACKAGE_URL}.signature"
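
A minimal sketch of that convention, with a hypothetical helper name (`signature_url`). One detail the plain "${PACKAGE_URL}.signature" form leaves open is what happens when the artifact URL carries a query string; the sketch assumes the suffix belongs on the path, which is a design choice rather than something stated above.

```python
from urllib.parse import urlsplit, urlunsplit


def signature_url(package_url: str) -> str:
    """Derive the well-known signature URL for an artifact URL.

    Assumption: ".signature" is appended to the URL path, so any query
    string or fragment is preserved after the suffix.
    """
    parts = urlsplit(package_url)
    return urlunsplit(parts._replace(path=parts.path + ".signature"))


# Usage with placeholder URLs.
plain = signature_url("https://example.org/pkg/foo-1.0.tar.gz")
with_query = signature_url("https://example.org/pkg/foo-1.0.tar.gz?mirror=1")
```

For URLs without a query string or fragment this is identical to simple string concatenation.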

ergonlogic · 2024-11-29T18:59:31Z

> the one clear advantage TAP-21 seems to have is builtin timeliness check from timestamp role [...]

This is an advantage of TUF, not TAP-21 specifically. TUF also affords protection against a variety of attacks that the "simple setup" appears not to provide. For example, TUF mitigates endless data attacks by including the size of downloaded files within TUF metadata.
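
To illustrate the idea, here is a minimal sketch of a length-bounded read, assuming the expected length has already been taken from trusted metadata. `read_bounded` is a hypothetical helper, not an API of any TUF implementation, and the in-memory streams stand in for a network response.

```python
import io
from typing import BinaryIO


def read_bounded(stream: BinaryIO, expected_length: int, chunk_size: int = 8192) -> bytes:
    """Read exactly expected_length bytes, refusing longer or shorter streams."""
    received = bytearray()
    while len(received) <= expected_length:
        # Ask for at most one byte beyond the limit, so an oversized stream
        # is detected without ever buffering more than expected_length + 1.
        chunk = stream.read(min(chunk_size, expected_length + 1 - len(received)))
        if not chunk:
            break  # end of stream
        received.extend(chunk)
    if len(received) != expected_length:
        raise ValueError("stream length does not match the length in the metadata")
    return bytes(received)


# Usage: a well-behaved download and an "endless" one.
honest = read_bounded(io.BytesIO(b"abcd"), expected_length=4)
try:
    read_bounded(io.BytesIO(b"x" * 1_000_000), expected_length=4)
    oversized_rejected = False
except ValueError:
    oversized_rejected = True
```

The point is that the client stops reading as soon as the declared length is exceeded, instead of consuming whatever the server sends.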

> [lack of timeliness checks] can be mitigated with making the signatures expire with e.g. signing certs

I don't doubt that there are alternative mechanisms to provide the same protections as TUF. But each would presumably complicate the alternative solution. This, in turn, undermines the argument for its simplicity. Why re-invent the wheel?

> do you believe TAP-21 has real security advantages over the simple setup described above?

TAP-21 preserves all but a handful of the protections afforded by TUF. So, yes.

trishankatdatadog · 2024-12-01T23:09:35Z

Left some comments in your TAP PR, @ergonlogic. I think your real problem there is different than the one you have proposed a solution for. As I suggested there, do you think getting on a call to discuss your problem in its entire context would help serve you better?

jku · 2024-12-02T09:00:24Z

> For example, TUF mitigates against endless data attacks by including the size of downloaded files within TUF metadata.

Sure, but this is pretty much a form of DoS, which the attacker could do anyway if they control the repository or the mirror. I don't think this makes a meaningful difference between the two models.

> I don't doubt that there are alternative mechanisms to provide the same protections as TUF. But each would presumably complicate the alternative solution. This, in turn, undermines the argument for its simplicity. Why re-invent the wheel?

I think you may be underestimating the complexity of running the sub-repositories. The two models being compared have, in my opinion, vastly different levels of simplicity -- and this overall difference would not be changed by some small changes in either one.

jku · 2024-12-02T09:07:16Z

> TAP-21 preserves all but a handful of the protections afforded by TUF.

The TUF "security model" refers to an idealized repository that we've found out does not usually exist. I'd really like to see specific relevant attacks that are being protected from: DoS protection is nice but not that persuasive...

(I'm not trying to be difficult, by the way: there may well be attacks that this protects from, I just haven't been able to find any that a simpler solution wouldn't handle)

ergonlogic · 2024-12-02T17:01:07Z

> Left some comments in your TAP PR, @ergonlogic. I think your real problem there is different than the one you have proposed a solution for.

Thank you, I've responded in the PR.

In the #tuf Slack channel, @mnm678 reviewed our metadata download calculations, and surmised that previous calculations (for PEP-458) underestimated the size of snapshot metadata.

The problems we describe in the TAP are not purely theoretical. We are observing them IRL. That said, maybe we've misinterpreted the root cause. Could you elaborate?

> As I suggested there, do you think getting on a call to discuss your problem in its entire context would help serve you better?

I'm happy to discuss further on a call. FWIW, we did discuss this on the last community call. Is there another one coming up this week? If not, I'm happy to chat in another venue.

ergonlogic · 2024-12-02T19:24:12Z

> I think you may be underestimating the complexity of running the sub-repositories.

Perhaps. However, based on my experience implementing Rugged, I don't think it would add too much complexity. Each sub-repo is just a simple TUF repo, after all.

> The two models being compared have, in my opinion, vastly different levels of simplicity --

Agreed. TAP-21 is implementing TUF for sub-repos, whereas the alternative is essentially just publishing some signatures.

> and this overall difference would not be changed by some small changes in either one

I disagree. I think you're comparing apples and oranges. For an alternative to actually provide security comparable to TUF, it's reasonable to think that it would become significantly more complex than just publishing signatures.

ergonlogic · 2024-12-02T20:02:25Z

> I'd really like to see specific relevant attacks that are being protected from [...] that a simpler solution wouldn't handle

Within a sub-repo, all of the protections afforded by TUF are present.

As far as I can tell, the "simple" alternative does not appear to protect against rollback or indefinite freeze attacks. For example, an attacker could present an older version of an index, along with its previously published signature. Since the signature was generated by a valid key and matched the older index, a client would presumably validate it.
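
To make the replay concern concrete, here is a small sketch comparing a bare signature check with a check that also enforces a version counter and an expiry, in the spirit of TUF metadata. Everything in it is illustrative: HMAC with a shared demo key stands in for a real public-key signature, timestamps are plain integers, and `accept_bare` / `accept_versioned` are hypothetical names.

```python
import hashlib
import hmac
import json

DEMO_KEY = b"demo-key"  # stand-in only: real deployments use public-key signatures


def sign(payload: bytes) -> bytes:
    return hmac.new(DEMO_KEY, payload, hashlib.sha256).digest()


def verify(payload: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(payload), signature)


def accept_bare(index: bytes, signature: bytes) -> bool:
    """Signature-only check: any validly signed index is accepted."""
    return verify(index, signature)


def accept_versioned(metadata: bytes, signature: bytes, last_seen_version: int, now: int) -> bool:
    """Signature check plus rollback (version) and freeze (expiry) checks."""
    if not verify(metadata, signature):
        return False
    doc = json.loads(metadata)
    return doc["version"] >= last_seen_version and now < doc["expires"]


# An old index that was legitimately signed in the past.
old = json.dumps({"version": 1, "expires": 100, "index": "foo-1.0"}).encode()
old_sig = sign(old)
new = json.dumps({"version": 2, "expires": 200, "index": "foo-1.1"}).encode()

replay_accepted = accept_bare(old, old_sig)
rollback_accepted = accept_versioned(old, old_sig, last_seen_version=2, now=50)
freeze_accepted = accept_versioned(old, old_sig, last_seen_version=1, now=150)
current_accepted = accept_versioned(new, sign(new), last_seen_version=2, now=150)
```

The replayed old index passes the bare check, while the versioned check rejects it both when the client has already seen version 2 and when the old metadata has expired.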

> I'm not trying to be difficult, by the way

I'm sorry if I am being defensive. I guess I just don't really understand how showing that TUF (not TAP-21, but TUF itself) is more secure than a theoretical rudimentary alternative is relevant.

Registries are free to adopt TUF or not. Is it overkill for sub-repos? Perhaps. But I'd prefer to err on the side of caution. TUF itself, along with PHP-TUF, Rugged and other implementations, have all passed security audits. It doesn't seem to me that starting from scratch is a worthwhile endeavour.

trishankatdatadog · 2024-12-04T04:39:01Z

> I'm happy to discuss further on a call. FWIW, we did discuss this on the last community call. Is there another one coming up this week? If not, I'm happy to chat in another venue.

No need to discuss again on the community call, but let me reach out to you over DM on the CNCF Slack to chat separately. Thanks!

ergonlogic mentioned this issue Nov 28, 2024

Request for comment: Scalable architecture theupdateframework/specification#309

Closed
