Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New RFC: signing registry index commits #2474

Closed
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
358 changes: 358 additions & 0 deletions text/0000-signing-registry-commits.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,358 @@
- Feature Name: verified_registry_commits
- Start Date: 2018-05-08
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

Enable commits to crate registry indexes to be signed & verified, providing
stronger guarantees as to the authenticity of their content and (transitively)
guaranteeing the authenticity of the content of the packages in the registry.

# Motivation
[motivation]: #motivation

Crate registries like the crates.io registry are a way of distributing source
code which is ultimately built and executed on end user machines. It is vital
that users receive valid source files for the packages they download,
especially because a malicious package could be a major attack vector to harm
users of Rust, or users of those users' projects.

The index of a registry is a git repository downloaded by cargo over HTTPS or
SSH. This index contains information about each package, including a SHA-256
checksum of the content of that package. When cargo downloads a package from
the registry, its contents are verified against that checksum. This is intended
to guarantee that the content the user downloads is the same data this referred
to by the registry index.

However, a malicious or otherwise ill-behaved third party could conceivably
(such as with a MITM attack) intercept the download of the index and modify the
checksum of a package, allowing them later to modify the content of that
package when the user requests it. They could also, conceivably, modify the
index repository at its central storage, making similar malicious edits which
will be accepted by every user.

If the content of the registry index could be authenticated in a stronger way,
this would make it more difficult for an attacker to modify index data. Because
the registry is a git repository, and a git repository is a kind of [merkle
tree][merkle], signing a commit verifies the content of all the data that
commit contains (modulo the security properties of the repository's hash
function - a discussion of SHA-1 is later in this RFC). Because a hash of each
package contents is contained in the index repository, the index as a whole can
be thought of as a merkle tree, some of the leaves of which are all of the
packages in the registry.

Signing commits is an effective and cheap way to provide stronger authenticity
of package contents. It enables cheap key rotation, because a new signature on
the head of the index repository validates all of the content in the registry.
Because git already supports commit signing, it is a natural extension of our
existing practices, rather than a large scale re-engineering of the registry
system.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

For normal users, this change should be completely transparent. Whenever the
index is updated if the registry in question is a registry that ought to be
signed (a 'signed registry'), the signature of HEAD is verified against the
public keys which are permitted to sign commits to the index.

## The keys TOML
Copy link

@tarcieri tarcieri Jul 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section (and by extension the key rotation section) is the only part of this proposal that's incompatible with TUF, because TUF defines a metadata format for the exact same thing.

TUF typically uses JSON (and Canonical JSON) for metadata, however I don't see any reason why it couldn't use TOML instead.

I think you could have a "minimum viable TUF" implementation that just uses a few more files and a slightly different structure for the data which otherwise would work the exact same way, and wouldn't (at least initially) be dependent on anything more heavy handed than just parsing the TOML files and verifying signatures the exact way you otherwise would in this proposal.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be trivial to convert the TOML to canonical JSON for signing.


Inside the `.cargo` directory, a TOML file exists for each signed registry
containing the public key data for all signers. Public key data is an object
containing these members:

- `key`: An ASCII armored OpenPGP public key (conforming to RFC 4880)
- `info` (optional): Information about this key, such as the user it is
supposed to belong to, which is not cryptographically verified.
- `can-commit` (optional): Whether or not this key can be used to create
commits to the registry. Boolean; Defaults to false.
- `can-rotate` (optional): Whether or not this key can be used to perform a key
rotation on this registry. Boolean; Defaults to false.

This file is maintained through a key rotation mechanism described later in the
RFC. It is plain text, and users may freely review or edit it for each registry
they depend on (though if they remove keys that are not considered invalid by
the registry administrators, they may break their dependency on that registry).

## When is a registry a signed registry?

Registries are considered signed registries if either of these hold true:

1. Current `HEAD` is a signed commit (by any key).
2. A keys file exists for that registry.

An attempt to update the `HEAD` of a signed registry to a commit that is not
signed by one of the existing committer keys is a hard failure that will
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modulo key rotation I presume?

(May need to be clearer about "existing" here)

prevent cargo from resolving dependencies in that registry any longer. Until
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving malicious data in a git repository is dangerous because it might accidentally be read (consider all the apt-get problems with it skipping verification in some cases). Is there a way of preventing unverified contents from hitting the filesystem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, are we relying on git for downgrade protection? i.e., does git prevent you from updating a commit to a non-child commit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way of preventing unverified contents from hitting the filesystem?

I don't know if libgit2 exposes a way to fetch into memory without saving to the file system. In general, as long as cargo traverses the database from HEAD (after verifying the signature of HEAD for each cargo command), I don't think there's a lot of risk here.

It might be nice to reverse the fetch after verification fails, but we want to keep the repo in a failed state somehow to avoid this being a way of DoSing the index.

does git prevent you from updating a commit to a non-child commit?

We're just pulling, so we do a fetch followed by a merge. We can make sure that the merge is a fast forward. I'm not sure what you mean by downgrade protection; I think this suggests you're thinking we'd want invalid updates to be rejected and for the index to then continue working as normal; this is actually undesirable in my opinion: we want it to fail loudly, so that for example this can't be used as an avenue for keeping a user from receiving valid updates to the index (by always serving them invalid updates, which they experience as having no update).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if libgit2 exposes a way to fetch into memory without saving to the file system.

I don't mean avoiding using the .git directory, just that we shouldn't check out the new branch before we verify its signature - there were too many git vulnerabilities that relied on checking out files with carefully-crafted names/contents.

Downgrade attacks

For kinds of attacks on package manager, see the classic paper at https://isis.poly.edu/~jcappos/papers/cappos_mirror_ccs_08.pdf

Apparently that paper calls a downgrade a "replay attack". By that, I mean sending a version of the crates.io metadata that is older than the version the client has, hoping that the client would go back to an older version. I think git fetch + merge will not accept downgrades, but it needs to be checked and written down.

we want it to fail loudly, so that for example this can't be used as an avenue for keeping a user from receiving valid updates to the index

An attacker that can defeat the transport encryption can always conduct a freeze attack and pretend that no update had happened yet (up to any expiration dates we might have). I don't see an advantage in attacking using invalid data.

However, still, if an invalid update is detected, cargo should report an error rather than silently not performing an update just to keep us sane.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mean avoiding using the .git directory, just that we shouldn't check out the new branch before we verify its signature

Definitely! We should make sure cargo never checks out FETCH_HEAD.

An attacker that can defeat the transport encryption can always conduct a freeze attack and pretend that no update had happened yet (up to any expiration dates we might have).

You're right, and I had forgotten I had removed an expiration mechanism from this RFC.

Copy link
Member

@est31 est31 Jul 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cargo doesn't check out the index any more: rust-lang/cargo#4060. If you have a checkout at ~/.cargo/registry/index/github.com-1ecc6299db9ec823/ then it's because an older version of cargo which still did a check out.

Edit: it's a bit more complicated sadly. rust-lang/cargo#4026 made cargo not check out the index, but that broke older cargo so rust-lang/cargo#4060 made it check out the index again. If we want to make cargo not check out the index, we'd have to drop backwards compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we don't do real checkouts, but its about managing what reference we use for our local HEAD. Right now we just get refs/remotes/origin/master, we'll need to track a local master branch as well as the origin/master with this RFC (without doing a full checkout).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arielb1 looking at the code, we use a refspec "refs/heads/master:refs/remotes/origin/master"; because this doesn't have a +, we only perform a fast-forward fetch

the state is fixed, cargo will not generate lockfiles for crates that depend on
packages in that registry. This includes a commit which is not signed at all.

If the HEAD of a registry is signed but that registry has no signing keys TOML
file, that registry will be considered a signed registry, but in a broken state,
because HEAD is signed but there are no trusted signing keys. In general, this
definition of signed registry is supposed to "fail closed."

## Why PGP formatted keys?

This RFC specifies that keys and signatures are exchanged using PGP format,
even though it also adds that cargo will not ship a full PGP implementation to
verify signatures. The PGP format is a rather complex binary format which
supports many options that are not relevant for our use case: as a result,
we've specified that we only support a subset of the PGP format. One could
fairly ask why we use PGP at all instead of a more straightforward solution.

The primary reason to use the PGP format is to integrate with existing git &
gpg tooling. The subset of PGP we support is compatible with keys generated by
gpg2 and with signatures on commits made with `git commit -S`. This allows
users to manually produce and verify signatures as necessary either for
administrative purposes or to check the correctness of cargo's behavior.
Additionally, GitHub has a mechanism to host GPG keys associated with a GitHub
account, which would not be possible for a custom key format.

## Advisory for people running registries

cargo will only validate the signature of the `HEAD` commit, no intermediate
commits. All committers to the registry **MUST** verify the signature of
parents to their commits, or the security properties of this system will be
violated.

Registries are free to make their own policies regarding the distribution of
keys and when to perform rotations, but if a registry operator commits
without verifying the signature of that commit's parents, they have
nullified the benefits of running a signed registry over an unsigned
registry.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

## Signature schemes & data formats supported by cargo

Though keys are stored in the OpenPGP format, cargo will not actually verify
signatures using a complete OpenPGP implementation such as gpg, which would be
a significant new C dependency to ship to all users. Instead, cargo will use
pure Rust implementations of established signature schemes, as well as a pure
Rust parser for a subset of the OpenPGP format.

All signatures are valid OpenPGP signatures, which means they are signatures of
hashes of the data being signed as well as metadata in accordance with the
OpenPGP spec ([IETF RFC 4880][rfc4880]).

### OpenPGP subset

All signatures and public keys will be distributed in a subset of the data
format described in RFC 4880 which is consistent with the behavior of recent
versions of gpg2 (allowing the manual creation of signatures and keys as
necessary).

These are the requirements for a signature or public key to be supported by
cargo:

- It uses what RFC 4880 calls the "old format packet header" for all packets
- All packets have a two octet length (so the packet header is 3 octets long).
- Signatures and public keys are both version 4 of their format.
- It uses one of cargo's supported signature and hash algorithms.
- The first hashed subpacket of any signature is the public key fingerprint
with subpacket type 33 (note this is the 20 byte fingerprint, not the 8 byte
key ID).

This conforms to the default behavior of gpg2 and is what is accepted and
generated by [a Rust implementation of a subset of OpenPGP written by the
RFC author][pbp].

### Signature and hash algorithms

Our initial implementation will support only one signature algorithm and one
hash algorithm. This may be extended in time. The signature algorithm is EdDSA
with curve 25519 (ed25519) and the hash algorithm is SHA-256.

## Signature distribution format

Signatures are distributed in the same manner that git distributes commit
signatures - as an additional header on the commit, verifying the commits
contents, commit message, and other headers. This way, users can verify
index commits using their own version of git, and administrations making manual
edits to the registry can use git to generate the signatures.

## Key rotation

Key rotation is performed by adding an annotated tag to the index repository,
called a key rotation tag. A key rotation tag must be signed by a key with the
`can-rotate` privilege.

A key rotation tag should point to the HEAD commit of the registry index at the
time it is made. The commit it points to should be signed using a key which is
in the post-rotation key set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we implement some sort of auditing scheme for the can-commit keys using a Cloud HSM, that points out strongly to having the key rotation commit signature be in the pre-rotation key set.

Otherwise, an attacker that had compromised a (quorum of, if we ever use threshold signatures) rotation keys can bypass the audit scheme by creating a rotation commit signed with a key they own.

The problem with using the pre-rotation key set is that the Cloud HSM is a "single point of failure" - losing it will force a "hard update" outside of the secure updater, which might be too bad of an operational cost. I am not familiar enough with Cloud HSMs to be able to judge the risk of that happening.

I am not sure what is the best way to handle this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm reading this wrong, it really looks like it should say that the commit is signed by a key in the pre-rotation set. The post-rotation set would mean anyone could add any key so long as the commit is signed by that key.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tarcieri

There are 2 signature checks on a key rotation commit:

  1. The key rotation commit must correspond to a tag, signed by a key rotation key.
  2. The key rotation commit is itself a commit, and therefore it must be signed by a commit key.

If the commit signature can be done by a "rotated" key, then the commit signature is worthless, but the tag signature still exists (i.e., anyone with a rotation key can rotate keys, but they don't need a commit key).


The name of a key rotation tag must be `rotation-$FORMAT-$N` where $N is an
integer and $FORMAT is the format used for this rotation; the only format
specified by this RFC is `v1`. We recommend that registries use a counter
starting at `0` for `$N`. The number is only significant to distinguish
rotations from each other, and has no semantic role.

The message body of a key rotation tag in the `v1` format is a TOML document,
having the same structure as the keys toml which contains the authorized
signing keys described previously in the RFC.

When cargo updates the index, it will iterate through the new tags matching
this format that have been added to the repository in their order in the commit
history. cargo will verify that the tag is signed by a key with the
`can-rotate` privilege and then update the trusted key set by completely
replacing it with the signing keys.

## crates.io initial policy

crates.io keys will always contain a GitHub user account name in their `info`
field (possibly along with other info), and will publish those keys to that
GitHub account. The current set of crates.io keys will be published on
rust-lang.org as well. Users can use these other forms of publication to verify
that the keyset they have is valid.

A single key, belonging to the bors account, will be stored in the crates.io
service. This key will only have the commit privelege, not the rotate
privilege. Other keys, with various priveleges, will be stored in an offline
format, and belong to individual members of the Rust core team. These keys will
be used for administrative purposes.

Keys, especially the online bors key, may be rotated at irregular intervals,
not necessarily because of a known compromise. An explanation of the rotation
will always be published at rust-lang.org. We do not commit to any particular
rotation schedule.

## Security considerations

### Trust on first use

When a user first downloads a registry index, or transitions an index from
unsigned to signed, they have no pre-existing trusted keys for that registry.
For this reason, the first access of a signed registry is a leap of faith in
which the user also obtains the keys to trust for future updates. A successful
attack at that point would leave the user will with an invalid index.

We can harden crates.io against attacks at this point by distributing the
current trusted keys with the rustup distribution, allowing any security
mechanisms we put in place for the distribution of Rust and cargo binaries to
also secure this operation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I’ve seen done in linux package managers is a simple prompt that asks the user about whether they want to add some key to their key store.

This would require users of cargo on e.g. CI (where almost every use is the first use) to appropriately acknowledge that they are not getting any security or setup the CI so that the keys are cached/signed correctly between runs.


### An attacker with no keys

An attacker with no keys cannot sign commits or rotation tags. Because of this,
an attacker with no keys would not be able to modify the registry index, even
if they were able to defeat the existing security measures like our use of
HTTPS or SSH to transmit the index data over the network.

However, such an attacker could still prevent the user from receiving updates
to the index by - for example - MITMing their connection to the git server to
report that there is no update. Such an attack could keep users from receiving
essential security updates to their dependencies. Hardening cargo against this
sort of attack is left as future work.

### An attacker with a `can-commit` key

An attacker who has compromised a key with the `can-commit` privilege could
make commits to the index, modifying the data. This would essentially revert
the system to its security properties before this RFC.

However, if a key compromise of this sort is discovered, an automated key
rotation could remove the compromised key from the set, restoring the security
properties of the system.

### An attacker with a `can-rotate` key

An attacker who has compromised a key with the `can-rotate` privilege could
rotate the key set, adding and removing keys at will. This would allow them to
take control of the index, preventing legitimate updates (such as key
rotations) from reaching users. If a `can-rotate` key were compromised, it
would likely be very disruptive to all users.

For that reason, crates.io will adopt policy that `can-rotate` keys are stored
in an offline medium, with due security precautions. As a future hardening, we
could also support a threshold signature scheme, requiring signatures from
multiple `can-rotate` keys to perform a key rotation, reducing the impact of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also be handled out of band; you can have one (potentially more) can-rotate key that are Shamir'd between core team members. While there is a single key it is only accessible if (any) X core team members (or other keyholders) come together.

This scheme does have the vulnerability of introducing a step where they need to share their Shamir keys and decrypt the signing key, which can still be leaked somehow and have zero protection after that.

compromising a single key with this privilege.

### SHA-1

The security of this system hinges on the security of the hash function used to
implement the git repository. Unfortunately, git currently uses SHA-1, a hash
function with known flaws that allow for collision attacks. A successful
collision attack against a crate index would nullify the security benefits of
this RFC: an attacker would be able to swap out one commit for another which
would both appear to be signed.

However, there are a few mitigating factors which make this RFC worthwhile to
pursue despite these problems:

1. The SHAttered collision attack depends on the ability to control data in
both colliding objects in order to create a collision (that is, they cannot
create a collision with an arbitrary hash). Hypothetically, an attacker could
upload a crate to the registry with index metadata that they can use to create
a collision; however, this increases the difficulty significantly in comparison
to the SHAttered case.
2. Even the SHAttered case was prohibitively expensive. Breaking into one of
our administrators' homes to copy their signing key is probably cheaper.
3. Our current git host, GitHub, checks for the signs of this sort of collision
attack and would not accept an object containing a collision.

That said, we take the weakness of SHA-1 seriously, and will commit to
switching to a stronger hash function as soon as it is possible for us to do
so. In order to do that, git needs to be updated to support a new hash
function, and that upgrade needs to be supported by both GitHub (which hosts
our index) and libgit2 (which cargo uses for git operations).

# Drawbacks
[drawbacks]: #drawbacks

The primary drawback of this is that it increases the operational complexity of
managing crates.io, and the complexity of the cargo codebase. The Rust
infrastructure and dev-tools teams would be taking on the burden of maintaining
this system & provisioning and protecting secret keys. The additional security
benefits of this RFC will depend on their key management practices.

# Rationale and alternatives
[alternatives]: #alternatives

The primary alternative to a security enhancement like this would be to switch
wholesale to [TUF][tuf], a complete, designed framework for secure upgrades.

TUF as described by its spec is not backward compatible with the existing
registry index format that cargo uses. We could not simply bolt on TUF to
cargo, we would have to do a complete system change over. Such a change over is
expensive in terms of our resource allocation and in terms of the user
experience of migration: realistically, we would not be able to prioritize it
in the near future. Even if we do some day switch to TUF, it is worthwhile to
moderately improve the security of our system *now*.

The specified format for TUF target files is also not designed to be an
efficient mechanism for storing the index for a package repository like cargo.
It stores the data cargo would store in its index in a single, large JSON file.
cargo's use of a git-based index was designed on the basis of its original
authors' experience with package managers like this - specifically, the issue
of monotonic, incremental updates, and of parsing only the information for the
specific packages needed in dependency resolution. For this reason, TUF would
likely need to be modified nontrivially to support crates.io, making it an even
larger task to switch to TUF.

Given all of this, an incremental change like signing the repository git
commits seems advantageous given that we cannot perform a migration to TUF at
this time.

# Prior art
[prior-art]: #prior-art

The most important prior art is the [TUF][tuf] specification, described in the
previous section. It would be good to continue to take learnings from TUF as we
try to improve the security of our infrastructure.

# Unresolved questions
[unresolved]: #unresolved-questions

No major unresolved questions as of this time.


[tuf]: https://theupdateframework.github.io/
[rfc4880]: https://tools.ietf.org/html/rfc4880
[pbp]: https://github.com/withoutboats/pbp
[merkle]: https://en.wikipedia.org/wiki/Merkle_tree