-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge ngclient: a new client library implementation #1408
Conversation
As mentioned in the description, we don't consider this PR as a fully complete client and here is the list of our current TODOs which very likely will be extended further:
|
I'll add a few more potential TODOs that I dug from my notes:
|
Thanks for the great work, and would love to review! Give me a deadline, please... |
Thanks @trishankatdatadog! The team are focused on supporting changes targeting develop (with no dependency on this PR) for the next two weeks, but after that it would be really good to keep moving this forward. Would a deadline of Wed 9th work? |
>>> with open(timestamp_path, "rb") as f: | ||
>>> bundle.update_timestamp(f.read()) | ||
>>> except (RepositoryError, OSError): | ||
>>> pass # failure to load a local file is ok |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand, the RepositoryError
exception will be caught whenBadVersionNumberError
or ExpiredMetadataError
is thrown (commit 91935fa).
So, I don't think this is a simple fail to load a local file
.
Maybe we shouldn't give examples where you would want to just pass
on RepositoryError
when this exception could be a serious one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the question is, when would you want to stop the update process here? I don't think those cases exist: I don't want to stop if the local metadata is expired, corrupted or not signed with current keys for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If your local keys are broken how you are supposed to verify the new timestamp file?
How do you recover from that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec is fairly silent on what to demand from local metadata. But the basic rule is that we cannot demand local metadata to even exist (apart from the demand that one root version must exist).
I don't know what you mean by broken keys, but for some cases:
- local timestamp is signed by keys that are not the correct signing keys according to root: this is uncommon but not an error. The keys have likely been rotated and we must now download the new timestamp and cannot do the rollback check: version number of the trusted timestamp metadata file, if any, MUST be less than or equal to the version number of the new timestamp metadata file -- we do not have a trusted timestamp file at this point in time, so we proceed without one
- local timestamp file has been corrupted on disk or tampered with: we do not have a trusted timestamp file at this point so should proceed without one. Logging something might be a good idea but I don't think failing the process makes sense as a trusted timestamp is not required
- the local software installation is broken (e.g. a dependency got uninstalled) so e.g. signature verification fails: This would be nice to handle separately but I'm not currently convinced we can meaningfully separate this case from e.g. an attacker messing with the signatures and feeding bad data to us and the signature verification failing weirdly because of that. Again, if local metadata fails to load, we should proceed without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- local timestamp is signed by keys that are not the correct signing keys according to root: this is uncommon but not an error. The keys have likely been rotated and we must now download the new timestamp and cannot do the rollback check: version number of the trusted timestamp metadata file, if any, MUST be less than or equal to the version number of the new timestamp metadata file -- we do not have a trusted timestamp file at this point in time, so we proceed without one
- local timestamp file has been corrupted on disk or tampered with: we do not have a trusted timestamp file at this point so should proceed without one. Logging something might be a good idea but I don't think failing the process makes sense as a trusted timestamp is not required
- the local software installation is broken (e.g. a dependency got uninstalled) so e.g. signature verification fails: This would be nice to handle separately but I'm not currently convinced we can meaningfully separate this case from e.g. an attacker messing with the signatures and feeding bad data to us and the signature verification failing weirdly because of that. Again, if local metadata fails to load, we should proceed without it.
You mean the updating process will start all over again and we will download from version 1 of the timestamp
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is nothing to start over. The whole timestamp loading process is:
- load local timestamp (if it's available, readable and valid)
- load remote timestamp (latest version), validating against the already loaded timestamp if it was loaded.
- If the remote timestamp was not valid for any reason, abort
in the beginning of load_timestamp()... | ||
* some metadata interactions might work better in Metadata itself | ||
* Progress through Specification update process should be documented | ||
(not sure yet how: maybe a spec_logger that logs specification events?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an issue for this one?
Seems useful for debugging.
The suggestion for spec_logger.log
or something sounds reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I don't have an issue... I'm still on the fence about it -- it sounds like a nice idea but it does add quite a few lines of code that maybe aren't necessary (the fact that something is an item in the spec might not mean it's worth logging)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine logging will be useful for the bigger more time-consuming steps.
For example, logging when the update process is finished for each of the metadata classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry I left two reviews according to GitHub.
I thought I had just left comments there, but somehow I shared it as a review...
I like the MetadataBundle
.
It's simple and easy to read. Can't wait to see what will happen when we update the rest of the code!
There are one or two important comments I want us to answer, the others are style improvements.
raise ValueError("Call is valid only on delegator metadata") | ||
|
||
if role is None: | ||
raise ValueError(f"Delegated role {role_name} not found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you allow calling verify_with_threshold
when delegator.signed
is Targets
, but there are no delegations?
Or do you make that check somewhere else?
Then, you would end up with role
which is None, and throw that exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do allow that on purpose, and I think the exception is correct: caller is trying to verify a role that isn't delegated by delegator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's a valid allowed situation to have Targets without delegations and thus call verify_with_threshold
which will throw that exception.
Then, you should add a check inside update_targets
which doesn't call verify_with_threshold
on empty delegations or catch the exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, we should polish this code in the PR for #1306 (I plan to do one soon) -- it's here just as a placeholder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you comment in the code where we should handle the exception because I am not sure what this refers to?
If there are no delegations, then how does the client end up calling update_targets() for a delegate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can call it here: https://github.com/jku/tuf/blob/f057d57e6aa2cc1bad1a153343fc6335c9c5c11c/tuf/ngclient/_internal/metadata_bundle.py#L445 line 445 in your current changes.
What's interesting is that you aren't testing if we support targets
with no delegations
field.
The reason is that you don't call update_delegated_targets()
in your tests with a target file without delegations
.
I thought you should been testing this in test_metadata_bundle.py
when updating role2.json
like this:
with open(os.path.join(repo_dir, "role2.json"), "rb") as f:
bundle.update_delegated_targets(f.read(), "role2", "role1")
but after a quick check, I found that role2.json
had delegations
, but delegations
itself consists only of empty keys
dict and roles
list.
I asked if that was the intention when marking delegations
as optional here.
When I removed delegations
from role2.json
I got:
tuf.exceptions.UnsignedMetadataError: New role2 is not signed by role1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I removed
delegations
fromrole2.json
I got:tuf.exceptions.UnsignedMetadataError: New role2 is not signed by role1
Did you resign the metadata after modifying? If you didn't, this seems like the correct thing to do -- signature in role2 is no longer valid
I'm not saying there isn't a bug here (and we definitely should have better ways of modifying and resigning the data) but I just don't see the bug... The delegations in the delegated metadata are not processed at all (apart from what happens in Metadata.from_bytes()
) during update_delegated_targets()
.
The delegations in the delegator metadata are processed -- but in that case calling update_delegated_targets()
on metadata that does not have delegations would be a bug and would result in a ValueError.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I had forgotten to resign the metadata. I did the experiment again.
The delegations in the delegated metadata are not processed at all (apart from what happens in Metadata.from_bytes()) during update_delegated_targets().
The delegations in the delegator metadata are processed -- but in that case calling update_delegated_targets() on metadata that does not have delegations would be a bug and would result in a ValueError.
Yes, you are right and I verify that after debugging it.
Probably there is no bug here and it's me spending too much time on optional attributes. :D
unique_keys = set() | ||
for keyid in role.keyids: | ||
key_dict = keys[keyid].to_dict() | ||
key, dummy = sslib_keys.format_metadata_to_key(key_dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: maybe you can just key, _ = sslib_keys.format_metadata_to_key(key_dict)
instead of giving a name to the second value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree but our pylint configuration does not currently allow that. Possibly we should change that (Google style guide and their pylintrc does allow it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, Teodora noticed that this call should not be done: this is being fixed in #1417
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, do we need to merge 1417 in order to merge this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I consider the implementation of verify_with_threshold in this PR a placeholder: I don't mind it if it's not perfect, and I don't think waiting on the replacement is necessary.
I suggest we polish #1423 and #1436 as much as we want and use the result here when they are ready, but don't block this PR on those two
self._bundle["root"] = new_root | ||
logger.debug("Updated root") | ||
|
||
def root_update_finished(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to be able to reset root_update_finished
back to False
at some point?
Most likely it's reasonable to assume that a MetadataBundle
instance won't live beyond one root version, but I wanted to make sure I understand the assumption.
If this assumption is true, then do we want to document that somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, idea is that if you ever want to "refresh again", you create a new bundle and start from scratch -- or in other words: updating a specific piece of metadata is only possible until you have loaded any "depending metadata" (this means delegated metadata but also the special cases for top-level metadata where they each depend on the previous one in order timestamp -> snapshot -> targets)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarifications!
except DeserializationError as e: | ||
raise exceptions.RepositoryError("Failed to load timestamp") from e | ||
|
||
if new_timestamp.signed.type != "timestamp": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that we are checking Timestamp.type
a couple of times:
- once when we call
Metadata.from_bytes()
->deserializer.deserialize()
->Metadata.from_dict()
and there we check that_type
is one from the four metadata types. - from
Metadata.from_dict()
->Timestamp.from_dict()
->Signed._common_fields_from_dict()
and there we checkif _type != cls._signed_type:
- is this check
I understand why you made that check here, I am just wondering do we need all of them?
PS: This is true for all of the top metadata classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question:
Number 1 is the dispatcher, we need to do this somewhere.
Number 2 is just a sanity check IIRC: could be removed but it doesn't hurt anyone
Number 3 we do need but you are right that it looks like we could do better: I wish I could do something like this:
try:
new_timestamp = Metadata[Timestamp].from_bytes(data)
except DeserializationError as e:
raise exceptions.RepositoryError("Failed to load timestamp") from e
Where metadata was a Generic type over (Root/Timestamp/Snapshot/Targets) and the constructor used above would
- fail if the type is not
Timestamp.type
- type annotate new_timestamp.signed correctly as
Timestamp
I have no idea if this is possible: I've done an experiment and managed to get close but not quite there.
A simpler alternative is adding an optional argument to the factory constructors: Metadata.from_bytes(data, type=Timestamp)
which would raise if the Type is not correct (but would do nothing to fix the annotation issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> #1433
new_timestamp.signed.meta["snapshot.json"].version | ||
< self.timestamp.signed.meta["snapshot.json"].version | ||
): | ||
# TODO not sure about the correct exception here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want another exception here?
I thought that this is the correct one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I guess that's fine
meta = self.timestamp.signed.meta["snapshot.json"] | ||
|
||
# Verify against the hashes in timestamp, if any | ||
hashes = meta.hashes or {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This line made me question the default value for MetaFile.hashes
.
Maybe we can use an empty dictionary instead of None
?
Then, you wouldn't have to use or.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it but doing that in Metadata means the content changes when you do deserialize + serialize: no hashes becomes empty hashes...
digest_object.update(data) | ||
observed_hash = digest_object.hexdigest() | ||
if observed_hash != stored_hash: | ||
# TODO: Error should derive from RepositoryError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fixed in 91935fa.
* exceptions are not final: the idea is that client could just handle | ||
a generic RepositoryError that covers every issue that server provided | ||
metadata could inflict (other errors would be user errors), but this is not | ||
yet the case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this is not the case anymore.
Do you still throw exceptions that aren't RepositoryError
or don't inherit RepositoryError
?
The only one I see is RuntimeError
and ValueError
, but we don't want to document them right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are probably correct but I also think another close look would be good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at least hash verify failure is not a repositoryerror yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, well this shouldn't stop us from merging this.
We have an issue where we document this step, so I will consider this comment as resolved.
tuf/ngclient/updater.py
Outdated
After the client has retrieved the target information for those targets | ||
they are interested in updating, they would call this method to | ||
determine which targets have changed from those saved locally on disk. | ||
All the targets that have changed are returns in a list. From this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the targets that have changed are returns in a list. From this | |
All the targets that have changed are returned in a list. From this |
tests/test_updater_ng.py
Outdated
import json | ||
import tracemalloc | ||
|
||
if sys.version_info >= (3, 3): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this check given that we support Python3.6+.
if not verify_with_threshold(delegator, role_name, new_delegate): | ||
raise exceptions.UnsignedMetadataError( | ||
f"New {role_name} is not signed by {delegator_name}", | ||
new_delegate, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be more readable if we called it new_delegate_name
instead of role_name
-- so the connection between the name and new_delegate metadata is clear.
Of course, if Metadata object included the name (#1425), this would be even clearer:
if not verify_with_threshold(delegator, new_delegate):
raise exceptions.UnsignedMetadataError(
f"New {new_delegate.name} is not signed by {delegator.name}",
new_delegate,
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 completely agree with using variable names that make the connection clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work, everyone, I love how the new Updater is a few hundred lines, like my PoC! 👏🏽
A few overall comments:
- Could use more typing hints.
- I feel the Metadata API could be more intuitive, but that's a different PR.
- A few functions are a bit too long. Please consider refactoring.
- Please double-check against the spec and another implementation (e.g., my PoC) to check whether you missed anything.
Thank you very much again, great work, the ngclient really lives up to its name! Very nice and easy to read like Joshua promised.
|
||
"""TUF client bundle-of-metadata | ||
|
||
MetadataBundle keeps track of current valid set of metadata for the client, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I've mentioned before, I think it is a fantastic idea to use a state machine to keep track of the update workflow, but I do think MetadataBundle is not the best name for this. Could we please rename to something much more descriptive? Maybe something like Workflow? Not the best, but you get the idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we can, I've not been totally happy with the name. A lot of the code is indeed about implementing the update workflow so Workflow makes some sense...
The core idea (or at least the big change) I was starting from is that the currently valid set of metadata must be considered a unit. Single metadata file cannot usually be considered "valid": validity can only be decided for the whole set, and the set must be valid through the update process, at every step. I was trying to figure out a name for this set (that could contain just root metadata or possibly dozens of individual metadata files)
Would be nice to get more opinions, I've been looking at this for way too long to be able to switch viewpoints now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why no one likes my TrustedMetadata
option 😄
This is how the specification refers to metadata files that have been verified by the client and persisted and MetadataBundle
is implemented as a mapping class rather than a workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the name TrusteMetadata
, but MetadataBundle
is both a mapping and a workflow class, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah Trusted would be nice in the name but I don't like TrustedMetadata because that sounds like a type of Metadata (a class we already have). The component does contain a lot of the update workflow, but in a way that workflow is just a side effect of keeping track of our collection of trusted metadata...
tuf/ngclient/README.md
Outdated
https://theupdateframework.github.io/specification/latest/#detailed-client-workflow) | ||
|
||
**tuf.ngclient.FetcherInterface** is an abstract class that client | ||
implementers can optionally use to integrate with their own |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implementers can optionally use to integrate with their own | |
implementers can implement a concrete class of in order to reuse their own |
tuf/ngclient/README.md
Outdated
|
||
**tuf.ngclient.FetcherInterface** is an abstract class that client | ||
implementers can optionally use to integrate with their own | ||
network/download infrastructure -- a Requests-based implementation is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
network/download infrastructure -- a Requests-based implementation is | |
networking/download libraries -- a Requests-based implementation is |
tuf/ngclient/README.md
Outdated
This package: | ||
* Aims to be a clean, easy-to-validate reference client implementation | ||
written in modern Python | ||
* At the same time aims to be the library choice for anyone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* At the same time aims to be the library choice for anyone | |
* At the same time aims to be the library of choice for anyone |
tuf/ngclient/README.md
Outdated
* At the same time aims to be the library choice for anyone | ||
implementing a TUF client in Python: light-weight, easy to integrate | ||
and with minimal required dependencies | ||
* Is still under development but planned to become the default client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Is still under development but planned to become the default client | |
* Is still under development but is planned to become the default client |
def __init__( | ||
self, | ||
repository_dir: str, | ||
metadata_base_url: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea to separate metadata vs targets.
return None | ||
|
||
|
||
def _check_file_length(file_object, trusted_file_length): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I get that these are helper functions, but any reason not to fold them into the class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strict_required_length=False, | ||
) | ||
|
||
def _load_local_metadata(self, rolename: str) -> bytes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels familiar :)
self._bundle["root"] = new_root | ||
logger.debug("Updated root") | ||
|
||
def root_update_finished(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do you check for revoking timestamp/snapshot to recover from fast-forward attacks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't:
- because we are strict about the order of loading different metadata, we cannot ever have timestamp or snapshot loaded at this point
- after this point, we cannot load local timestamp or snapshot if they are no longer signed by the correct (rotated) keys
So I believe there's just nothing to revoke. The spec does also say delete the trusted timestamp and snapshot metadata files. We don't do that either currently:
- the request is a bit odd: the metadata files by definition are not trusted anymore: the keys have been rotated and the old timestamp and snapshot are no longer valid
- There's just no way to load those files into the bundle anymore
We could add the file removal of course... and if we don't we should definitely document why. I've been reluctant to add file removing unless really needed because it feels like removing evidence just as something maybe going wrong
Something else we don't make clear is the difference to current implementation: we have no "current" and "previous" metadata: Currently we store a single copy of each roles metadata (new metadata gets written as soon as it's validated by the bundle). There's probably value in storing some versioned metadata (like every Root) but nothing in the spec seems to require storing "current" and "previous".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add the file removal of course... and if we don't we should definitely document why.
This is a good discussion, we need some action to come from this. Even if it's just an issue capturing this intent.
self._bundle.update_delegated_targets(data, role, parent_role) | ||
self._persist_metadata(role, data) | ||
|
||
def _preorder_depth_first_walk(self, target_filepath) -> Dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stack-based preorder DFS. Nice, but too long. Could we refactor for simplicity and to check for correctness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#1398 : This is still an almost direct copy-paste from current client. We've been avoiding touching it so far since it's so well isolated from the code that we did touch and also because having type annotations will make following the code so much easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Draft PR to improve this code in #1463
Thanks a lot for review
Agreed, issue #1409, I suggest we do this after merge
If you have something that you manage to write down in an issue, that would be great... I feel like there is room for improvement but personally the things I am able to find and express are fairly small details. If there are bigger ideas and/or complaints I think now is a great time to look at them: We don't have too much code built yet on top of it but do have some experience already.
For the bundle: I agree the update_* methods are long but I don't want to split them. There are improvements we can make though (like #1433 removing the type check and making the hash/length checks tighter by implementing in Metadata API). For the Updater, I think the longer functions are the ones we have not refactored yet, full ack here.
👍 |
There's a lot of great comments here, much of which has been filed as issues to tackle after the code has landed. Are there any blockers to merging this in its current state? Obviously a rebase and an approving review are required. |
Nope, provided all the tests for the old Updater continue to pass for the new Updater, and we make it clear it's not production-ready. |
Potential work left:
The test issue is a bit tricky: the existing tests are not blackbox tests (in some places not even close) and we have changed API slightly (e.g. with mirrors config) so getting the old tests to run is not trivial (Teodora tried this so can comment with details if needed). I suggest we merge with the tests we have included, acknowledge that they are inadequate, and start improving them as a priority. |
About mirrors: is it possible to ignore it in a backwards-compatible way? When only one mirror, just go with the one-repo workflow, but with multiple mirrors, then thrown an error with a message. |
I don't think I'm interested in trying to preserve compatibility like that:
|
Fair, no need to keep the old API happy. We are preserving the old codebase for now anyway. |
Some more brainstorming produced the option
|
But it still doesn't hint at any state-keeping about the client update workflow. |
This is true. I think I mentioned this earlier (but it's buried in the nested comments so repeating): Based on that I'd rather emphasize "trusted collection of metadata" than "client workflow", even though both are correct and descriptive. |
Two changes since last comment:
From my POV all open issues are now handled or filed as issues (I thought we had one for the tests but filed #1462 since I could not find one). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delay, but I finally gave this a complete review. Great work folks!
I started out by reading the existing discussion threads and checking to see whether all discussion items are resolved. I believe they are, apart from one where @trishankatdatadog noticed the code does not implement 5.3.11 which @jku indicated was intentional, but that it needs to be documented why it's not implemented per spec or implemented per spec.
I then did a review of the code, which looks great 👍 . I made only a few minor suggestions on strings/comments/docstrings.
Finally, I did a side-by-side comparison of the detailed client workflow to the new code, which wasn't an enormous task 🎉 – another indication of the solid design here. When doing so I noticed that:
- When we compare versions for rollback attack checks, the spec says "version number of the [...] metadata file, if any, MUST be less than or equal to the version number of [...]", but the code checks
<
, not<=
. I don't think this is a major issue, but it's something I was able to notice thanks to the quality of the new implementation. Perhaps worth a comment at least? - The current updater implementation does not handle consistent snapshots in the target file downloading (
Updater.download_target()
). - Step 5.3.11 is not implemented, as discussed above.
I'm OK with handling all of these as follow-ons, so long as we have issues (or PRs) created before merging this.
from tuf.ngclient._internal import download, metadata_bundle, requests_fetcher | ||
from tuf.ngclient.fetcher import FetcherInterface | ||
|
||
# Globals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and there's a draft PR to improve things here at #1470
__metaclass__ = abc.ABCMeta | ||
|
||
@abc.abstractmethod | ||
def fetch(self, url, required_length): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Annotations added in draft PR #1448
>>> # load local timestamp, then update from remote | ||
>>> try: | ||
>>> with open(timestamp_path, "rb") as f: | ||
>>> trusted_set.update_timestamp(f.read()) | ||
>>> except (RepositoryError, OSError): | ||
>>> pass # failure to load a local file is ok | ||
>>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> # load local timestamp, then update from remote | |
>>> try: | |
>>> with open(timestamp_path, "rb") as f: | |
>>> trusted_set.update_timestamp(f.read()) | |
>>> except (RepositoryError, OSError): | |
>>> pass # failure to load a local file is ok | |
>>> |
loading a local copy of timestamp isn't required by the spec, should we simplify this example by removing that step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the spec is basically silent about loading anything local (except the trusted root), as far as I can tell:
- it still expects us to compare new files to "trusted metadata" for rollback protection. I think loading a local copy is the logical way to find "trusted metadata".
- I think there's possibly even a case to be made for loading expired local metadata for purposes of rollback prevention but since the spec explicitly mentions in the root case that expired metadata is fine, and does not mention that in other cases, I didn't seriously consider that. Still this may be a discussion topic ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed my mind on this while doing the review and should have deleted this comment.
- it still expects us to compare new files to "trusted metadata" for rollback protection. I think loading a local copy is the logical way to find "trusted metadata".
Exactly this. The spec implies there's loading of snapshot and timestamp to check for rollback attacks and targets to check for freeze attacks. I made a note while reviewing this PR to do something about this for the spec.
You have a good point that expired metadata is still valid as a source of information for protection against rollback attacks.
self._bundle.update_delegated_targets(data, role, parent_role) | ||
self._persist_metadata(role, data) | ||
|
||
def _preorder_depth_first_walk(self, target_filepath) -> Dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Draft PR to improve this code in #1463
|
||
def _visit_child_role(child_role: Dict, target_filepath: str) -> str: | ||
""" | ||
<Purpose> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old style docstring here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets refactored in #1463
else: | ||
target_base_url = _ensure_trailing_slash(target_base_url) | ||
|
||
full_url = parse.urljoin(target_base_url, targetinfo["filepath"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per 5.7.3 of the detailed client workflow, if consistent snapshots are enabled for the repository then we should prefix the target's filename with the hash(es) when downloading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops! Filed #1479
I believe the checks are actually as specified: it's just that spec defines it as
and we check for
Good call , filed #1479
There was a comment explaining it in |
Try to make errors derive from RepositoryError if they can be the result of malicious or malfunctioning remote repository: The idea here is that client can handle just RepositoryError instead of individual errors (as it cannot do anything about any of them). Also improve variable naming. This is backwards compatible. Signed-off-by: Jussi Kukkonen <[email protected]>
Start building the next-gen client: Copy existing components from the current client. All of these files have some changes compared to the already existing copies (because ngclient uses the same linting rules as Metadata API). * download.py is likely to see major changes in the future. * requests_fetcher is likely to see some minor changes (like allowing compression) Signed-off-by: Jussi Kukkonen <[email protected]>
* Use the same rules as tuf/api * omit ngclient from coverage limits for now: theupdateframework#1309 Signed-off-by: Jussi Kukkonen <[email protected]>
This is a new client library implementation using the Metadata API (the only "old" file used is exceptions.py) * Functional but largely still untested * Requires work before it can replace tuf.client but should be good enough to include in the repo The major changes compared to current client so far are: * Use of Metadata API * Major simplification in the way downloads are handled due to removing mirrors support * Separating tracking of valid metadata into a separate component * There's no MultiRepoUpdater We do not expect other major changes (in the sense of moving large amounts of code) but do plan to possibly improve the client API. The API has already changed so this is not going to be a 1:1 compatible implementation, but porting should not be difficult. Signed-off-by: Jussi Kukkonen <[email protected]>
This testing lacks coverage but demonstrates the happy cases. Signed-off-by: Jussi Kukkonen <[email protected]>
Improve the README text. Signed-off-by: Teodora Sechkova <[email protected]>
Remove unused and outdated imports in test_updater_ng.py Signed-off-by: Teodora Sechkova <[email protected]>
Remove outdated comments. Add explanations to non-obvious cases. Signed-off-by: Teodora Sechkova <[email protected]>
For consistency with the rest if the checks. Signed-off-by: Teodora Sechkova <[email protected]>
Document why deleting the timestamp and snapshot files is not needed to recover from a fast-forward attack. Signed-off-by: Teodora Sechkova <[email protected]>
TrustedMetadataSet is a long name but * it better describes the main feature * the name isn't used in too many places Change the variable names "bundle" -> "trusted_set" Signed-off-by: Jussi Kukkonen <[email protected]>
Signed-off-by: Jussi Kukkonen <[email protected]>
* updated_targets() both takes and returns a list * download_target() argument can come from either updated_targets() or get_one_valid_targetinfo() Signed-off-by: Jussi Kukkonen <[email protected]>
changes since last push:
All discussions fixed here have been marked resolved |
🤦 whoops, this LGTM.
Thanks.
Updated comment looks good, thanks for doing that. Given the discussion about trusting expired metadata for rollback attack prevention, we may just want to go ahead and delete the old files anyway? |
The spec 5.3.11 talks about deleting "trusted metadata" -- what it really means is metadata that is not signed by current root: We'll never load metadata like that. What we might want to use for rollback prevention is correctly signed, but expired metadata I just don't see the point in deleting the files -- our local files may get mangled by random bit flips, malware, users, cats sitting on keyboards: we just need to accept that they may not be valid. |
Signed-off-by: Jussi Kukkonen <[email protected]>
I believe you are right, but it took me a bit of time to reason it through. 🤔 |
This is something we've been working in experimental-client branch so far and would like to merge to develop now. It's a new client library implementation:
The implementation is roughly feature complete and ready for a test-drive.
There are still TODO-items that we'd like to handle before this could replace the current client, but we feel this would be a good time to merge so future work can be shared and is visible to everyone. Teodora has promised to post a list of things currently on our TODO-list here.
On the PR itself:
If others would rather see the full history, I can make this PR from the experimental-client branch too.