-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add checkpoint support to Pulp core and file #6245
base: main
Are you sure you want to change the base?
Conversation
I think we need docs as part of this PR. Maybe some on how plugin writers can enable this feature for their plugins in /docs/dev/learn/subclassing and then some user docs somewhere in /docs/user somewhere. |
3049844
to
38ead76
Compare
pulpcore/app/migrations/0127_distribution_checkpoint_publication_checkpoint.py
Outdated
Show resolved
Hide resolved
ac5f9eb
to
38ead76
Compare
d778a95
to
19ceb1e
Compare
Introduce a checkpoint field for Publication and Distribution models. Handle serving checkpoint Publications via checkpoint Distributions. Protect checkpoint Publications' RepositoryVersions from cleanup. Enable checkpoint support in pulp_file. closes pulp#6244
19ceb1e
to
39f84ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this is quite a lot. But it's a huge pr also...
The next step is to create checkpoint publications. Only publications marked as checkpoint will be | ||
served from the checkpoint distribution. Checkpoint publications can only be created using the | ||
repository's latest version. Repository versions of the distributed checkpoint publications will be | ||
protected from the `retain_repo_versions` cleanup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really just a suggestion:
protected from the `retain_repo_versions` cleanup. | |
protected from the automatic cleanup defined by `retain_repo_versions`. |
base path of any of the repository's checkpoint distributions. | ||
|
||
```bash | ||
http :24816/pulp/content/checkpoint/myfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer something "more user oriented" here. Whatever that means. How about:
http :24816/pulp/content/checkpoint/myfile | |
http https://pulp.example/pulp/content/checkpoint/myfile |
@@ -98,12 +99,13 @@ class Publication(MasterModel): | |||
|
|||
complete = models.BooleanField(db_index=True, default=False) | |||
pass_through = models.BooleanField(default=False) | |||
checkpoint = models.BooleanField(default=False, editable=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the editable actually doing here? Is it picked up by DRF at all?
https://docs.djangoproject.com/en/5.1/ref/models/fields/#editable
@@ -159,6 +165,10 @@ def delete(self, **kwargs): | |||
# It's possible for errors to occur before any publication has been completed, | |||
# so we need to handle the case when no Publication exists. | |||
try: | |||
if self.checkpoint: | |||
base_paths |= Distribution.objects.filter( | |||
checkpoint=self.checkpoint, repository=self.repository_version.repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkpoint=self.checkpoint, repository=self.repository_version.repository | |
checkpoint=True, repository=self.repository_version.repository |
raise serializers.ValidationError( | ||
_("Checkpoint can only be created for the repository's latest version") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I understand this is a deliberate limitation to make the whole checkpoint in time logic explicit.
How about we just never allow repository_version
and checkpoint=True
and instead say you need to specify repository
and checkpoint=True
?
elif checkpoint and ( | ||
not repository_provided or publication_provided or repository_version_provided | ||
): | ||
raise serializers.ValidationError( | ||
_("The 'checkpoint' attribute may only be used with the 'repository' attribute.") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -164,6 +186,7 @@ class Handler: | |||
] | |||
|
|||
distribution_model = None | |||
checkpoint_ts_format = "%Y%m%dT%H%M%SZ" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect this to be changed when subclassing the handler?
I think we can make it a global constant...
distro.base_path = f"{base_path}/{request_timestamp_str}" | ||
distro.repository = None | ||
distro.publication = checkpoint_publication | ||
return distro |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooooof!
This is returning a modified but unsaved version of the distribution. I can see how it works, but will we still understand that in six months?
|
||
if distro_object.checkpoint: | ||
return cls._handle_checkpoint_distribution(distro_object, original_path) | ||
return distro_object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels (to me) like the wrong location to hook that logic.
Can you take a closer look at _match_and_stream
?
Or should it even be part of the Distribution.content_handler
?
Down there we already have parsed rel_path
for example.
Yes I know, the whole structure of the handler is already pretty hard to grasp.
def create_publication(repo, checkpoint): | ||
content = file_content_unit_with_name_factory(str(uuid.uuid4())) | ||
task = file_bindings.RepositoriesFileApi.modify( | ||
repo.pulp_href, {"add_content_units": [content.pulp_href]} | ||
).task | ||
monitor_task(task) | ||
repo = file_bindings.RepositoriesFileApi.read(repo.pulp_href) | ||
pub = gen_object_with_cleanup( | ||
file_bindings.PublicationsFileApi, | ||
{"repository_version": repo.latest_version_href, "checkpoint": checkpoint}, | ||
) | ||
sleep(1) | ||
return pub | ||
|
||
# setup | ||
repo = file_repository_factory() | ||
distribution = file_distribution_factory(repository=repo.pulp_href, checkpoint=True) | ||
|
||
pub_0 = create_publication(repo, False) | ||
pub_1 = create_publication(repo, True) | ||
pub_2 = create_publication(repo, False) | ||
pub_3 = create_publication(repo, True) | ||
pub_4 = create_publication(repo, False) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One suggestion to make the test more descriptive:
If we move this setup into a module local but class scoped fixture, we can turn this test function into a class and split the assertions into individual tests. Think TestCheckpointDistribution.test_invalid_ts_returns_404
, ...test_arbitrary_timestamp_is_redirected
, ...test_base_path_lists_checkpoints
, ...
The class is crucial to share the single repository setup between the individual tests, by the nature of our factory fixtures.
This PR adds the support for checkpoints in Pulp. This is related to this proposal . Publications can be marked as checkpoints at the creation time. Checkpoint publications are served through checkpoint distributions. Each repository can have 0-n checkpoint distributions, all serving the same content (i.e. the checkpoint publications).
The change adds a new field "checkpoint" to both the Distribution and Publication models. New publication can be made checkpoint publications by setting the checkpoint field. Checkpoint distributions can be created by setting the checkpoint field on the Distribution model and will serve all checkpoint publications. It's up to plugin writers to expose the checkpoint fields on their respective serializers.
This PR also enables the checkpoint support for pulp_file.
closes #6244