Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add checkpoint support to Pulp core and file #6245

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Moustafa-Moustafa
Copy link

This PR adds the support for checkpoints in Pulp. This is related to this proposal . Publications can be marked as checkpoints at the creation time. Checkpoint publications are served through checkpoint distributions. Each repository can have 0-n checkpoint distributions, all serving the same content (i.e. the checkpoint publications).
The change adds a new field "checkpoint" to both the Distribution and Publication models. New publication can be made checkpoint publications by setting the checkpoint field. Checkpoint distributions can be created by setting the checkpoint field on the Distribution model and will serve all checkpoint publications. It's up to plugin writers to expose the checkpoint fields on their respective serializers.
This PR also enables the checkpoint support for pulp_file.

closes #6244

@daviddavis
Copy link
Contributor

I think we need docs as part of this PR. Maybe some on how plugin writers can enable this feature for their plugins in /docs/dev/learn/subclassing and then some user docs somewhere in /docs/user somewhere.

Introduce a checkpoint field for Publication and Distribution models.
Handle serving checkpoint Publications via checkpoint Distributions.
Protect checkpoint Publications' RepositoryVersions from cleanup.
Enable checkpoint support in pulp_file.

closes pulp#6244
Copy link
Member

@mdellweg mdellweg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this is quite a lot. But it's a huge pr also...

The next step is to create checkpoint publications. Only publications marked as checkpoint will be
served from the checkpoint distribution. Checkpoint publications can only be created using the
repository's latest version. Repository versions of the distributed checkpoint publications will be
protected from the `retain_repo_versions` cleanup.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really just a suggestion:

Suggested change
protected from the `retain_repo_versions` cleanup.
protected from the automatic cleanup defined by `retain_repo_versions`.

base path of any of the repository's checkpoint distributions.

```bash
http :24816/pulp/content/checkpoint/myfile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer something "more user oriented" here. Whatever that means. How about:

Suggested change
http :24816/pulp/content/checkpoint/myfile
http https://pulp.example/pulp/content/checkpoint/myfile

https://en.wikipedia.org/wiki/.example

@@ -98,12 +99,13 @@ class Publication(MasterModel):

complete = models.BooleanField(db_index=True, default=False)
pass_through = models.BooleanField(default=False)
checkpoint = models.BooleanField(default=False, editable=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the editable actually doing here? Is it picked up by DRF at all?

https://docs.djangoproject.com/en/5.1/ref/models/fields/#editable

@@ -159,6 +165,10 @@ def delete(self, **kwargs):
# It's possible for errors to occur before any publication has been completed,
# so we need to handle the case when no Publication exists.
try:
if self.checkpoint:
base_paths |= Distribution.objects.filter(
checkpoint=self.checkpoint, repository=self.repository_version.repository
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
checkpoint=self.checkpoint, repository=self.repository_version.repository
checkpoint=True, repository=self.repository_version.repository

Comment on lines +48 to +50
raise serializers.ValidationError(
_("Checkpoint can only be created for the repository's latest version")
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I understand this is a deliberate limitation to make the whole checkpoint in time logic explicit.
How about we just never allow repository_version and checkpoint=True and instead say you need to specify repository and checkpoint=True?

Comment on lines +328 to +333
elif checkpoint and (
not repository_provided or publication_provided or repository_version_provided
):
raise serializers.ValidationError(
_("The 'checkpoint' attribute may only be used with the 'repository' attribute.")
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -164,6 +186,7 @@ class Handler:
]

distribution_model = None
checkpoint_ts_format = "%Y%m%dT%H%M%SZ"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect this to be changed when subclassing the handler?
I think we can make it a global constant...

Comment on lines +422 to +425
distro.base_path = f"{base_path}/{request_timestamp_str}"
distro.repository = None
distro.publication = checkpoint_publication
return distro
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooooof!
This is returning a modified but unsaved version of the distribution. I can see how it works, but will we still understand that in six months?

Comment on lines +352 to +355

if distro_object.checkpoint:
return cls._handle_checkpoint_distribution(distro_object, original_path)
return distro_object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels (to me) like the wrong location to hook that logic.
Can you take a closer look at _match_and_stream?
Or should it even be part of the Distribution.content_handler?
Down there we already have parsed rel_path for example.

Yes I know, the whole structure of the handler is already pretty hard to grasp.

Comment on lines +92 to +115
def create_publication(repo, checkpoint):
content = file_content_unit_with_name_factory(str(uuid.uuid4()))
task = file_bindings.RepositoriesFileApi.modify(
repo.pulp_href, {"add_content_units": [content.pulp_href]}
).task
monitor_task(task)
repo = file_bindings.RepositoriesFileApi.read(repo.pulp_href)
pub = gen_object_with_cleanup(
file_bindings.PublicationsFileApi,
{"repository_version": repo.latest_version_href, "checkpoint": checkpoint},
)
sleep(1)
return pub

# setup
repo = file_repository_factory()
distribution = file_distribution_factory(repository=repo.pulp_href, checkpoint=True)

pub_0 = create_publication(repo, False)
pub_1 = create_publication(repo, True)
pub_2 = create_publication(repo, False)
pub_3 = create_publication(repo, True)
pub_4 = create_publication(repo, False)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion to make the test more descriptive:
If we move this setup into a module local but class scoped fixture, we can turn this test function into a class and split the assertions into individual tests. Think TestCheckpointDistribution.test_invalid_ts_returns_404, ...test_arbitrary_timestamp_is_redirected, ...test_base_path_lists_checkpoints, ...

The class is crucial to share the single repository setup between the individual tests, by the nature of our factory fixtures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add checkpoint support to Pulp
3 participants