Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEV-1125: duplicate holdings cleanup #302

Merged
merged 2 commits into from
May 23, 2024

Conversation

aelkiss
Copy link
Member

@aelkiss aelkiss commented May 22, 2024

Context

Many clusters have duplicate holdings for reasons that have previously been discussed; see DEV-1125

Description

This cleans up those clusters by iterating through them one by one, grouping them by update key the same way that find_old_holdings does (https://github.com/hathitrust/holdings-backend/blob/main/lib/clustering/cluster_holding.rb#L86-L91) and retaining the most recent in each equality group.

@mwarin Mainly I'm interested if you can think of any other tests we should be doing to make sure this does what we want.

This can be reviewed now; once approved we should wait to merge it until after #301

@aelkiss aelkiss requested a review from mwarin May 22, 2024 19:51
@coveralls
Copy link

coveralls commented May 22, 2024

Coverage Status

coverage: 95.06% (+0.002%) from 95.058%
when pulling 96097f3 on DEV-1125-duplicate-holdings-cleanup
into 84e1923 on main.

README.md Outdated
@@ -18,7 +18,7 @@ bash bin/setup/setup_dev.sh

## Running the tests

`docker-compose run --rm dev bundle exec rspec`
`docker-compose run --rm test`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might as well make it hyphenless: docker compose

Copy link
Contributor

@mwarin mwarin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mostly on board with the functional changes, but I don't understand how you would execute the script as written.

I was confused seeing some of (all?) the small unrelated (?) changes I already approved in PR #300 . Could those have been rebased away?

Not going to let my confusion stand in the way of an APPROVE, though.

@@ -0,0 +1,68 @@
# frozen_string_literal: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zero comments here. How do I run this?

bundle exec ruby bin/cleanup_duplicate_holdings.rb does not work.
It does run fine from it's spec.

Given that it does not require Services or Cluster, should this be run from inside phctl pry?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, there's probably some missing stuff here that the specs hide. I'll address that and add a comment as to the purpose here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

bin/setup/setup_dev.sh Outdated Show resolved Hide resolved
docker-compose.yml Outdated Show resolved Hide resolved
@@ -0,0 +1,140 @@
# frozen_string_literal: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bundle exec rspec spec/cleanup_duplicate_holdings_spec.rb runs well.
I don't see any missing tests.

@aelkiss
Copy link
Member Author

aelkiss commented May 23, 2024

Right, this needs to be rebased...

@aelkiss
Copy link
Member Author

aelkiss commented May 23, 2024

I should have specified that only 90ccd66 needed to be reviewed and the other changes were covered in #300 or #301

aelkiss added 2 commits May 23, 2024 13:52
Iterates through clusters and cleans them up one by one
* add brief documentation
* ensure it runs
@aelkiss aelkiss force-pushed the DEV-1125-duplicate-holdings-cleanup branch from 2cee8ac to 96097f3 Compare May 23, 2024 17:53
@aelkiss aelkiss marked this pull request as ready for review May 23, 2024 17:53
@aelkiss
Copy link
Member Author

aelkiss commented May 23, 2024

Rebased against main & added some fixes in the bin script for cleaning up duplicate holdings. Will merge after tests pass.

@aelkiss aelkiss merged commit 2bf7c3f into main May 23, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants