Ht 2940 incremental update of overlaps #88

Merged: 4 commits, May 28, 2021

Conversation

jsteverman
Contributor

bin/update_overlap_table.rb updates overlap records in the holdings_htitem_htmember table. It finds all clusters that have been modified in the last 36 hours (a mostly arbitrary window, chosen with a daily cron job in mind), deletes the overlap records for all HT Items in those clusters, and reinserts the appropriate overlap records.
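
For readers skimming the description, here is a minimal in-memory sketch of the delete-and-reinsert flow described above. It is not the code in this PR: `Cluster`, `recently_modified`, `update_overlap_table`, and the row shape are hypothetical stand-ins, and the rule "one overlap row per HT Item per member in the cluster" is a simplifying assumption rather than the real overlap calculation.

```ruby
# Hypothetical stand-in for a cluster; the real script reads clusters
# from the data store and writes to holdings_htitem_htmember.
Cluster = Struct.new(:id, :last_modified, :ht_item_ids, :member_ids)

# 36 hours, matching the window mentioned in the description.
UPDATE_WINDOW_SECONDS = 36 * 60 * 60

# Select clusters whose last_modified falls within the window ending at `now`.
def recently_modified(clusters, now:, window: UPDATE_WINDOW_SECONDS)
  cutoff = now - window
  clusters.select { |cluster| cluster.last_modified >= cutoff }
end

# overlap_table is an Array of { ht_item_id:, member_id: } hashes that
# stands in for the database table (assumption for illustration only).
def update_overlap_table(clusters, overlap_table, now:)
  recently_modified(clusters, now: now).each do |cluster|
    # Delete existing overlap records for every HT Item in the cluster.
    overlap_table.reject! { |row| cluster.ht_item_ids.include?(row[:ht_item_id]) }

    # Reinsert records; here simplified to one row per item per member.
    cluster.ht_item_ids.each do |item_id|
      cluster.member_ids.each do |member_id|
        overlap_table << { ht_item_id: item_id, member_id: member_id }
      end
    end
  end
  overlap_table
end
```

Rows belonging to clusters outside the window are left untouched, which is what keeps the update incremental rather than a full rebuild.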

I deleted a few of the file-based bin scripts that are obsolete.

This does not address the actual creation of the cron job, which cannot be implemented until the table has been changed in production.

I am not sure what bin/export_overlap_report.rb is supposed to be doing.

Member

@aelkiss aelkiss left a comment

👍 This seems like it should work well. I especially like that we should be able to do more or less the same thing in a less batch-oriented environment. If we imagine that at some point in the future we're getting changes to the hathifiles, holdings, etc. as they come rather than in batches, the holdings update process could publish a message that says "cluster NNN updated", and then this process, pretty much as-is, could update its information about things in that cluster. (Of course, in such a future, we might be able to just provide an API to holdings that queries mongo directly, rather than needing to have it replicated in mysql... but that's for another day.)
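
To make the message-driven idea concrete, a rough sketch of what such a handler might look like. Everything here is hypothetical: the message shape (`{ "cluster_id" => ... }`), the `handle_cluster_updated` name, and the in-memory hashes standing in for the cluster store and the overlap table are assumptions for illustration, not anything that exists in the codebase.

```ruby
# Hypothetical handler for a "cluster NNN updated" message.
# clusters_by_id: { id => { ht_item_ids: [...], member_ids: [...] } }
# overlap_table:  Array of { ht_item_id:, member_id: } hashes
def handle_cluster_updated(message, clusters_by_id, overlap_table)
  cluster = clusters_by_id[message.fetch("cluster_id")]
  # Unknown cluster: nothing to refresh in this sketch.
  return overlap_table if cluster.nil?

  item_ids = cluster[:ht_item_ids]

  # Same delete-then-reinsert step as the batch job, scoped to one cluster.
  overlap_table.reject! { |row| item_ids.include?(row[:ht_item_id]) }
  item_ids.each do |item_id|
    cluster[:member_ids].each do |member_id|
      overlap_table << { ht_item_id: item_id, member_id: member_id }
    end
  end
  overlap_table
end
```

The only real difference from the batch version is how the set of clusters is chosen: a message names one cluster, instead of a time window selecting many.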

Just to make sure we think this through -- this should handle merges as well as splits because the clusters will have update dates and so the update will filter down to all relevant items. It won't handle HTItems that disappear, but we already know (from the HTItem update process) that we are not handling that. If/when we revisit the question of handling deletes in the repository, this is yet another thing we'll need to watch out for.

@aelkiss aelkiss merged commit 33965fb into master May 28, 2021
@aelkiss aelkiss deleted the HT-2940_incremental_update_of_overlaps branch May 28, 2021 14:21