-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HT-2835: Daily Hathifile loading #78
Conversation
- Always revalidate connection - Ensure if a test fails that any injections are reset
b54b270
to
ff45f17
Compare
- ensure MONGOID_ENV is set in specs - ensure specs use spec_helper to load settings - ensure environment gets set in a usable way by default - remove environment variables from places other than config - makes getting config for Sequel database match keycard/checkpoint
ff45f17
to
3ab6f06
Compare
Given the changes for config here we will need to adjust our Kubernetes cronjob / pod configuration to include appropriate configuration. I can work on that, including adding config for a cron job for this, once we think this is headed in the right direction. |
lib/holdings_file.rb
Outdated
unrestrict_primary_key | ||
|
||
def self.latest(**kwargs) | ||
where(**kwargs).order_by(Sequel.desc(:produced)).limit(1).first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The splat here and below on line 16 are unnecessary, I think
@@ -1,4 +1,4 @@ | |||
GRANT USAGE ON *.* TO 'ht_repository'@'%' IDENTIFIED BY 'ht_repository'; | |||
GRANT SELECT ON `ht_rights`.* TO 'ht_repository'@'%'; | |||
GRANT SELECT, INSERT, UPDATE, DELETE, LOCK TABLES ON `ht_repository`.* TO 'ht_repository'@'%'; | |||
GRANT SELECT, INSERT, UPDATE, DELETE, LOCK TABLES ON `ht`.* TO 'ht_repository'@'%'; | |||
GRANT ALL ON `ht_repository`.* TO 'ht_repository'@'%'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably isn't necessary -- I had changed this because at one point it was doing a TRUNCATE
in the tests, which needed drop table privilege, but now it's doing it with transactions.
# Responsible for locating and loading a single Hathifile, a file containing | ||
# HtItems represented as tab-separated values. | ||
# | ||
# Usage: Hathifile.new(date).load |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add expected date format to this documentation.
**kwargs) | ||
# Connection connects to the database using the connection information | ||
# specified by Settings.database, which should contain keyword parameters | ||
# matching those taken by Sequel.connect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not accurate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the Ettin/Settings stuff.
Haven't seen a lot of the rspec lingo you are using.
FileMutex currently allows any path as lockfile. I think we should specify a list of allowed paths, perhaps in a specified dir, just to make sure we don't proliferate lockfiles all over the file system.
No reason not to approve.
@@ -0,0 +1,7 @@ | |||
# frozen_string_literal: true | |||
|
|||
require "ettin" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a real objection, but it does not appear there is a current maintainer of Ettin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. The canonical homes are http://github.com/mlibrary/ettin and http://github.com/mlibrary/canister and we can certainly update it as needed. @botimer will also reach out to @malakai97 and make sure that we (as MLibrary) can publish new versions on rubygems as needed.
Is there a specific rspec thing you're seeing that isn't clear? I'm certainly open to changing it especially in cases where what a test is testing or where its data is coming from isn't obvious |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect to use FileMutex and LoadedFile for Holdings/OCNResolutions as well?
This all looks reasonable to me. HoldingsDB is an ugliness I would like to deal with at some point, but it's not new and is not a pressing matter.
Yes, I'm expecting we'll want to use these for the other kinds of data as well, although it's not 100% clear to me when we need to use the |
This orchestrates loading hathifiles on a daily basis. In particular, it adds:
holdings_files
, to track what files we have loaded on a daily basis or notFileMutex
class to ensure processes aren't trying to run reports while we're loading dataIt also adds the notion of top-level configuration separately from
Services
, which should help some to reduce the number of environment variables and generally make configuration more visible.TODO before this is fully ready for review:
holdings_files
toholdings_loaded_files
; class should be namedLoadedFiles