Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PP-1358] add new events #2174

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

[PP-1358] add new events #2174

wants to merge 10 commits into from

Conversation

dbernstein
Copy link
Contributor

@dbernstein dbernstein commented Nov 19, 2024

Description

This update captures five new circulation events:

circulation_manager_hold_expired
circulation_manager_hold_ready
circulation_manager_hold_converted_to_loan
circulation_manager_loan_expired
circulation_manager_loan_converted_to_hold

I have a couple of questions/comments to inform the review:

  1. We are capturing hold expirations where the CM is and is not the source of truth. However for loans I believe we are only handling expired loans where the CM is not source of truth. Did I miss something here? Or should we be doing something for loan expirations where CM is not the source of truth?

  2. When we convert holds to loans I am both capturing the conversion event as well as recording a "checkout". Because it is a checkout event that happens to be a conversion. I think that's what we want but others may disagree.

  3. I put a note in the code, but I'll call it out again here: there appears to be a case where a loan can be converted to a hold. I'm not sure what the scenario is, but I've captured it is a distinct event in addition to the placing of the hold.

  4. I ran into some mypy issues with the collect_event method. That led me to observe that all loans and holds have patrons and the intention seems to be that all patrons must have loans and holds but the database does not enforce that constraint. I included the migration here, but I am happy to move it to a separate PR is that seems cleaner.

Motivation and Context

https://ebce-lyrasis.atlassian.net/browse/PP-1804

How Has This Been Tested?

New unit tests added. Older unit tests amended.

Checklist

  • I have updated the documentation accordingly.
  • All new and existing tests passed.

@dbernstein dbernstein added the feature New feature label Nov 19, 2024
Copy link

codecov bot commented Nov 19, 2024

Codecov Report

Attention: Patch coverage is 90.90909% with 3 lines in your changes missing coverage. Please review.

Project coverage is 90.87%. Comparing base (5a49b70) to head (a1700c8).
Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
src/palace/manager/core/monitor.py 66.66% 1 Missing and 1 partial ⚠️
src/palace/manager/api/circulation.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2174   +/-   ##
=======================================
  Coverage   90.87%   90.87%           
=======================================
  Files         359      359           
  Lines       41263    41288   +25     
  Branches     8883     8884    +1     
=======================================
+ Hits        37498    37522   +24     
- Misses       2476     2477    +1     
  Partials     1289     1289           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@dbernstein dbernstein requested a review from a team November 19, 2024 18:28
@dbernstein dbernstein added the DB migration This PR contains a DB migration label Nov 19, 2024
Copy link
Member

@jonathangreen jonathangreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned about the performance impacts of this change. I'd like to understand that a bit better before this one goes in.

def remove_expired_holds_for_collection(db: Session, collection_id: int) -> int:
def remove_expired_holds_for_collection(
db: Session, collection_id: int, analytics: Analytics
) -> int:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a sense of how much slower this function is now? I spent some time optimizing it, so previously it was very fast, since it was just running a single delete query on fairly well indexed data. Now we are doing a select, iterating through each row, and pushing a file to s3 for each row. Seems like that has the potential to be a lot slower.

I ask, because I've been trying to keep the run time of any one celery task under two minutes, so eventually we can run these tasks on AWS spot instances, and everything becomes a lot easier if we are able to complete a task within the shutdown window.

This function gets called within a single task for every collection on a CM, see my note below.

Hold.end < utc_now(),
Hold.license_pool_id == LicensePool.id,
LicensePool.collection_id == collection_id,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is done in two steps, what we log in analytics has the possibility of getting out of sync with the DB right? I'd suggest making this into a select for update query, then doing the delete in the for loop where we also log the analytics events.

@@ -123,7 +153,9 @@ def remove_expired_holds(task: Task) -> None:
]
for collection_id, collection_name in collections:
with task.transaction() as session:
removed = remove_expired_holds_for_collection(session, collection_id)
removed = remove_expired_holds_for_collection(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is where the function I noted above gets called. So within a single task it iterates over all the collections in a CM. Since the function completes quickly now its not a concern, but I'm a bit worried with these changes it might be slow enough that we need a different approach here.

Can you do some testing to get an idea with a CM like CA or CT how long the updated function will take if we have a lot of holds outstanding? If it still completes reasonably fast then these changes are fine, but if it is now taking more then 30 - 45 seconds I think we should consider either re-queuing the task after a certain amount of work or only processing one collection per task or something like that to speed it up.

@@ -200,8 +232,12 @@ def recalculate_hold_queue_collection(
f"Skipping license pool {license_pool_id} because it no longer exists."
)
continue

analytics = task.services.analytics.analytics()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: No need to create this service on every loop. this can be moved outside of the for loop.

@@ -880,11 +880,18 @@ def run_once(self, *args, **kwargs):
count = qu.count()
self.log.info("Deleting %d row(s)", count)
while count > 0:
post_delete_ops = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this being used for anything? I see a for loop below that references it, but I don't see anywhere that data gets added to it.

@@ -897,6 +904,9 @@ def delete(self, row):
"""
self._db.delete(row)

def post_delete(self, row) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Why add a new function for this? Can't LoanlikeReaperMonitor just override delete, call the super method, then do whatever it needs to do?

src/palace/manager/sqlalchemy/model/patron.py Show resolved Hide resolved
assert True == is_new
assert CirculationEvent.NEW_PATRON == analytics.event_type
assert 1 == analytics.count
assert is_new == True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: If we are updating these we should update boolean checks to be strict as well:

assert is_new is True

assert "Achewood" == patron.neighborhood
assert 1 == analytics.count
assert patron.authorization_identifier == "2"
assert is_new == False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Same as above

assert is_new is False

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DB migration This PR contains a DB migration feature New feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants