
Update HiDive scraping #193

Open: wants to merge 1 commit into base: master


Manitary (Contributor) commented Apr 10, 2024

The HiDive website change (in March, I think) broke the scraper:

  • The URL path changed from /tv/show-key to /season/show-key, and show keys are now numeric IDs.
  • The anime page is no longer served directly; instead, it contains JavaScript code that makes further requests to fetch the actual page content.
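To illustrate the URL change, here is a minimal sketch of how a stream-URL regex might accept the new numeric /season/ form while still recognizing the legacy /tv/ form. The names extract_show_key, is_legacy_url, SEASON_URL_RE and LEGACY_URL_RE, and the exact patterns, are hypothetical; they are not the regexes from this PR.

```python
import re
from typing import Optional

# Hypothetical patterns, not the regexes used in this PR.
# New-style URLs end in a numeric id under /season/.
SEASON_URL_RE = re.compile(
    r"^https?://(?:www\.)?hidive\.com/season/(\d+)/?$", re.IGNORECASE
)
# Old-style URLs ended in a text slug under /tv/.
LEGACY_URL_RE = re.compile(
    r"^https?://(?:www\.)?hidive\.com/tv/([\w-]+)/?$", re.IGNORECASE
)


def extract_show_key(url: str) -> Optional[str]:
    """Return the numeric show key of a new-style URL, else None."""
    match = SEASON_URL_RE.match(url.strip())
    return match.group(1) if match else None


def is_legacy_url(url: str) -> bool:
    """Return True for old /tv/<slug> URLs, which now list no episodes."""
    return LEGACY_URL_RE.match(url.strip()) is not None
```

For example, extract_show_key("https://www.hidive.com/season/12345") returns "12345", while an old /tv/some-show URL yields None and is flagged by is_legacy_url, so it can be reported instead of silently scraped.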

For now this is not a "real" problem, because:

  • The script does not crash: the old URL still returns a page, just one without any anime, so it is ignored.
  • HiDive releases are "simul-ripped", so every episode should still show up promptly as a torrent anyway.

These changes make it possible to scrape releases directly from HiDive by replicating only the requests needed to fetch the page contents as JSON (see the function _load_page_data); other functions and methods, as well as the various regexes, are updated accordingly.
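The JSON-based approach amounts to fetching the data the page's JavaScript would have requested and then extracting episodes from it defensively. The sketch below covers only the parsing half and does not reproduce the request sequence in _load_page_data. The Episode class, the parse_episodes function and the payload schema ({"episodes": [{"number": ..., "title": ...}]}) are all hypothetical; HiDive's real response format may differ.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    number: int
    title: str


def parse_episodes(payload: str) -> List[Episode]:
    """Turn a JSON payload into Episode objects without ever raising.

    Assumes a hypothetical schema: {"episodes": [{"number": ..., "title": ...}]}.
    """
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return []  # not JSON at all, e.g. an HTML error page
    if not isinstance(data, dict):
        return []
    raw_episodes = data.get("episodes")
    if not isinstance(raw_episodes, list):
        return []  # a page without any episode list yields nothing
    episodes = []
    for entry in raw_episodes:
        try:
            episodes.append(Episode(number=int(entry["number"]), title=str(entry["title"])))
        except (KeyError, TypeError, ValueError):
            continue  # skip a malformed entry instead of crashing
    return episodes
```

Returning an empty list for anything unexpected mirrors the behaviour described for the old URLs: a page with no usable episodes is simply ignored rather than crashing the run.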

I may eventually rewrite some parts (e.g. episode validation) to make them less messy.
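One plausible piece of episode validation is a date check that keeps only recently released episodes. The sketch below is hypothetical: passes_date_check, the rule of rejecting future-dated and undated entries, and the 7-day default for max_age are assumptions, not necessarily what this PR or #171 implement.

```python
from datetime import datetime, timedelta
from typing import Optional


def passes_date_check(
    release: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=7),  # assumed window, not from the PR
) -> bool:
    """Hypothetical filter: keep episodes released within max_age of now.

    Undated and future-dated entries are rejected, as are entries older
    than max_age (e.g. a back catalogue that appears all at once).
    """
    if release is None:
        return False
    return now - max_age <= release <= now
```

Both datetimes should be timezone-aware (or both naive); Python refuses to compare a naive datetime with an aware one.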

Notes:

  • It may be necessary to re-run the edit/update commands for the HiDive shows that are currently airing. Double-check the show_key value in the Streams table for those shows to make sure the current script's edit/update runs have not left malformed entries.
  • I'll keep monitoring the updated script throughout the season to make sure everything works; it seems fine for what has aired so far. "Unhappy paths" got only minimal testing, but there should be enough error catching to keep the script from crashing.
  • I added a date check like the one in "Add date check for HIDIVE" #171, but it can be forgone entirely (I think the whole validation loop could simply be scrapped?).
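For the show_key double-check suggested in the notes, a query along these lines could list HiDive streams whose key is not a plain numeric ID. This is a sketch assuming an SQLite database with a Streams table that has id, service and show_key columns; identifying HiDive rows by an integer service value, and the function name find_malformed_show_keys, are assumptions.

```python
import re
import sqlite3
from typing import List, Tuple

# New-style HiDive show keys are plain ASCII digits.
NUMERIC_KEY_RE = re.compile(r"[0-9]+")


def find_malformed_show_keys(
    conn: sqlite3.Connection, service_id: int
) -> List[Tuple[int, str]]:
    """Return (id, show_key) rows for one service whose key is not numeric.

    Assumes a Streams table with id, service and show_key columns; the
    real schema may differ.
    """
    rows = conn.execute(
        "SELECT id, show_key FROM Streams WHERE service = ? ORDER BY id",
        (service_id,),
    ).fetchall()
    return [
        (row_id, key)
        for row_id, key in rows
        if key is None or not NUMERIC_KEY_RE.fullmatch(str(key))
    ]
```

Any rows it returns are candidates for re-running the edit/update commands.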
