Redirect recipes during indexing when an earlier-known-origin-URL is discovered #84

Merged
9 commits merged into main from issue-71/index-redirections
Dec 12, 2023

jayaddison
Member

Describe the reason for these changes and the problem that they solve

As documented in #71, recrawling or reindexing a recipe sometimes reveals an earlier known origin for it (the first place that the page was found).

Recipes can be recrawled and reindexed in parallel, so there are few guarantees about when this discovery will happen.

However, indexing is the last opportunity we have to detect this situation. It occurs for every recipe that appears in the search engine, and it is a relatively cheap operation that can be re-run without recrawling any content from the web.

This change adds a detection step that looks for earlier-known origins when a recipe is indexed. If an earlier origin is found, we (attempt to) redirect the recipe document to the expected ID of the earlier-origin document, and hide the current document so that it does not appear as a duplicate in search results.

Briefly summarize the changes

  1. When an earlier page origin is discovered for a recipe during indexing, set the redirected_id value on the Recipe model to the expected ID of the earlier-origin document.
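
As a rough illustration of the detection step described above, here is a minimal sketch. It is not the code from this pull request: only `Recipe` and `redirected_id` come from the description, while the `src` and `hidden` fields, the `generate_id` hashing scheme, and the `find_earliest_origin` lookup are hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass
from hashlib import sha1
from typing import Callable, Optional


@dataclass
class Recipe:
    id: str
    src: str                             # assumed: URL this document was crawled from
    hidden: bool = False                 # assumed: flag that excludes it from search
    redirected_id: Optional[str] = None  # named in the PR description


def generate_id(url: str) -> str:
    # Hypothetical ID scheme: a deterministic hash of the origin URL, so the
    # "expected ID" of a document can be computed without looking it up.
    return sha1(url.encode("utf-8")).hexdigest()[:12]


def apply_redirect(
    recipe: Recipe,
    find_earliest_origin: Callable[[str], Optional[str]],
) -> bool:
    """Redirect and hide `recipe` if an earlier origin URL is known.

    Returns True when a redirect was recorded, False otherwise.
    """
    earliest = find_earliest_origin(recipe.src)
    if earliest is None or earliest == recipe.src:
        return False  # this document is already the earliest known origin
    recipe.redirected_id = generate_id(earliest)
    recipe.hidden = True  # avoid showing a duplicate in search results
    return True
```

Because the function depends only on the recipe and a lookup, it can be re-run at every indexing pass without recrawling, which matches the reasoning given for placing the check at index time.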

How have the changes been tested?

  1. Pending testing.

List any issues that this change relates to
Relates to #71, #82.

@jayaddison jayaddison force-pushed the issue-71/index-redirections branch 3 times, most recently from 797d72e to 4aeb1c3 Compare December 12, 2023 16:12
@jayaddison jayaddison force-pushed the issue-71/index-redirections branch from 9cdaeff to 6bd3f90 Compare December 12, 2023 18:46
@jayaddison jayaddison merged commit 03e19f2 into main Dec 12, 2023
@jayaddison jayaddison deleted the issue-71/index-redirections branch December 12, 2023 18:47