Redirect recipes during indexing when an earlier-known-origin-URL is discovered #84
Describe the reason for these changes and the problem that they solve
As documented in #71, sometimes we discover an earlier known origin (first place that the page was found) for a recipe during recrawling/reindexing of recipes.
Recipes can be recrawled and reindexed in parallel, so there are few timing guarantees available about when this can and will happen.
However, indexing is the last opportunity we have to detect this situation. It occurs for every recipe that appears in the search engine, and it is a relatively cheap operation that can be re-run without recrawling any content from the web.
This change adds a detection step that looks for earlier-known origins when a recipe is indexed. If an earlier origin is found, we (attempt to) redirect the recipe document to the expected document's ID, and hide the current document because otherwise it will appear as a duplicate.
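The detection step described above could look roughly like the following. This is a minimal sketch and not the code in this pull request: the `Recipe` dataclass is a simplified stand-in for the real model, and the `src` and `hidden` fields, the `apply_earlier_origin_redirect` function, and the `find_earliest_origin_id` lookup are hypothetical names chosen for illustration. Only `redirected_id` is taken from the description.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Recipe:
    # Hypothetical, simplified stand-in for the project's Recipe model.
    id: str
    src: str  # assumed: the URL this recipe document was crawled from
    redirected_id: Optional[str] = None
    hidden: bool = False  # assumed flag; hidden documents are not shown in search


def apply_earlier_origin_redirect(
    recipe: Recipe,
    find_earliest_origin_id: Callable[[str], Optional[str]],
) -> bool:
    """Redirect and hide `recipe` if an earlier-origin document is known.

    `find_earliest_origin_id` is a hypothetical lookup that maps a URL to the
    ID of the document at its earliest known origin, or None if unknown.
    Returns True when a redirect was applied.
    """
    expected_id = find_earliest_origin_id(recipe.src)

    # Nothing to do when no earlier origin is known, or when this document
    # already is the earliest-origin document.
    if expected_id is None or expected_id == recipe.id:
        return False

    # Point this document at the earlier-origin document, and hide it so that
    # it does not show up as a duplicate.
    recipe.redirected_id = expected_id
    recipe.hidden = True
    return True


# Usage: a plain dict stands in for the earliest-origin lookup.
known_origins = {"https://example.test/b": "recipe-a"}
recipe = Recipe(id="recipe-b", src="https://example.test/b")
apply_earlier_origin_redirect(recipe, known_origins.get)
```

Passing the lookup in as a callable keeps the sketch independent of any particular database or crawler client, and because the function only reads and sets fields on the recipe, running it again on every reindex is harmless.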
Briefly summarize the changes
Set the redirected_id value on the Recipe model to the expected ID of the earlier-origin document.
How have the changes been tested?
List any issues that this change relates to
Relates to #71, #82.