This repository has been archived by the owner on Dec 3, 2020. It is now read-only.
Update fallback extraction CSS selector data #84
biancadanforth added a commit that referenced this issue on Sep 13, 2018
biancadanforth added a commit that referenced this issue on Sep 14, 2018
biancadanforth added a commit that referenced this issue on Sep 14, 2018

Improve fallback extraction by CSS selectors:
* Update selectors for the top 5 sites. Add Home Depot and Best Buy.
* Rename selectors JSON file to be more descriptive (was 'product_extraction_data.json', now 'fallback_extraction_selectors.json').
* Represent supported sites in 'fallback_extraction_selectors.json' as regular expression strings so that fallback extraction works for any subdomain of the site (e.g. 'smile.amazon.com').
* Represent CSS selectors by tuples in 'fallback_extraction_selectors.json', so that each selector can specify which attribute or property to read for that selector.
* Clean price strings from fallback extraction using the same methods as used by Fathom extraction (PR #111); consolidate and move shared methods to 'utils.js'.
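The data layout described in the commit message above could look roughly like the following sketch. It is an illustration under assumptions, not the repository's actual code: the object shape, the selector strings, and the function names (`getSelectorsForHostname`, `extractFeature`, `extractProduct`) are all hypothetical. It shows site keys stored as regular expression strings, so that a subdomain such as 'smile.amazon.com' matches, and each selector stored as a `[selector, attributeOrProperty]` tuple.

```javascript
// Hypothetical stand-in for the contents of 'fallback_extraction_selectors.json'.
// Keys are regular expression strings tested against the page's hostname.
// Values map each product feature to a list of [cssSelector, attributeOrProperty]
// tuples, tried in order. The selectors are illustrative placeholders only.
const FALLBACK_SELECTORS = {
  '^(?:[\\w-]+\\.)*amazon\\.com$': {
    title: [['#productTitle', 'innerText']],
    price: [
      ['#priceblock_dealprice', 'innerText'],
      ['#priceblock_ourprice', 'innerText'],
    ],
    image: [['#landingImage', 'src']],
  },
  '^(?:[\\w-]+\\.)*bestbuy\\.com$': {
    title: [['.sku-title h1', 'innerText']],
    price: [['.priceView-customer-price span', 'innerText']],
    image: [['img.primary-image', 'src']],
  },
};

// Return the selector map for the first site pattern matching the hostname,
// or null if the site is not supported.
function getSelectorsForHostname(hostname, data = FALLBACK_SELECTORS) {
  for (const [pattern, selectors] of Object.entries(data)) {
    if (new RegExp(pattern).test(hostname)) {
      return selectors;
    }
  }
  return null;
}

// Try each [selector, attributeOrProperty] tuple in order and return the first
// non-empty value found, or null if none of the tuples yields one.
function extractFeature(doc, tuples) {
  for (const [selector, key] of tuples) {
    const element = doc.querySelector(selector);
    if (!element) {
      continue;
    }
    let value = element[key];
    if (value === undefined && typeof element.getAttribute === 'function') {
      value = element.getAttribute(key);
    }
    if (value !== undefined && value !== null && value !== '') {
      return value;
    }
  }
  return null;
}

// Extract every feature for a supported site; return null if the site is
// unsupported or any feature cannot be found.
function extractProduct(doc, hostname, data = FALLBACK_SELECTORS) {
  const selectors = getSelectorsForHostname(hostname, data);
  if (!selectors) {
    return null;
  }
  const product = {};
  for (const [feature, tuples] of Object.entries(selectors)) {
    const value = extractFeature(doc, tuples);
    if (value === null) {
      return null;
    }
    product[feature] = value;
  }
  return product;
}
```

In a content script this might be called as `extractProduct(document, window.location.hostname)`. Price cleaning is deliberately left out of the sketch, since the shared cleaning methods mentioned in the commit message are not shown in this issue.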
biancadanforth added a commit that referenced this issue on Sep 21, 2018
biancadanforth added a commit that referenced this issue on Sep 21, 2018
biancadanforth added a commit that referenced this issue on Sep 28, 2018
biancadanforth added a commit that referenced this issue on Sep 28, 2018
Fix #84: Improve fallback extraction
Our 'product_extraction_data.json' is outdated. We should update this file so that, at a minimum, it has CSS selectors for all five of our top sites (Home Depot and Best Buy are currently missing), in case Fathom extraction fails.