Consider only running extraction on allowlisted product pages #70

biancadanforth · 2018-08-27T20:52:25Z

Currently, outside of background updates, our extraction scripts run on every top-level page. Since we are only interested in extracting product information from product pages on particular domains, we should avoid running the extraction scripts elsewhere. We could do this by running the scripts only if the page's domain is on an "allowlist".

We would also want to handle cases like "smile.amazon.com" and other alternate domains that serve mostly the same content. The best approach I can think of right now is to run extraction on all subdomains of our top five sites.
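A minimal sketch of what such a check might look like, assuming the allowlist is a plain array of registrable domains and that a subdomain counts as a match. The names `ALLOWED_DOMAINS`, `isAllowlistedHost`, and `shouldExtract` are hypothetical, and the domains listed are placeholders rather than the project's actual top five sites:

```javascript
// Hypothetical allowlist: placeholder entries, not the project's real site list.
const ALLOWED_DOMAINS = ['amazon.com', 'example-store.com'];

/**
 * Return true if `hostname` is an allowlisted domain or a subdomain of one.
 * The leading dot in the suffix check keeps "notamazon.com" from matching.
 */
function isAllowlistedHost(hostname, allowedDomains = ALLOWED_DOMAINS) {
  const host = hostname.toLowerCase();
  return allowedDomains.some(
    (domain) => host === domain || host.endsWith(`.${domain}`),
  );
}

/**
 * Decide whether extraction should run for a page.
 * `isTopLevel` is assumed to be supplied by the caller
 * (for example, `window.top === window` in a content script).
 */
function shouldExtract(pageUrl, isTopLevel = true) {
  if (!isTopLevel) {
    return false;
  }
  let url;
  try {
    url = new URL(pageUrl);
  } catch (error) {
    return false; // Not a parseable absolute URL, so skip extraction.
  }
  return isAllowlistedHost(url.hostname);
}
```

With this suffix-based check, "smile.amazon.com" is covered by the "amazon.com" entry without listing every alternate subdomain separately.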

@javaun, can you confirm that this allowlist approach is acceptable for the MVP?

This is related to #29 and #43.

javaun · 2018-09-04T19:30:55Z

Yes, limiting extraction to the sites we support is a good idea. If there is a slight performance penalty (and there could be), then we're limiting it to sites where we can deliver value that outweighs that slight tax.

biancadanforth · 2018-09-12T18:00:17Z

This is a duplicate of #109; closing in favor of that issue.

This PR depends on PR #143, so merge that one first. This is a bit of a moot point for the MVP, since #70 restricts extraction to our five supported sites, none of which use Open Graph meta tags. However, that may change, and the fix was quick. Note: To test this, comment out the other extraction methods in './src/extraction/index.js' and the allowlist check in that file's 'main' function, then visit a Crate and Barrel product page.

biancadanforth closed this as completed Sep 12, 2018

biancadanforth mentioned this issue Oct 9, 2018

Fix #154: Parse Open Graph price string to a number #155

Merged
