This repository has been archived by the owner on Dec 3, 2020. It is now read-only.

Consider only running extraction on allowlisted product pages #70

Closed
1 task
biancadanforth opened this issue Aug 27, 2018 · 2 comments

Comments

@biancadanforth
Collaborator

biancadanforth commented Aug 27, 2018

Currently, outside of background updates, our extraction scripts run on every top-level page. Since we are only interested in extracting product information from product pages on particular domains, we should avoid running the extraction scripts anywhere else. This could be achieved by running the scripts only if the page's domain is on an "allowlist".

We would also want to handle cases like "smile.amazon.com" and other alternate domains that serve the same content for the most part. The best thing I can think of now is to run extraction in all subdomains for our top five sites.
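A minimal sketch of what such a check could look like, assuming a plain suffix match on the hostname so that subdomains like "smile.amazon.com" are covered. The names `ALLOWLISTED_DOMAINS`, `isAllowlistedHostname`, and `shouldRunExtraction` are hypothetical, and the list entries are placeholders: only amazon.com is mentioned in this thread.

```javascript
// Hypothetical allowlist. Only amazon.com is named in this issue;
// "example.com" is a placeholder for the other supported sites.
const ALLOWLISTED_DOMAINS = ["amazon.com", "example.com"];

// True if the hostname is an allowlisted domain or any subdomain of one.
// Matching on "." + domain avoids false positives like "notamazon.com".
function isAllowlistedHostname(hostname) {
  const host = hostname.toLowerCase();
  return ALLOWLISTED_DOMAINS.some(
    (domain) => host === domain || host.endsWith("." + domain)
  );
}

// True if extraction should run for the given page URL.
// Unparseable URLs are treated as not allowlisted.
function shouldRunExtraction(url) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch (err) {
    return false;
  }
  return isAllowlistedHostname(parsed.hostname);
}
```

With this shape, `shouldRunExtraction("https://smile.amazon.com/")` passes the check, while a lookalike host such as "amazon.com.evil.test" does not, because the match is anchored to the end of the hostname.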

  • @javaun, I wanted to confirm with you: is this allowlist approach acceptable for the MVP?

This is related to #29 and #43.

@javaun

javaun commented Sep 4, 2018

Yes, limiting extraction to the sites we support is a good idea. If there is a slight performance penalty (and there could be), then we're limiting it to sites where we could deliver value that outweighs that slight tax.

@biancadanforth
Collaborator Author

This is a duplicate of #109; closing in favor of that issue.

biancadanforth added a commit that referenced this issue Oct 9, 2018
This PR depends on PR #143, so merge that one first.

This is somewhat moot for the MVP, since #70 restricts extraction to our five supported sites, none of which use Open Graph meta tags. However, that may change, and the fix was quick.

Note: To test this, comment out the other extraction methods in './src/extraction/index.js' and the allowlist check in that file's 'main' function, then visit a Crate and Barrel product page.