This repository has been archived by the owner on Dec 3, 2020. It is now read-only.
Currently, outside of background updates, our extraction scripts run on every top-level page. Since we are only interested in extracting product information from product pages on particular domains, we should avoid running the extraction scripts unnecessarily elsewhere. This could be achieved by only running the scripts if the page's domain is on an "allowlist".
We would also want to handle cases like "smile.amazon.com" and other alternate domains that serve the same content for the most part. The best thing I can think of now is to run extraction in all subdomains for our top five sites.
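A minimal sketch of what such a check could look like, assuming the allowlist is a plain array of registrable domains and that any subdomain of a listed domain should match. The names `ALLOWED_DOMAINS`, `isAllowedHostname`, and `shouldExtract` are hypothetical, and the list entries are placeholders, since the five supported sites are not named here.

```javascript
// Hypothetical allowlist. Only amazon.com is taken from the example above;
// example-store.com is a placeholder, not one of the real supported sites.
const ALLOWED_DOMAINS = ['amazon.com', 'example-store.com'];

// True if the hostname is a listed domain or any subdomain of one,
// e.g. "smile.amazon.com" matches "amazon.com".
function isAllowedHostname(hostname, allowlist = ALLOWED_DOMAINS) {
  const host = hostname.toLowerCase();
  return allowlist.some(
    // The leading dot prevents "notamazon.com" from matching "amazon.com".
    (domain) => host === domain || host.endsWith(`.${domain}`),
  );
}

// Decide whether extraction should run for a given page URL.
function shouldExtract(url, allowlist = ALLOWED_DOMAINS) {
  try {
    return isAllowedHostname(new URL(url).hostname, allowlist);
  } catch (err) {
    // Unparseable URLs are treated as not allowed.
    return false;
  }
}
```

Under these assumptions, the extraction entry point could return early when `shouldExtract(window.location.href)` is false, so the extraction methods never run on unsupported sites.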
@javaun, I wanted to confirm with you: is this allowlist approach acceptable for the MVP?
Yes, it is a good idea to limit extraction to sites we support. If there is a slight performance penalty (and there could be), then we're limiting it to sites where we can deliver value that outweighs that slight tax.
This PR depends on PR#143, so merge that one first.
This is a bit of a moot point for the MVP, since #70 restricts extraction to our five supported sites, none of which make use of Open Graph meta tags. However, that may change, and the fix was quick.
Note: To test this, comment out the other extraction methods in './src/extraction/index.js' and the allowlist check in that file's 'main' function, then visit a Crate and Barrel product page.
This is related to #29 and #43.