This repository has been archived by the owner on Dec 3, 2020. It is now read-only.
#36: Integrate Fathom-based page extraction with a simple ruleset.
This patch runs Fathom against every page (without distinguishing product pages from non-product pages) and logs the extracted price value and page URL to the console via 'background.js'. If Fathom extraction fails, it falls back to extraction via CSS selectors if any exist for the site in 'product_extraction_data.json'; failing that, it tries extraction via Open Graph meta tags.

This is heavily based on prior work by [Swathi Iyer](https://github.com/swathiiyer2/fathom-products) and [Victor Ng](https://github.com/mozilla/fathom-webextension).

Currently, there is only one ruleset with one naive rule for one product feature, price. This initial commit is intended to cover Fathom integration into the web extension. A later commit will add rules and take training data into account.

Note: The 'runRuleset' method in 'productInfo.js' returns 'NaN' if it doesn't find any elements for any of its rules.

Performance observations: Originally, I had dumped Swathi's three rulesets (one each for product title, image and price) into the extension and tried to run them against any page, similar to Victor Ng's web extension. However, that was [freezing up the tab](#36 (comment)). I then replaced Swathi's rulesets with a single ruleset containing only one rule for one attribute, and profiled the content script that runs Fathom before and after the change. With the single ruleset, I did not see any warnings from Firefox, nor detect any significant performance hits in the DevTools profiler due to Fathom. It would therefore appear the performance hit was related to the complex rulesets and not Fathom itself.

Webpack observations: While [`jsdom`](https://www.npmjs.com/package/jsdom) is a `fathom-web` dependency, it is used only for running `fathom-web` in the Node context for testing. To avoid build errors associated with `jsdom` and its dependencies, I added a `'null-loader'` for that `require` call, which mocks the module as an empty object. This loader is also used in webpack.config.test.js, from PR #32.
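For illustration, the extraction fallback order described above (Fathom, then per-site CSS selectors, then Open Graph meta tags) could be sketched roughly as follows. This is not the code from the patch: `extractPrice`, `isUsable`, the shape of `selectorsBySite`, and the `og:price:amount` property name are all assumptions made for the example. Only the ordering and the 'NaN' result from 'runRuleset' come from the description above.

```javascript
// Hypothetical sketch of the fallback chain; names are illustrative only.

// A result counts as "found" unless it is null, undefined, NaN or empty.
// NaN is included because runRuleset is described as returning NaN when
// none of its rules match any element.
function isUsable(value) {
  return value !== null && value !== undefined &&
    value !== '' && !Number.isNaN(value);
}

// doc:             a Document-like object exposing querySelector()
// hostname:        the site key used to look up fallback selectors
// runRuleset:      function (doc) -> price or NaN (stand-in for the Fathom call)
// selectorsBySite: assumed shape {hostname: cssSelector}; the real layout of
//                  product_extraction_data.json is not shown in this commit
function extractPrice(doc, hostname, {runRuleset, selectorsBySite}) {
  const strategies = [
    ['fathom', () => runRuleset(doc)],
    ['css_selectors', () => {
      const selector = selectorsBySite[hostname];
      const el = selector ? doc.querySelector(selector) : null;
      return el ? el.textContent.trim() : null;
    }],
    ['open_graph', () => {
      // Assumed property name; the patch may read a different meta tag.
      const el = doc.querySelector('meta[property="og:price:amount"]');
      return el ? el.getAttribute('content') : null;
    }],
  ];

  // Try each strategy in order and stop at the first usable value.
  for (const [method, run] of strategies) {
    const value = run();
    if (isUsable(value)) {
      return {method, value};
    }
  }
  return null;
}
```

In a content script this might be called as `extractPrice(document, window.location.hostname, {runRuleset, selectorsBySite})`, with the returned `method` logged alongside the price so it is clear which strategy produced the value.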