This repository has been archived by the owner on Dec 3, 2020. It is now read-only.

#36: Update rules for better accuracy
These rules and coefficients achieve the following accuracy on a training corpus of 50 product pages from our top 5 sites (Amazon, eBay, Walmart, Best Buy and Home Depot):
* 100% for product 'image'
* 96% for product 'title'
* 94% for product 'price'
Product 'price' and 'title' features have proximity rules based on the highest scoring product 'image' element. For now, this is done by accessing the image fnode using an internal '_ruleset' object; @erikrose is working on better support for this use case in the very near future, so this implementation can be improved at that time.
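
The proximity idea described above could be sketched roughly as follows. This is only an illustration, not the rule from this commit: the helper name `isNearbyImageXAxis`, the 200px threshold, and the neutral score of 1 (which assumes scores are combined by multiplication) are all assumptions, and the rects are plain objects shaped like `getBoundingClientRect()` results rather than real fnodes.

```javascript
// Hypothetical sketch of an x-axis proximity rule; not the commit's actual code.
// Rect objects mimic the shape returned by Element.getBoundingClientRect().
const NEARBY_THRESHOLD_PX = 200; // assumed value, not taken from the commit

function isNearbyImageXAxis(candidateRect, imageRect, coeff, thresholdPx = NEARBY_THRESHOLD_PX) {
  // Horizontal gap between the image's right edge and the candidate's left edge.
  const deltaX = candidateRect.left - imageRect.right;
  // Boost only candidates that sit to the right of the image, within the threshold.
  const isNearby = deltaX > 0 && deltaX <= thresholdPx;
  // Assumption: 1 is neutral because scores are multiplied together.
  return isNearby ? coeff : 1;
}

// Example: a price element 60px to the right of the top-scoring image.
const imageRect = {left: 20, right: 420, top: 100, bottom: 500};
const priceRect = {left: 480, right: 560, top: 220, bottom: 250};
console.log(isNearbyImageXAxis(priceRect, imageRect, 5)); // 5
```

In the commit itself, the image element's position would come from the highest-scoring 'image' fnode rather than from a hard-coded rect.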
biancadanforth committed Aug 12, 2018
1 parent 25836b7 commit 5b0aba8
Showing 4 changed files with 227 additions and 78 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -2,3 +2,4 @@ node_modules
 web-ext-artifacts
 build
 gecko.log
+.DS_Store
16 changes: 9 additions & 7 deletions src/fathom_coefficients.json
@@ -1,9 +1,11 @@
 {
-"largerImageCoeff": 3,
-"largerFontSizeCoeff": 1,
-"hasDollarSignCoeff": 3,
-"hasTitleInIDCoeff": 10,
-"hasTitleInClassNameCoeff": 5,
-"isHiddenCoeff": -100,
-"isHeaderElementCoeff": 10
+"largerImageCoeff": 2,
+"largerFontSizeCoeff": 7,
+"hasDollarSignCoeff": 8,
+"hasPriceInIDCoeff": 17,
+"hasPriceInClassNameCoeff": 2,
+"isAboveTheFoldPriceCoeff": 33,
+"isAboveTheFoldImageCoeff": 13,
+"isNearbyImageXAxisCoeff": 5,
+"hasPriceishPatternCoeff": 15
 }
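
As a rough illustration of how two of these coefficients might be consumed, the sketch below scores a text snippet for "priceishness". The two coefficient values are copied from the updated JSON; everything else is an assumption: the regular expression, the helper names, and the choice to multiply rule scores together are hypothetical and are not shown in this commit.

```javascript
// Values copied from the updated fathom_coefficients.json.
const coefficients = {
  hasDollarSignCoeff: 8,
  hasPriceishPatternCoeff: 15,
};

// Hypothetical pattern: a dollar sign followed by digits, with optional
// thousands separators and optional cents. The real rule may differ.
const PRICEISH_REGEX = /\$\d+(?:,\d{3})*(?:\.\d{2})?/;

// Each rule returns its coefficient on a match, or a neutral 1 otherwise.
function hasDollarSign(text) {
  return text.includes('$') ? coefficients.hasDollarSignCoeff : 1;
}

function hasPriceishPattern(text) {
  return PRICEISH_REGEX.test(text) ? coefficients.hasPriceishPatternCoeff : 1;
}

// Assumption: rule scores are combined by multiplication.
function scoreText(text) {
  return hasDollarSign(text) * hasPriceishPattern(text);
}

console.log(scoreText('$1,299.99'));     // 120 (8 * 15)
console.log(scoreText('Save $ today'));  // 8   (dollar sign only)
console.log(scoreText('Free shipping')); // 1   (no rule matched)
```

Under this assumption, a larger coefficient means a matching rule pulls a candidate further ahead of elements that match nothing.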
20 changes: 12 additions & 8 deletions src/fathom_extraction.js
@@ -15,10 +15,12 @@ import {
 largerImageCoeff,
 largerFontSizeCoeff,
 hasDollarSignCoeff,
-hasTitleInIDCoeff,
-hasTitleInClassNameCoeff,
-isHiddenCoeff,
-isHeaderElementCoeff,
+hasPriceInIDCoeff,
+hasPriceInClassNameCoeff,
+isAboveTheFoldPriceCoeff,
+isAboveTheFoldImageCoeff,
+isNearbyImageXAxisCoeff,
+hasPriceishPatternCoeff,
 } from 'commerce/fathom_coefficients.json';

 const PRODUCT_FEATURES = ['title', 'price', 'image'];
@@ -36,10 +38,12 @@ function runRuleset(doc) {
 largerImageCoeff,
 largerFontSizeCoeff,
 hasDollarSignCoeff,
-hasTitleInIDCoeff,
-hasTitleInClassNameCoeff,
-isHiddenCoeff,
-isHeaderElementCoeff,
+hasPriceInIDCoeff,
+hasPriceInClassNameCoeff,
+isAboveTheFoldPriceCoeff,
+isAboveTheFoldImageCoeff,
+isNearbyImageXAxisCoeff,
+hasPriceishPatternCoeff,
 ]).against(doc).get(`${feature}`);
 fnodesList = fnodesList.filter(fnode => fnode.scoreFor(`${feature}ish`) >= SCORE_THRESHOLD);
 // It is possible for multiple elements to have the same highest score.
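
The threshold filter and the tie case noted in the last comment could be handled along the lines of the sketch below. It is not the commit's code (the rest of the function is not visible in this diff): `makeFakeFnode` and `topScorers` are hypothetical helpers, the fake objects only imitate the `scoreFor()` method used above, and the `SCORE_THRESHOLD` value is an assumed placeholder.

```javascript
const SCORE_THRESHOLD = 4; // assumed value; the constant's definition is not shown in the diff

// Hypothetical stand-in for a Fathom fnode: exposes only scoreFor().
function makeFakeFnode(name, score) {
  return {name, scoreFor: () => score};
}

// Keep fnodes at or above the threshold, then return every fnode that
// shares the highest score, so ties are visible to the caller.
function topScorers(fnodes, type, threshold = SCORE_THRESHOLD) {
  const passing = fnodes.filter(fnode => fnode.scoreFor(type) >= threshold);
  if (passing.length === 0) {
    return [];
  }
  const best = Math.max(...passing.map(fnode => fnode.scoreFor(type)));
  return passing.filter(fnode => fnode.scoreFor(type) === best);
}

const candidates = [
  makeFakeFnode('a', 2),   // below the threshold, dropped
  makeFakeFnode('b', 120),
  makeFakeFnode('c', 120), // ties with 'b'
  makeFakeFnode('d', 8),
];
console.log(topScorers(candidates, 'priceish').map(fnode => fnode.name)); // [ 'b', 'c' ]
```

Returning all tied elements leaves the tie-breaking policy (for example, preferring the first in document order) to the caller.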