This repository has been archived by the owner on Dec 3, 2020. It is now read-only.

92% price accuracy #275

Merged
merged 42 commits into from
Nov 20, 2018
Changes from 40 commits
42 commits
2558ea5
Get trainer running on image out-rule from webext-commerce.
erikrose Oct 5, 2018
ca8d4c8
Add a trainee for each out() rule so we can choose them from the Trai…
erikrose Oct 19, 2018
ce27b0d
Respell a regex for clarity.
erikrose Oct 25, 2018
d8e6f43
Bring up to date with 9813ba8b59e6125b9ab18f51499e47bb2ec55745 in htt…
erikrose Oct 25, 2018
46b61d9
Rewrite isAboveTheFold using trapezoid() and fuzzy-logic scores.
erikrose Oct 25, 2018
2eb12b8
Refactor largerImage as well. This completes the image coefficients. …
erikrose Oct 25, 2018
51e6b30
Rewrite image y-axis scorer to constrain to 0..1 and for simplicity.
erikrose Oct 26, 2018
6909520
Retrain to fix the priceish coeff for isAboutTheFold().
erikrose Oct 29, 2018
8d38da1
Change rules that look for "price" in IDs and classes to emit fuzzy c…
erikrose Oct 29, 2018
525bd54
Re-express font-size rule as a confidence.
erikrose Oct 29, 2018
b567dd9
Rewrite rule that give a bonus to prices near the winning image.
erikrose Oct 29, 2018
90423da
Express hasPriceishPattern as a fuzzy truth.
erikrose Oct 29, 2018
74658e9
Fix the bugs that immediately kept the trainer from training.
erikrose Oct 29, 2018
c52af18
Remove a now-unused constant and an out-of-date comment.
erikrose Oct 30, 2018
737ca67
Add new coeffs to get to 100% on the training set!
erikrose Oct 30, 2018
47bbc7a
Rename hasPriceIn, since it doesn't actually have "price" hard-coded …
erikrose Nov 14, 2018
307fa3d
Consider divs with background images as well as img tags.
erikrose Nov 14, 2018
e1f4479
Typos
erikrose Nov 14, 2018
f0eba0d
There's no need to say "node". All scoring functions take nodes.
erikrose Nov 14, 2018
3fc5856
Add a rule to punish extreme aspect ratios and another to punish back…
erikrose Nov 14, 2018
dd16bf0
Hard-code a height for aboveTheFold so a user's different window size…
erikrose Nov 14, 2018
418a9cf
Make the image trainee train only the image-affecting coeffs, for speed.
erikrose Nov 14, 2018
3486879
Move tuned image coeffs into master vector.
erikrose Nov 15, 2018
9217d83
Make a more efficient training vector for price.
erikrose Nov 15, 2018
89da723
Improve price coeffs: 100% on 12-16.
erikrose Nov 15, 2018
0c888cb
Improve price coeffs. 93.3% on 12-16, 1-10. 92% on 1-25.
erikrose Nov 15, 2018
1735fe3
Improve price coeffs: 98.7% on all current training samples (1-25 and…
erikrose Nov 15, 2018
82709cf
Copy tuned price coeffs to master vector.
erikrose Nov 16, 2018
ac1d304
Put the glue code back how I found it, and move the coeffs back into …
erikrose Nov 16, 2018
ba89ed4
Fix a mistranscribed coefficient.
erikrose Nov 16, 2018
fd6b043
Make linter happy.
erikrose Nov 16, 2018
6ce3ea9
Rename trapezoid() to linearScale().
erikrose Nov 20, 2018
5d55d10
Use single-line doclets where possible. Put a newline after double as…
erikrose Nov 20, 2018
45db392
Rename contains() functions to indicate what returns bools and what r…
erikrose Nov 20, 2018
e9edf39
Un-inline the aspect ratio rule.
erikrose Nov 20, 2018
c0570d3
Un-inline hasBackgroundInID().
erikrose Nov 20, 2018
a57656a
Stick types in local names.
erikrose Nov 20, 2018
4cca17a
Remove unneeded !!.
erikrose Nov 20, 2018
ea275aa
Teach the application bits how to extract from CSS background-images.
erikrose Nov 20, 2018
57ca295
Damn you, linter.
erikrose Nov 20, 2018
aa4bd88
Use slice() for brevity and great justice.
erikrose Nov 20, 2018
3c6649a
Merge branch 'master' into 90%-price
Osmose Nov 20, 2018
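Several commits above rewrite the scoring rules to emit fuzzy-logic confidences between 0 and 1. They build on a helper that was introduced as trapezoid() and later renamed linearScale(). The helper's body isn't part of the diff shown here, so what follows is only a minimal sketch under an assumed signature, linearScale(number, zeroAt, oneAt). It returns 0 at or beyond zeroAt and 1 at or beyond oneAt, and it interpolates linearly in between. The thresholds in the usage lines (900px and 1800px) are made up for illustration.

```javascript
/**
 * Hypothetical sketch of a linear-ramp fuzzy score. The signature
 * (number, zeroAt, oneAt) is an assumption, not the PR's actual code.
 * Returns 0 at or past zeroAt, 1 at or past oneAt, and interpolates
 * linearly in between. Works for rising (zeroAt < oneAt) and falling
 * (zeroAt > oneAt) ramps.
 */
function linearScale(number, zeroAt, oneAt) {
  if (zeroAt === oneAt) {
    throw new RangeError('zeroAt and oneAt must differ');
  }
  const t = (number - zeroAt) / (oneAt - zeroAt);
  // Clamp so the result is always a valid confidence in [0, 1].
  return Math.min(1, Math.max(0, t));
}

// Hypothetical thresholds: full confidence for an element whose top edge
// is at or above 900px, fading to 0 by 1800px.
const foldConfidence = top => linearScale(top, 1800, 900);

console.log(foldConfidence(600));  // 1
console.log(foldConfidence(1350)); // 0.5
console.log(foldConfidence(2000)); // 0
```

Clamping a single linear expression covers both rising and falling ramps, so one function can express "bigger is better" and "smaller is better" rules.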
package.json (2 changes: 1 addition & 1 deletion)
@@ -51,7 +51,7 @@
   "dependencies": {
     "autobind-decorator": "2.1.0",
     "dinero.js": "1.4.0",
-    "fathom-web": "2.3.0",
+    "fathom-web": "2.8.0",
     "lodash.maxby": "4.6.0",
     "lodash.minby": "4.6.0",
     "lodash.orderby": "^4.6.0",
src/extraction/fathom/coefficients.json (22 changes: 13 additions & 9 deletions)
@@ -1,12 +1,16 @@
 {
-  "hasDollarSignCoeff": 8,
-  "hasPriceInClassNameCoeff": 2,
-  "hasPriceInIDCoeff": 17,
-  "hasPriceishPatternCoeff": 15,
-  "isAboveTheFoldImageCoeff": 13,
-  "isAboveTheFoldPriceCoeff": 33,
-  "isNearbyImageXAxisPriceCoeff": 5,
+  "backgroundIdImageCoeff": 4,
+  "bigFontCoeff": 14,
+  "bigImageCoeff": 9,
+  "extremeAspectCoeff": 3,
+  "hasDollarSignCoeff": 3,
+  "hasPriceInClassNameCoeff": 7,
+  "hasPriceInIDCoeff": 8,
+  "hasPriceInParentClassNameCoeff": -1,
+  "hasPriceInParentIDCoeff": -2,
+  "hasPriceishPatternCoeff": 4,
+  "isAboveTheFoldImageCoeff": 5,
+  "isAboveTheFoldPriceCoeff": -19,
   "isNearbyImageYAxisTitleCoeff": 5,
-  "largerFontSizeCoeff": 7,
-  "largerImageCoeff": 2
+  "isNearImageCoeff": 4
 }
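The values in coefficients.json are tuned per-rule coefficients. The ruleset that consumes them isn't in this diff, so the sketch below rests on a stated assumption: each rule produces a confidence in (0, 1] and raises it to its coefficient, and a node's score is the product of those factors. The combinedScore() function and the confidence inputs are hypothetical; only the three coefficient values are copied from the new coefficients.json. Under that assumption, a fully confident rule is neutral, and a negative coefficient such as isAboveTheFoldPriceCoeff's -19 reverses the direction in which a rule pushes the score.

```javascript
// A subset of the tuned price coefficients from coefficients.json.
const COEFFS = {
  hasDollarSignCoeff: 3,
  hasPriceInIDCoeff: 8,
  isAboveTheFoldPriceCoeff: -19,
};

/**
 * Hypothetical scorer (an assumption, not the PR's ruleset): raises each
 * rule's confidence to that rule's coefficient and multiplies the results.
 * Keys of `confidences` are rule names without the "Coeff" suffix.
 */
function combinedScore(confidences, coeffs) {
  let score = 1;
  for (const [rule, confidence] of Object.entries(confidences)) {
    const coeff = coeffs[`${rule}Coeff`];
    if (coeff === undefined) {
      throw new Error(`No coefficient for rule ${rule}`);
    }
    if (!(confidence > 0 && confidence <= 1)) {
      // 0 raised to a negative exponent is Infinity, so reject it.
      throw new RangeError(`Confidence for ${rule} must be in (0, 1]`);
    }
    score *= confidence ** coeff;
  }
  return score;
}

// Full confidence is neutral whatever the coefficient, since 1 ** c === 1.
console.log(combinedScore({ hasDollarSign: 1, hasPriceInID: 1 }, COEFFS)); // 1
// With a positive coefficient, a weaker signal shrinks the score...
console.log(combinedScore({ hasDollarSign: 0.5 }, COEFFS)); // 0.125
// ...while with a negative coefficient, a weaker signal inflates it.
console.log(combinedScore({ isAboveTheFoldPrice: 0.5 }, COEFFS)); // 524288
```

One consequence of this exponent form is that a confidence of exactly 0 can't be used with a negative coefficient (0 ** -19 is Infinity), which is why the sketch rejects it.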
src/extraction/fathom/index.js (16 changes: 15 additions & 1 deletion)
@@ -32,8 +32,22 @@ const FEATURE_DEFAULTS = {
 const PRODUCT_FEATURES = {
   image: {
     ...FEATURE_DEFAULTS,
+    /**
+     * @return the URL of an image resource
+     */
     getValueFromElement(element) {
-      return element.src;
+      /**
+       * Given a CSS url() declaration 'url("http://foo")', return 'http://foo'.
+       */
+      function urlFromCssDeclaration(declaration) {
+        return declaration.substring(5, declaration.length - 2);
erikrose marked this conversation as resolved.
+      }
+      if (element.tagName === 'IMG') {
+        return element.src;
+      }
+      // The other thing the ruleset can return is an arbitrary element with
+      // a CSS background image.
+      return urlFromCssDeclaration(getComputedStyle(element)['background-image']);
Contributor

A bit of a nit, but I've always found that function declarations at the top break the flow of the function and make it harder to read. The function name helps readability, but a variable can do just as well without interrupting the order:

const backgroundImage = getComputedStyle(element)['background-image'];
return backgroundImage.substring(5, backgroundImage.length - 2);

Contributor Author

I wanted this to be ScreechinglyObviousCode. With magic numbers like 5 and -2, it's not otherwise screechingly obvious that what we're trying to do is pull the param out of url("…").

Contributor

An extra variable name, then?

const backgroundImage = getComputedStyle(element)['background-image'];
const backgroundImageUrl = backgroundImage.substring(5, backgroundImage.length - 2); // "url('<image_url>')"
return backgroundImageUrl;

(Or not. This is very much yak shaving at this point.)

     },
   },
   title: {
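The review thread weighs a named helper against inline magic numbers (5 and -2) for pulling the URL out of url("…"). A third option, not what this PR merged, is a regular expression that spells out the shape being parsed. The following is a hypothetical sketch that keeps the PR's helper name. It accepts double-quoted, single-quoted, or unquoted url() values. For a comma-separated list of layers it returns the first layer's URL, and for values that don't start with url(), such as none or a gradient, it returns null instead of slicing them blindly. It does not decode CSS escape sequences inside the URL.

```javascript
/**
 * Hypothetical alternative to the substring(5, length - 2) approach:
 * extract the URL from a computed background-image value such as
 * 'url("http://foo")'. Returns null when the value doesn't start with
 * url(), e.g. 'none' or 'linear-gradient(...)'. CSS escapes are not decoded.
 */
function urlFromCssDeclaration(declaration) {
  // Capture an optional opening quote, then lazily everything up to the
  // matching closing quote (backreference \1) and the closing parenthesis.
  const match = /^url\(\s*(["']?)(.*?)\1\s*\)/.exec(declaration);
  return match ? match[2] : null;
}

console.log(urlFromCssDeclaration('url("http://foo")'));          // http://foo
console.log(urlFromCssDeclaration("url('http://foo')"));          // http://foo
console.log(urlFromCssDeclaration('url(http://foo)'));            // http://foo
console.log(urlFromCssDeclaration('url("a.png"), url("b.png")')); // a.png
console.log(urlFromCssDeclaration('none'));                       // null
```

The trade-off is the one the thread circles around. The regex names the url(…) shape and tolerates more inputs, but it is harder to take in at a glance than a two-number slice.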