This repository has been archived by the owner on Dec 3, 2020. It is now read-only.

Improve price extraction flexibility #42

Closed
Osmose opened this issue Aug 10, 2018 · 0 comments

Osmose commented Aug 10, 2018

Edit [bdanforth]: See also #79 (duplicate) for more information, particularly this comment.

Prices are currently extracted from pages by identifying the DOM node containing the price and parsing its innerText value as a string. However, some sites (notably Home Depot) mark up prices in a way that produces an innerText value our parser can't handle.
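
To illustrate the failure mode, here is a minimal sketch. It assumes a naive parser that strips a leading "$" and calls `parseFloat`; `naiveParsePrice` is a hypothetical name and is not the extension's actual code. The sample strings are innerText values mentioned elsewhere in this issue.

```typescript
// Hypothetical naive parser (an assumption for illustration only):
// drop a leading "$" and hand the rest to parseFloat.
function naiveParsePrice(innerText: string): number {
  return parseFloat(innerText.trim().replace(/^\$/, ""));
}

// innerText values taken from the example pages listed in this issue.
const samples: string[] = ["$1,049.00", "US $789.99", "$1327 /each"];
const results: number[] = samples.map(naiveParsePrice);
```

Under these assumptions, "$1,049.00" parses to 1 because `parseFloat` stops at the comma, "US $789.99" parses to NaN because the string does not start with a number, and "$1327 /each" parses to 1327, which would be off by a factor of 100 if the trailing digits are subunits.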

First, we should define the acceptance criteria for price parsing. Then, we should improve our existing parsing to match those criteria.

@Osmose Osmose added this to the November MVP milestone Aug 10, 2018
@biancadanforth biancadanforth self-assigned this Sep 11, 2018
biancadanforth added a commit that referenced this issue Sep 11, 2018
biancadanforth added a commit that referenced this issue Sep 12, 2018
* Update how we pull the price string from extracted Fathom price elements to provide main and subunits (e.g. dollars and cents) if available.
* Add price string cleaning methods to remove extra characters (like commas) that were causing price parsing to fail.
* Handle case when price string parsing still fails after cleaning by checking in the background script that the price string is formatted correctly before rendering the browserAction popup.
  * This will guarantee we never see the “blank panel” reported in #79/#88.

Price element innerText strings now supported as a result of these changes:
* "$1327 /each" ([Home Depot example page](https://www.homedepot.com/p/KitchenAid-Classic-4-5-Qt-Tilt-Head-White-Stand-Mixer-K45SSWH/202546032))
* "$1,049.00" ([Amazon example page](https://www.amazon.com/Fujifilm-X-T2-Mirrorless-F2-8-4-0-Lens/dp/B01I3LNQ6M/ref=sr_1_2?ie=UTF8&qid=1535594119&sr=8-2&keywords=fuji+xt2+camera))
* "US $789.99" ([eBay example page](https://www.ebay.com/itm/Dell-Inspiron-7570-15-6-Touch-Laptop-i7-8550U-1-8GHz-8GB-1TB-NVIDIA-940MX-W10/263827294291))
* "$4.99+" ([Etsy example page](https://www.etsy.com/listing/555504975/frankenstein-2-custom-stencil?ga_order=most_relevant&ga_search_type=all&ga_view_type=gallery&ga_search_query=&ref=sr_gallery-1-13))

Note: This does not handle the case where there is more than one price for the product page (e.g. if we see a range of prices such as "$19.92 - $38.00" or if the price changes based on size/color, etc.); that’s handled by Issue #86.
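
The three steps described in this commit message (combine main and subunits, clean the string, validate before rendering) could look roughly like the sketch below. All function names (`buildPriceString`, `cleanPriceString`, `isValidPrice`, `priceToCents`) are hypothetical and not taken from the repository, and the accepted format (digits with an optional two-digit fraction) is an assumption about what "formatted correctly" means.

```typescript
// Hypothetical helpers; names and format rules are assumptions for illustration.

/** Join separately extracted main and subunit text, e.g. "$13" and "27". */
function buildPriceString(mainText: string, subText?: string): string {
  const main = mainText.replace(/[^0-9]/g, "");
  const sub = subText === undefined ? "" : subText.replace(/[^0-9]/g, "");
  return sub === "" ? main : `${main}.${sub}`;
}

/** Strip everything except digits and the decimal point (commas, "$", "US", "+"). */
function cleanPriceString(raw: string): string {
  return raw.replace(/[^0-9.]/g, "");
}

/** Assumed format check: digits, optionally followed by "." and two digits. */
function isValidPrice(cleaned: string): boolean {
  return /^\d+(\.\d{2})?$/.test(cleaned);
}

/** Return the price in cents, or null so the caller can skip rendering. */
function priceToCents(raw: string): number | null {
  const cleaned = cleanPriceString(raw);
  if (!isValidPrice(cleaned)) {
    return null;
  }
  const dot = cleaned.indexOf(".");
  const main = dot === -1 ? cleaned : cleaned.slice(0, dot);
  const sub = dot === -1 ? "00" : cleaned.slice(dot + 1);
  // Integer arithmetic avoids floating-point rounding on the subunits.
  return parseInt(main, 10) * 100 + parseInt(sub, 10);
}
```

With this sketch, `priceToCents("$1,049.00")` gives 104900, `priceToCents("US $789.99")` gives 78999, and `priceToCents("$4.99+")` gives 499. A range such as "$19.92 - $38.00" cleans to a string with two decimal points, fails validation, and returns null, which matches the note that multiple prices are out of scope here. For the Home Depot case, the cleaned innerText alone ("1327") cannot distinguish main units from subunits, so the sketch relies on `buildPriceString("$13", "27")` receiving the two parts separately.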
biancadanforth added a commit that referenced this issue Sep 12, 2018
biancadanforth added a commit that referenced this issue Sep 18, 2018
biancadanforth added a commit that referenced this issue Sep 20, 2018
Fix #42: Improve price string parsing.