This repository has been archived by the owner on Dec 3, 2020. It is now read-only.
Improve price extraction flexibility #42
biancadanforth added a commit that referenced this issue Sep 11, 2018
biancadanforth added a commit that referenced this issue Sep 12, 2018
* Update how we pull the price string from extracted Fathom price elements to provide main units and subunits (e.g. dollars and cents) if available.
* Add price string cleaning methods to remove extra characters (like commas) that were causing price parsing to fail.
* Handle the case where price string parsing still fails after cleaning by checking in the background script that the price string is formatted correctly before rendering the browserAction popup.
* This will guarantee we never see the “blank panel” reported in #79 and #88.

Price element innerText strings now supported as a result of these changes:
* "$1327 /each" ([Home Depot example page](https://www.homedepot.com/p/KitchenAid-Classic-4-5-Qt-Tilt-Head-White-Stand-Mixer-K45SSWH/202546032))
* "$1,049.00" ([Amazon example page](https://www.amazon.com/Fujifilm-X-T2-Mirrorless-F2-8-4-0-Lens/dp/B01I3LNQ6M/ref=sr_1_2?ie=UTF8&qid=1535594119&sr=8-2&keywords=fuji+xt2+camera))
* "US $789.99" ([eBay example page](https://www.ebay.com/itm/Dell-Inspiron-7570-15-6-Touch-Laptop-i7-8550U-1-8GHz-8GB-1TB-NVIDIA-940MX-W10/263827294291))
* "$4.99+" ([Etsy example page](https://www.etsy.com/listing/555504975/frankenstein-2-custom-stencil?ga_order=most_relevant&ga_search_type=all&ga_view_type=gallery&ga_search_query=&ref=sr_gallery-1-13))

Note: This does not handle the case where a product page shows more than one price (e.g. a range of prices such as "$19.92 - $38.00", or a price that changes based on size/color, etc.); that’s handled by Issue #86.
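The string cleaning described in this commit message could look roughly like the sketch below. It is not taken from the extension's source: the function name `parsePriceString` is hypothetical, and it assumes US-style formatting (comma as thousands separator, period as decimal point). It does not cover reading main units and subunits from separate child elements.

```javascript
// Hypothetical helper, not the extension's actual implementation.
// Assumes US-style formatting: "," separates thousands, "." marks the decimal.
// Returns the price in subunits (cents), or null if no number is found.
function parsePriceString(rawText) {
  // Remove thousands separators so "1,049.00" becomes "1049.00".
  const withoutCommas = rawText.replace(/,/g, '');

  // Take the first run of digits, with an optional one- or two-digit fraction.
  // Surrounding text such as "US $" or a trailing "+" is ignored.
  const match = withoutCommas.match(/(\d+)(?:\.(\d{1,2}))?/);
  if (!match) {
    return null;
  }

  const mainUnits = parseInt(match[1], 10);
  // Pad a single fractional digit so "4.5" is read as 50 cents, not 5.
  const subUnits = match[2] ? parseInt(match[2].padEnd(2, '0'), 10) : 0;
  return mainUnits * 100 + subUnits;
}
```

Returning `null` instead of throwing lets a caller decide what to do when a string still cannot be parsed after cleaning.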
biancadanforth added a commit that referenced this issue Sep 12, 2018
biancadanforth added a commit that referenced this issue Sep 18, 2018
biancadanforth added a commit that referenced this issue Sep 20, 2018
Fix #42: Improve price string parsing.
Edit [bdanforth]: See also #79 (duplicate) for more information; particularly this comment.
Prices are currently extracted from pages by identifying the DOM node containing the price and parsing its `innerText` value as a string. However, some sites (notably, Home Depot) store prices in a way that doesn't produce an `innerText` value shaped the way we need it to be.

First, we should define the acceptance criteria for price parsing. Then, we should improve our existing parsing to match those criteria.
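One way to make the acceptance criteria concrete is to write them as a predicate that a cleaned price string must satisfy before it is used. The sketch below is only an illustration: the name `isWellFormedPrice` and the pattern itself are assumptions, not criteria the project has agreed on.

```javascript
// Hypothetical acceptance check; the pattern is an assumed example,
// not the project's agreed criteria.
// Accepts whole numbers ("1327") or numbers with exactly two decimals ("1049.00").
const PRICE_PATTERN = /^\d+(\.\d{2})?$/;

function isWellFormedPrice(priceString) {
  // Guard against undefined or non-string input before testing the pattern.
  return typeof priceString === 'string' && PRICE_PATTERN.test(priceString);
}
```

Under this assumed pattern, `isWellFormedPrice('1049.00')` is `true`, while `isWellFormedPrice('1,049.00')` and `isWellFormedPrice('4.99+')` are `false` because they still contain characters that should have been cleaned out.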