-
Notifications
You must be signed in to change notification settings - Fork 15
Fix #29: Attempt extraction after parsing is finished, before loading. #143
Conversation
d0e07cc
to
f8ed9d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, thanks Osmose. The refactor makes a lot of sense to me and seems to be working just fine. I tried this out on 7 random product pages, and I found that 6 of 7 extracted successfully at this earlier step.
I suggest two changes here, one being very important: Only attemptExtraction
a second time if the first time fails; as-is, we're running it twice on every page no matter what.
date: (new Date()).toISOString(), | ||
}, | ||
}); | ||
if (extractedProduct) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fallback extraction does not currently return null
if it doesn’t find anything, so this would be a good time to change that before adding this existence check. (My bad)
We do also check for correct data types via propTypes
further downstream, but it'd be nice to be consistent between the two extraction methods.
if (document.readyState === 'complete') { | ||
// Extract immediately, and again if the readyState changes. | ||
attemptExtraction(); | ||
document.addEventListener('readystatechange', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably only want to add this listener and attempt extraction again if the first attempt failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to perform a second extraction in case JavaScript has modified the page and changed the product info, to ensure we get the correct product info every time. I'm not sure how common this is, though, or how performing two extractions affects the user's experience.
I think we should keep it, but can be convinced otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah hmm... so see if we can get something at all to put in the popup sooner, and then update with the second round of extraction...
Perhaps we could add a scalar probe that increments if the results of the first to the second extraction are different and compare that to the number of extraction attempts? Might be a wishlist probe.
What do you think about that idea @Osmose ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I wouldn't bother adding a probe like that initially.
I refactored extraction to return null in cases where the selectors or tags for all features aren't found. I also split open graph and selector-based extraction into two separate modules. In the future, if we add optional extraction features, we'll need to modify these, but that's fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Osmose! Looks great.
I found a bug I actually introduced earlier in the Open Graph extraction and just filed a bug. Will fix it shortly.
return null; | ||
} | ||
|
||
extractedProduct[feature] = metaEle.getAttribute('content'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a bug here, since the propType for extractedProduct
expects a type of Number
for the price value downstream. I took care of this for fallback and Fathom extraction earlier, but I neglected to add it here. I filed a follow-up issue #154 . I'll take care of that right now.
… found. Open Graph and selector-based extraction are now separate forms of extraction instead of a single, "fallback" extraction method.
f5309a3
to
150cd9b
Compare
I tested running extraction on
document_start
and universally found that extraction wouldn't turn up anything useful (surprise, you need a fully-parsed DOM to meaningfully extract anything).document_end
is pretty consistently returning good product info, which saves us having to wait for every resource to load. As far as I can tell there's not much opportunity to start work earlier than this.I also couldn't help myself and did a minor refactor of the extraction module to be more in line with how the other code modules are organized. It's mostly renames.