ChromeDistractionFreeBrowsing
Chrome uses DOM Distiller to simplify pages for reading mode and printing (printing may be Canary-only?). Here's how you turn it on.
First, a model was trained manually on a representative set of typical pages to classify a page as "worth distilling" using features that are cheap to compute (see their readme). It looks like those features are here, with much discussion about them on this commit. All the crunchy details are in the source.
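To make the "cheap features in, yes/no out" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the feature names, the weights, the bias, and the choice of a logistic-style linear score are mine, not taken from DOM Distiller. The real feature list and trained model are in the source linked above.

```python
import math
from dataclasses import dataclass


@dataclass
class PageFeatures:
    # Hypothetical cheap-to-compute signals, NOT the real DOM Distiller feature set.
    paragraph_count: int
    text_length: int        # characters of visible text
    link_density: float     # fraction of text that sits inside links, 0..1
    has_article_tag: bool


# Illustrative, hand-picked weights. A real model would learn these from labeled pages.
WEIGHTS = {
    "paragraph_count": 0.15,
    "log_text_length": 0.6,
    "link_density": -4.0,
    "has_article_tag": 1.5,
}
BIAS = -5.0


def distillable_score(f: PageFeatures) -> float:
    """Return a score in (0, 1); higher means 'more likely worth distilling'."""
    z = BIAS
    z += WEIGHTS["paragraph_count"] * min(f.paragraph_count, 20)  # cap to limit outliers
    z += WEIGHTS["log_text_length"] * math.log1p(f.text_length)
    z += WEIGHTS["link_density"] * f.link_density
    z += WEIGHTS["has_article_tag"] * (1.0 if f.has_article_tag else 0.0)
    return 1.0 / (1.0 + math.exp(-z))  # logistic squashing


def is_worth_distilling(f: PageFeatures, threshold: float = 0.5) -> bool:
    return distillable_score(f) >= threshold


article = PageFeatures(paragraph_count=12, text_length=8000,
                       link_density=0.05, has_article_tag=True)
portal = PageFeatures(paragraph_count=1, text_length=300,
                      link_density=0.8, has_article_tag=False)
print(is_worth_distilling(article), is_worth_distilling(portal))
```

The point of keeping the features this cheap is that the check can run on every page load without noticeably slowing anything down; the expensive extraction only happens for pages that pass.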
Once Chrome decides that a page is worth distilling, it runs the actual "distiller" code, which is Java compiled to JavaScript via GWT. It looks like there's some overlap with Readability as well (note: that's a good place to look for next/previous page extraction heuristics). The algorithm's entry point lives in ContentExtractor#extractContent. It runs an extractor on the page, building up a document that starts out empty. Extraction is based on some (poorly commented 😔) heuristics; I think here is where the interesting logic happens. Next, it runs some filters to remove unnecessary stuff from the page.
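The extract-then-filter flow described above can be sketched roughly like this. This is my own simplification, not the code behind ContentExtractor#extractContent: the `TextBlock` and `Document` types, both filters, and their thresholds are hypothetical stand-ins for whatever heuristics the real distiller uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class TextBlock:
    text: str
    link_density: float = 0.0
    is_content: bool = True   # filters flip this to False for boilerplate


@dataclass
class Document:
    blocks: List[TextBlock] = field(default_factory=list)


Filter = Callable[[Document], None]


def extract(raw_blocks: Iterable[Tuple[str, float]]) -> Document:
    """Build up an initially empty document from (text, link_density) pairs."""
    doc = Document()
    for text, link_density in raw_blocks:
        if text.strip():  # skip whitespace-only blocks
            doc.blocks.append(TextBlock(text.strip(), link_density))
    return doc


def mark_link_heavy(doc: Document, max_link_density: float = 0.5) -> None:
    # Hypothetical filter: navigation bars are mostly links.
    for block in doc.blocks:
        if block.link_density > max_link_density:
            block.is_content = False


def mark_too_short(doc: Document, min_words: int = 5) -> None:
    # Hypothetical filter: very short blocks are usually buttons or labels.
    for block in doc.blocks:
        if len(block.text.split()) < min_words:
            block.is_content = False


def extract_content(raw_blocks: Iterable[Tuple[str, float]],
                    filters: List[Filter]) -> str:
    doc = extract(raw_blocks)
    for apply_filter in filters:  # filters run in order and mutate the document
        apply_filter(doc)
    return "\n".join(b.text for b in doc.blocks if b.is_content)


raw = [
    ("Home | About | Contact us today please", 0.9),
    ("Chrome uses DOM Distiller to simplify pages for reading.", 0.0),
    ("Share", 0.0),
    ("   ", 0.0),
]
result = extract_content(raw, [mark_link_heavy, mark_too_short])
print(result)
```

Having filters mark blocks instead of deleting them keeps each filter independent of the others and makes it easy to inspect why a block was dropped.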