ChromeDistractionFreeBrowsing

Chrome uses DOM Distiller to simplify pages for reading mode and printing (the printing part may be Canary-only; I haven't confirmed). Here's how you turn it on.

First, the Chrome team manually trained a model on a representative set of typical pages. The model classifies a page as "worth distilling" using features that are cheap to compute (see their readme). It looks like those features are here, with much discussion about them on this commit. You can find all the crunchy details in their source.
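To make the "cheap features plus trained model" idea concrete, here is a minimal sketch of that kind of classifier. Everything in it is an assumption for illustration: the feature names, the weights, the bias, and the 0.5 threshold are made up, and the logistic form is just one plausible choice, not necessarily what Chrome ships.

```python
import math
from dataclasses import dataclass


@dataclass
class PageFeatures:
    # Hypothetical features; not Chrome's real feature set.
    paragraph_count: int
    text_chars: int
    link_chars: int
    has_article_tag: bool


# Made-up weights. A real model would learn these from labeled pages.
WEIGHTS = {
    "paragraphs": 0.3,
    "log_text": 0.8,
    "link_density": -4.0,
    "article_tag": 1.5,
}
BIAS = -6.0


def distillable_score(f: PageFeatures) -> float:
    """Return a probability-like score in (0, 1) from cheap page features."""
    # Treat a page with no text at all as fully link-dominated.
    link_density = f.link_chars / f.text_chars if f.text_chars else 1.0
    z = (
        BIAS
        + WEIGHTS["paragraphs"] * min(f.paragraph_count, 20)
        + WEIGHTS["log_text"] * math.log1p(f.text_chars)
        + WEIGHTS["link_density"] * link_density
        + WEIGHTS["article_tag"] * (1.0 if f.has_article_tag else 0.0)
    )
    return 1.0 / (1.0 + math.exp(-z))


def is_worth_distilling(f: PageFeatures, threshold: float = 0.5) -> bool:
    return distillable_score(f) >= threshold
```

The point of the design is that every input can be read off the DOM in one pass, so the check is cheap enough to run on every page load before deciding whether to pay for full extraction.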

Once it decides that a page is worth distilling, Chrome runs the actual "distiller" code, which is Java compiled via GWT to JavaScript. It looks like there's some overlap with Readability as well (note: that's a good place to look for next/previous page extraction heuristics). The algorithm's entry point lives in ContentExtractor#extractContent. It runs an extractor on the page, building up a document that starts out empty. Extraction is based on some (poorly commented 😔) heuristics; I think here is where the interesting logic happens. Next it runs some filters to remove unnecessary stuff from the page.
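The extract-then-filter shape described above can be sketched roughly like this. It is written in Python rather than the distiller's Java, and every name in it (`TextBlock`, `extract`, the two filters, the thresholds) is hypothetical; the real heuristics live in the distiller source and are more involved.

```python
from dataclasses import dataclass, field


@dataclass
class TextBlock:
    text: str
    link_chars: int = 0      # characters of this block that sit inside links
    is_content: bool = True  # filters flip this to False to discard a block


@dataclass
class Document:
    blocks: list = field(default_factory=list)


def extract(raw_blocks):
    """Build a document from (text, link_chars) pairs, starting from an empty one."""
    doc = Document()
    for text, link_chars in raw_blocks:
        doc.blocks.append(TextBlock(text=text.strip(), link_chars=link_chars))
    return doc


def drop_short_blocks(doc, min_words=5):
    # Assumed heuristic: very short blocks are usually labels or buttons.
    for block in doc.blocks:
        if len(block.text.split()) < min_words:
            block.is_content = False


def drop_link_heavy_blocks(doc, max_link_density=0.5):
    # Assumed heuristic: blocks that are mostly link text are usually navigation.
    for block in doc.blocks:
        if block.text and block.link_chars / len(block.text) > max_link_density:
            block.is_content = False


FILTERS = [drop_short_blocks, drop_link_heavy_blocks]


def distill(raw_blocks):
    """Run the extractor, then each filter in order, and join what survives."""
    doc = extract(raw_blocks)
    for apply_filter in FILTERS:
        apply_filter(doc)
    return "\n\n".join(b.text for b in doc.blocks if b.is_content)
```

Filters mark blocks instead of deleting them, so later filters still see the full document and the order of the list stays easy to change.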
