
ChromeDistractionFreeBrowsing

Erik Rose edited this page Mar 28, 2016 · 4 revisions

Chrome uses DOM Distiller to simplify pages for reading mode and printing (printing may be Canary-only?). Here's how you turn it on.

First, she trained a model by hand on a representative set of typical pages so she can classify a page as "worth distilling" using a few cheap-to-compute features (see their README). Those features appear to be defined here, and there is a lot of discussion about them on this commit. You can find all the crunchy details in her source.
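The detection step is essentially a cheap binary classifier over page-level features. Below is a minimal Python sketch of that idea, assuming a hypothetical feature set (paragraph count, log of visible text length, link density) and made-up logistic-regression weights. It is not Chrome's trained model or its real feature list; `page_features`, `distillability`, and `worth_distilling` are illustrative names only.

```python
import math
from html.parser import HTMLParser


class _FeatureParser(HTMLParser):
    """Collects a few cheap, single-pass features from raw HTML (hypothetical feature set)."""

    def __init__(self):
        super().__init__()
        self.p_count = 0     # number of <p> elements
        self.text_chars = 0  # visible characters outside <script>/<style>
        self.link_chars = 0  # visible characters inside <a> elements
        self._in_link = 0
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_count += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link -= 1
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:
            return  # ignore script/style bodies
        n = len(data.strip())
        self.text_chars += n
        if self._in_link:
            self.link_chars += n


def page_features(html):
    parser = _FeatureParser()
    parser.feed(html)
    parser.close()
    # A page with no text at all is treated as pure navigation.
    link_density = parser.link_chars / parser.text_chars if parser.text_chars else 1.0
    return {
        "p_count": parser.p_count,
        "log_text": math.log1p(parser.text_chars),
        "link_density": link_density,
    }


# Made-up weights for illustration only; NOT Chrome's trained model.
WEIGHTS = {"p_count": 0.3, "log_text": 0.5, "link_density": -6.0}
BIAS = -4.0


def distillability(html):
    """Logistic score in (0, 1): higher means more article-like."""
    f = page_features(html)
    z = BIAS + sum(WEIGHTS[k] * v for k, v in f.items())
    return 1.0 / (1.0 + math.exp(-z))


def worth_distilling(html, threshold=0.5):
    return distillability(html) >= threshold
```

With these toy weights, a page made of several long paragraphs scores well above 0.5, while a list of bare links scores near zero. Keeping the features cheap presumably lets the check run on every page load without invoking the full distiller.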

Once she decides that a page is worth distilling, Chrome runs the actual "distiller" code, which is Java compiled to JavaScript via GWT. There seems to be some overlap with Readability as well (note: that's a good place to look for next/previous-page extraction heuristics). The algorithm's entry point is ContentExtractor#extractContent. She runs an extractor over the page, building up an initially empty output document. Extraction is driven by some (poorly commented 😔) heuristics; I think this is where the interesting logic happens. Finally, she runs a set of filters to remove unnecessary content from the page.
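The extract-then-filter shape described above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not a port of ContentExtractor#extractContent: the block splitting, the word-count and link-density thresholds (10 words, 0.33), and the two filters (`drop_boilerplate`, `drop_duplicates`) are all hypothetical stand-ins for Distiller's real heuristics.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser

# Tags treated as block boundaries (an assumption for this sketch).
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "td", "article", "section", "nav", "footer"}


@dataclass
class TextBlock:
    text: str
    link_chars: int

    @property
    def word_count(self):
        return len(self.text.split())

    @property
    def link_density(self):
        return self.link_chars / len(self.text) if self.text else 0.0


class BlockSplitter(HTMLParser):
    """Splits the page into text blocks at block-level tag boundaries."""

    def __init__(self):
        super().__init__()
        self.blocks, self._parts = [], []
        self._link_chars = self._in_link = self._skip = 0

    def _flush(self):
        text = " ".join(self._parts).strip()
        if text:
            self.blocks.append(TextBlock(text, self._link_chars))
        self._parts, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a":
            self._in_link += 1
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a" and self._in_link:
            self._in_link -= 1
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        data = data.strip()
        if data and not self._skip:
            self._parts.append(data)
            if self._in_link:
                self._link_chars += len(data)

    def close(self):
        super().close()
        self._flush()  # emit any trailing text


def is_content(block):
    # Hypothetical heuristic: reasonably long text with few links.
    return block.word_count >= 10 and block.link_density < 0.33


BOILERPLATE = re.compile(r"^(share this|advertisement|related posts)", re.IGNORECASE)


def drop_boilerplate(blocks):
    return [b for b in blocks if not BOILERPLATE.match(b.text)]


def drop_duplicates(blocks):
    seen, out = set(), []
    for b in blocks:
        if b.text not in seen:
            seen.add(b.text)
            out.append(b)
    return out


FILTERS = [drop_boilerplate, drop_duplicates]


def extract_content(html):
    splitter = BlockSplitter()
    splitter.feed(html)
    splitter.close()
    document = []  # the output document starts empty
    for block in splitter.blocks:
        if is_content(block):
            document.append(block)
    for f in FILTERS:  # post-extraction cleanup
        document = f(document)
    return [b.text for b in document]
```

Splitting extraction from filtering keeps each filter small and independently testable; a real distiller would use far richer signals (DOM position, CSS visibility, headings) than this toy version.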
