Optimise and refine #52

neilj · 2015-04-06T05:12:30Z

This pull request encompasses the following refinements:

1. Optimise: without changing the semantics, many easy optimisations were made, for example changing from searching an array for allowed tags/attributes [O(n)] to constructing a lookup table [O(1)].
2. Fix small documentation errors, typos and whitespace inconsistencies.
3. Add an "isSupported" property to the DOMPurify object so users can easily check whether the browser fully supports DOMPurify.
4. Fix a bug where sanitize would return a string even when RETURN_DOM is set, if the input had no "<" character.
5. Always return the full HTML, including browser-inferred <html>, <body>, etc. tags, if the WHOLE_DOCUMENT config option is set, even if the input has no "<" character.

The test suite has one modification, for item 5 (WHOLE_DOCUMENT). No other semantic changes were made, and the full test suite runs without error in the latest Chrome.

Deep cloning (cloning all children along with the node) is unnecessary as only the attributes are being used. Shallow cloning is much less expensive.

* Use regexp.test, as it's faster when only checking whether there's a match.
* Use a non-capturing group, as the capture is not used.
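For illustration, a minimal sketch of both changes. The pattern and function names are hypothetical (DOMPurify's actual regular expressions aren't quoted in this thread); only the shape of the change matters. String#match builds a result array that includes the captured text, whereas RegExp#test just returns a boolean, and the non-capturing group (?:...) means no capture is recorded for a value nobody reads.

```javascript
// Hypothetical pattern: flags values that start with a scripting scheme.

// Before: a capturing group plus String#match, which allocates a result
// array (holding the captured substring) only to compare it with null.
const UNSAFE_OLD = /^(javascript|vbscript):/i;
function isUnsafeOld(value) {
  return value.match(UNSAFE_OLD) !== null;
}

// After: a non-capturing group plus RegExp#test, which only reports
// whether the pattern matches.
const UNSAFE_NEW = /^(?:javascript|vbscript):/i;
function isUnsafeNew(value) {
  return UNSAFE_NEW.test(value);
}

console.log(isUnsafeNew('JavaScript:alert(1)')); // true
console.log(isUnsafeNew('https://example.com/')); // false
```

Both patterns deliberately omit the g flag: with g, RegExp#test updates lastIndex between calls, so reusing a shared regex could return different results for the same input.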

Unless "#comment" is added to the allowed types, it won't be kept anyway. And if someone did want to keep comments, they could now add this to the allowed tags list.

It used to check for the presence of a currentNode.nodeName.toLowerCase property. Not sure why, but this MUST always exist; otherwise an error would have been thrown at line 316 when currentNode.nodeName.toLowerCase() was called, since currentNode has not been reassigned or significantly changed since then.

Remove the DEBUG_OUTPUT option, as it's entirely unused.

Moving the check for a safe string out of _initDocument means it now always returns a DOM node, rather than sometimes a string, making it easier for compilers to optimise and for humans to understand.

The dirty parameter has to be a string. If it were an existing DOM node, setting outerHTML will result in converting it to a string, which will be something like `<body>[object HTMLDivElement]</body>`, not the contents of the DOM node itself.
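A small runnable sketch of the coercion described above, using a stand-in class instead of a real element so it runs outside a browser. The guard requireString is hypothetical and only shows one way to reject non-string input early; it is not DOMPurify's API.

```javascript
// Stand-in for a DOM element. A real HTMLDivElement stringifies the same
// way, because its prototype defines Symbol.toStringTag as "HTMLDivElement".
class FakeDivElement {
  get [Symbol.toStringTag]() {
    return 'HTMLDivElement';
  }
}

// Assigning a non-string to outerHTML converts it to a string first, so the
// node's contents are lost and only its type name survives.
const coerced = String(new FakeDivElement()); // "[object HTMLDivElement]"

// Hypothetical guard: fail loudly on non-string input instead of letting
// the coercion silently produce unrelated markup.
function requireString(dirty) {
  if (typeof dirty !== 'string') {
    throw new TypeError(
      'expected a string, got ' + Object.prototype.toString.call(dirty)
    );
  }
  return dirty;
}
```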

* Should not shortcut if expecting a DOM object to be returned.
* Should not shortcut if expecting the whole document including <html> etc. to be returned.
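The shortcut conditions can be sketched as a single predicate. canShortcut is a hypothetical helper for illustration; only the config keys RETURN_DOM and WHOLE_DOCUMENT come from this PR.

```javascript
// Hypothetical helper: may sanitize() skip parsing and hand the input
// string straight back?
function canShortcut(dirty, config) {
  // A string containing "<" may contain tags, so it must be parsed.
  if (dirty.indexOf('<') !== -1) return false;
  // The caller wants a DOM node back, not a string.
  if (config.RETURN_DOM) return false;
  // The caller wants the full document, including the inferred <html>/<body>.
  if (config.WHOLE_DOCUMENT) return false;
  return true;
}
```

Keeping each condition on its own line makes it easy to add further return-type options later without touching the others.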

Add the isSupported property so library users can easily check whether the browser supports sanitising properly.

Calling Array#indexOf on the list for every node/attribute is expensive as this is an O(n) operation. Much better to convert to an object first, then we can lookup in O(1).
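A minimal sketch of the array-to-lookup conversion. The function name toLookup and the sample tag list are hypothetical; the point is that the conversion is paid once, after which each membership check is a property lookup instead of a linear scan.

```javascript
// Convert an allow-list array into an object keyed by entry.
function toLookup(list) {
  // Object.create(null) has no prototype, so inherited names such as
  // "constructor" are never mistaken for allowed entries.
  const lookup = Object.create(null);
  for (let i = 0; i < list.length; i++) {
    lookup[list[i]] = true;
  }
  return lookup;
}

const ALLOWED_TAGS_LIST = ['a', 'b', 'div', 'p', 'span'];
const ALLOWED_TAGS = toLookup(ALLOWED_TAGS_LIST);

// Before: scans the array on every check, O(n).
const allowedOld = ALLOWED_TAGS_LIST.indexOf('div') !== -1;
// After: a single property lookup, O(1) on average.
const allowedNew = ALLOWED_TAGS['div'] === true;
```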

At the moment, the internal methods and variables are re-evaluated on every call to DOMPurify#sanitize(), which is unnecessary work for subsequent calls.
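A sketch of the difference between per-call setup inside a function and setup hoisted to module scope. buildHelpers, SAFE_URL and the two check functions are hypothetical stand-ins for DOMPurify's internals; the counter exists only to make the repeated work observable.

```javascript
let setupCount = 0;

// Hypothetical setup step standing in for helpers/constants that used to be
// created inside the sanitize closure.
function buildHelpers() {
  setupCount++;
  return { SAFE_URL: /^(?:https?|mailto):/i };
}

// Before: the setup runs again on every call.
function isSafeUrlPerCall(url) {
  const helpers = buildHelpers();
  return helpers.SAFE_URL.test(url);
}

// After: the setup runs once, when the module is evaluated, and every
// call reuses the result.
const HELPERS = buildHelpers();
function isSafeUrlHoisted(url) {
  return HELPERS.SAFE_URL.test(url);
}
```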

The check in _initDocument that the node is of the right type will always fail, since the node will never be both an instance of HTMLBodyElement and HTMLHtmlElement. Fix this to only check against the type we expect based on the WHOLE_DOCUMENT config, and tidy the code to be a little easier to read.
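A runnable sketch of the logic error, assuming (as the description implies) that the old condition required the node to match both element types. Stub classes replace HTMLHtmlElement and HTMLBodyElement so it runs outside a browser; the function names are hypothetical.

```javascript
// Stand-ins for HTMLHtmlElement and HTMLBodyElement.
class HtmlElementStub {}
class BodyElementStub {}

// Assumed old shape: demands membership in both types, which no node has,
// so the check can never pass.
function isExpectedNodeBuggy(node) {
  return node instanceof BodyElementStub && node instanceof HtmlElementStub;
}

// Fixed shape: choose the single expected type from WHOLE_DOCUMENT.
function isExpectedNode(node, WHOLE_DOCUMENT) {
  const Expected = WHOLE_DOCUMENT ? HtmlElementStub : BodyElementStub;
  return node instanceof Expected;
}
```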

mathiasbynens · 2015-04-06T05:56:25Z

Can you add a test for the RETURN_DOM fix, so that this doesn’t regress?

Make sure the DOM node is returned even if the string has no "<" character.

neilj · 2015-04-06T07:15:47Z

Sure, no problem. Done.

fhemberger · 2015-04-06T07:41:20Z

Cool, thank you!

mathiasbynens · 2015-04-06T07:43:33Z

Also please add a test that checks if the isSupported property is present (so we don’t accidentally remove/rename it in the future). Thanks!

neilj · 2015-04-06T07:49:13Z

Sure, done.

mathiasbynens · 2015-04-06T07:50:16Z

@neilj

cure53 · 2015-04-06T07:53:50Z

Thanks, @neilj! That was a lot of work and very good changes. Very much appreciated.

I am currently reviewing if the changes introduced any security regressions (specifically on MSIE). Once done, I am ready to merge. Should be the case by around tomorrow.

neilj · 2015-04-06T08:03:23Z

Thanks @cure53; code review is definitely important in a project like this! I'm looking into using DOMPurify as an extra layer of defence in the FastMail web UI, so I was more than happy to spend a few hours this morning tidying it up as I read through and understood the code.

cure53 · 2015-04-06T08:09:02Z

@neilj Fully agreed, and if you don't mind, I would like to add your name to the acknowledgements section in the README.md. In case all is fine on MSIE10+ as well (cannot test right now, am on the road), I will probably trigger a new release tomorrow as well. Thanks again :)

neilj · 2015-04-06T08:09:58Z

Sure, that would be great. Thanks.

cure53 · 2015-04-06T10:16:00Z

Tests are working well on IE10+, FF and Spartan. I am ready for a merge. The change-review was successful as well - no objections from my side.

Any objections, @fhemberger and @mathiasbynens?

fhemberger · 2015-04-06T10:29:55Z

LGTM 👍


mathiasbynens · 2015-04-07T14:03:43Z

Belated LGTM 👍 (#52 (comment) already implied it, kinda)

neilj added 15 commits April 6, 2015 10:33

Fix whitespace inconsistencies.

75534cf

Fix remove element comment placement & semantics

814a95e

Fix typo in documentation

1d8a1bc

Shallow clone node in _sanitizeAttributes

fb44569

Optimise regular expressions

9faf67e

Remove needless check for comment node type.

db1e32b

Remove DEBUG_OUTPUT option.

0467cc8

Move check for safe string out of _initDocument.

2123a66

Fix conditions for shortcut return from sanitize

3c066fd

Add isSupported property to DOMPurify.

b613530

Convert allowed tags/attr lists to lookup tables.

606821c

Move internal methods/variables out of sanitize closure.

224f1e0

Add regression test for RETURN_DOM with simple string.

daf539d

Add test for presence of isSupported property.

7006637

cure53 added a commit that referenced this pull request Apr 7, 2015

Merge pull request #52 from fastmail/refine

450a9f2

Optimise and refine

cure53 merged commit 450a9f2 into cure53:master Apr 7, 2015

neilj deleted the refine branch April 7, 2015 15:03
