Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise and refine #52

Merged
merged 17 commits into from
Apr 7, 2015
Merged

Optimise and refine #52

merged 17 commits into from
Apr 7, 2015

Conversation

neilj
Copy link
Contributor

@neilj neilj commented Apr 6, 2015

This pull request encompasses the following refinements:

  1. Optimise: without changing the semantics, many easy optimisations were made, for example changing from searching an array for allowed tags/attributes [O(n)] to constructing a lookup table [O(1)].
  2. Fix small documentation errors, typos and whitespace inconsistencies.
  3. Add an "isSupported" property to the DOMPurify object to allow users to easily check if the browser supports the full DOMPurify.
  4. Fix a bug where sanitize would return a string even when RETURN_DOM is set, if the input had no "<" character.
  5. Always return the full html including browser inferred <html><body> etc. tags if WHOLE_DOCUMENT config option is set, even if the input has no "<" character.

The test suite has one modification for (5). No other semantic changes were made and the full test suite runs without error in latest Chrome.

neilj added 15 commits April 6, 2015 10:33
Deep cloning (cloning all children along with the node) is unnecessary as only
the attributes are being used. Shallow cloning is much less expensive.
* Use regexp.test as it's faster for checking if there's a match.
* Use non-capturing group as capture not used.
Unless "#comment" is added to the allowed types, it won't be kept anyway. And
if someone did want to keep comments, they could now add this to the allowed
tags list.
It used to check for the presence of a currentNode.nodeName.toLowerCase
property. Not sure why, but this MUST always exist otherwise an error would
have been thrown at line 316 when currentNode.nodeName.toLowerCase() was called,
since currentNode has not been reassigned or significatnly changed since then.
As it's entirely unused.
This means _initDocument now always returns a DOM node, rather than sometimes
a string, making it easier for compilers to optimise and for humans to
understand.
The dirty parameter has to be a string. If it were an existing DOM node,
setting outerHTML will result in converting it to a string, which will be
something like `<body>[object HTMLDivElement]</body>`, not the contents of the
DOM node itself.
* Should not shortcut if expecting a DOM object to be returned.
* Should not shortcut if expecting the whole document including <html> etc.
  to be returned.
So library users can easily check if the browser supports sanitising properly.
Calling Array#indexOf on the list for every node/attribute is expensive as this
is an O(n) operation. Much better to convert to an object first, then we can
lookup in O(1).
At the moment, they are re-evaluated on every call to DOMPurify#sanitize(),
which is unnecessarily work for subsequent calls.
The check in _initDocument that the node is of the right type will always fail,
since the node will never be both an instance of HTMLBodyElement and
HTMLHtmlElement. Fix this to only check against the type we expect based on
WHOLE_DOCUMENT cofig, and tidy code to be a little easier to read.
@mathiasbynens
Copy link
Contributor

Can you add a test for the RETURN_DOM fix, so that this doesn’t regress?

Make sure the DOM node is returned even if the string has no "<" character.
@neilj
Copy link
Contributor Author

neilj commented Apr 6, 2015

Sure, no problem. Done.

@fhemberger
Copy link
Contributor

Cool, thank you!

@mathiasbynens
Copy link
Contributor

Also please add a test that checks if the isSupported property is present (so we don’t accidentally remove/rename it in the future). Thanks!

@neilj
Copy link
Contributor Author

neilj commented Apr 6, 2015

Sure, done.

@mathiasbynens
Copy link
Contributor

@neilj

@cure53
Copy link
Owner

cure53 commented Apr 6, 2015

Thanks, @neilj! That was a lot of work and very good changes. Very much appreciated.

I am currently reviewing if the changes introduced any security regressions (specifically on MSIE). Once done, I am ready to merge. Should be the case by around tomorrow.

@neilj
Copy link
Contributor Author

neilj commented Apr 6, 2015

Thanks @cure53; code review is definitely important in a project like this! I'm looking into using DOMPurify as an extra layer of defence in the FastMail web UI, so I was more than happy to spend a few hours this morning tidying it up as I read through and understood the code.

@cure53
Copy link
Owner

cure53 commented Apr 6, 2015

@neilj Fully agreed, and if you don't mind, I would like to add your name to the acknowledgements section in the README.md. In case all is fine on MSIE10+ as well (cannot test right now, am on the road), I will probably trigger a new release tomorrow as well. Thanks again :)

@neilj
Copy link
Contributor Author

neilj commented Apr 6, 2015

Sure, that would be great. Thanks.

@cure53
Copy link
Owner

cure53 commented Apr 6, 2015

Tests are working well on IE10+, FF and Spartan. I am ready for a merge. The change-review was successful as well - no objections from my side.

Any objections, @fhemberger and @mathiasbynens?

@fhemberger
Copy link
Contributor

LGTM 👍

cure53 added a commit that referenced this pull request Apr 7, 2015
@cure53 cure53 merged commit 450a9f2 into cure53:master Apr 7, 2015
@mathiasbynens
Copy link
Contributor

Belated LGTM 👍 (#52 (comment) already implied it, kinda)

@neilj neilj deleted the refine branch April 7, 2015 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants