-
-
Notifications
You must be signed in to change notification settings - Fork 733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise and refine #52
Conversation
Deep cloning (cloning all children along with the node) is unnecessary as only the attributes are being used. Shallow cloning is much less expensive.
* Use regexp.test as it's faster for checking if there's a match. * Use non-capturing group as capture not used.
Unless "#comment" is added to the allowed types, it won't be kept anyway. And if someone did want to keep comments, they could now add this to the allowed tags list.
It used to check for the presence of a currentNode.nodeName.toLowerCase property. Not sure why, but this MUST always exist otherwise an error would have been thrown at line 316 when currentNode.nodeName.toLowerCase() was called, since currentNode has not been reassigned or significatnly changed since then.
As it's entirely unused.
This means _initDocument now always returns a DOM node, rather than sometimes a string, making it easier for compilers to optimise and for humans to understand.
The dirty parameter has to be a string. If it were an existing DOM node, setting outerHTML will result in converting it to a string, which will be something like `<body>[object HTMLDivElement]</body>`, not the contents of the DOM node itself.
* Should not shortcut if expecting a DOM object to be returned. * Should not shortcut if expecting the whole document including <html> etc. to be returned.
So library users can easily check if the browser supports sanitising properly.
Calling Array#indexOf on the list for every node/attribute is expensive as this is an O(n) operation. Much better to convert to an object first, then we can lookup in O(1).
At the moment, they are re-evaluated on every call to DOMPurify#sanitize(), which is unnecessarily work for subsequent calls.
The check in _initDocument that the node is of the right type will always fail, since the node will never be both an instance of HTMLBodyElement and HTMLHtmlElement. Fix this to only check against the type we expect based on WHOLE_DOCUMENT cofig, and tidy code to be a little easier to read.
Can you add a test for the |
Make sure the DOM node is returned even if the string has no "<" character.
Sure, no problem. Done. |
Cool, thank you! |
Also please add a test that checks if the |
Sure, done. |
Thanks, @neilj! That was a lot of work and very good changes. Very much appreciated. I am currently reviewing if the changes introduced any security regressions (specifically on MSIE). Once done, I am ready to merge. Should be the case by around tomorrow. |
@neilj Fully agreed, and if you don't mind, I would like to add your name to the acknowledgements section in the README.md. In case all is fine on MSIE10+ as well (cannot test right now, am on the road), I will probably trigger a new release tomorrow as well. Thanks again :) |
Sure, that would be great. Thanks. |
Tests are working well on IE10+, FF and Spartan. I am ready for a merge. The change-review was successful as well - no objections from my side. Any objections, @fhemberger and @mathiasbynens? |
LGTM 👍 |
Belated LGTM 👍 (#52 (comment) already implied it, kinda) |
This pull request encompasses the following refinements:
<html><body>
etc. tags if WHOLE_DOCUMENT config option is set, even if the input has no "<" character.The test suite has one modification for (5). No other semantic changes were made and the full test suite runs without error in latest Chrome.