fix: non-printable characters in XML #6952
I have so many questions about this, starting with: is `XMLSerializer` really specced to return invalid XML which can't be parsed with `DOMParser`???
In the previous PR fixing this, Neil had the same question: #4945 (review)
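The question hinges on which characters XML 1.0 allows at all. The `Char` production quoted in the comment below is from the XML 1.0 spec, and its "Legal Character" well-formedness constraint applies the same production to character references, so `&#x1;` is no more well-formed than a raw U+0001. A minimal sketch that flags such code points (the helper names are hypothetical, not Blockly API):

```typescript
// XML 1.0 (Fifth Edition), production [2]:
//   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
function isXml10Char(cp: number): boolean {
  return (
    cp === 0x9 ||
    cp === 0xa ||
    cp === 0xd ||
    (cp >= 0x20 && cp <= 0xd7ff) ||
    (cp >= 0xe000 && cp <= 0xfffd) ||
    (cp >= 0x10000 && cp <= 0x10ffff)
  );
}

// Hypothetical helper: lists every code point in `text` that may not appear in
// an XML 1.0 document, neither literally nor as a character reference.
function findInvalidXml10Chars(text: string): number[] {
  const invalid: number[] = [];
  for (let i = 0; i < text.length; ) {
    // codePointAt combines a valid surrogate pair into one supplementary code
    // point; a lone surrogate comes back as-is (0xD800-0xDFFF), which is invalid.
    const cp = text.codePointAt(i)!;
    if (!isXml10Char(cp)) invalid.push(cp);
    i += cp > 0xffff ? 2 : 1;
  }
  return invalid;
}
```

For example, `findInvalidXml10Chars("a\u0001b")` returns `[1]`, while a string containing only tabs, line feeds and printable text returns `[]`.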
Additional comments about `textToDom`, after discussion with Neil (but don't blame him if you don't like them!):

- AFAICT there's no reason to ever create more than one `DOMParser` (or `XMLSerializer`) instance, so I suggest having the module create a single module-local one and use it everywhere. `textToDomDocument` is only used by `textToDom` in this module (and nowhere else), so I suggest it be deprecated and have `textToDom` always use (the) `DOMParser` (instance) directly.
- I'm unclear about why `textToDom` tries parsing as HTML if parsing as XML fails. That seems risky, and at least deserves an inline comment if not a note in the JSDoc.
- Normally I prefer handling errors at the top with an `if (fail) return;`, but in this case the error handlers end up getting nested, so I think it would be clearer to rewrite as:

  ```
  try xml
  if (xml win) return xml
  try html
  if (html win) return html
  throw 'I give up';
  ```
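The flattened structure suggested above might look like this in TypeScript. This is a sketch, not Blockly's code: `textToDomSketch`, `MinimalParser` and `MinimalDocument` are hypothetical names; the parser is passed in so the snippet runs outside a browser, where the module would instead create one `DOMParser` at module scope; and taking `<body>`'s first child as the HTML result is an assumption.

```typescript
// Minimal structural types covering only what this sketch touches, so it
// compiles without lib.dom; a browser DOMParser/Document fits these shapes.
interface MinimalDocument {
  documentElement: object | null;
  body: { firstChild: object | null } | null;
  getElementsByTagName(name: string): { length: number };
}

interface MinimalParser {
  parseFromString(text: string, type: "text/xml" | "text/html"): MinimalDocument;
}

// Browsers report XML syntax errors by returning a document that contains a
// <parsererror> element rather than by throwing.
function xmlRoot(doc: MinimalDocument): object | null {
  if (doc.getElementsByTagName("parsererror").length > 0) return null;
  return doc.documentElement;
}

// Assumption: the serialized root element ends up as <body>'s first child.
function htmlRoot(doc: MinimalDocument): object | null {
  return doc.body?.firstChild ?? null;
}

function textToDomSketch(parser: MinimalParser, text: string): object {
  // Try strict XML first.
  const fromXml = xmlRoot(parser.parseFromString(text, "text/xml"));
  if (fromXml) return fromXml;

  // Fall back to the more lenient HTML parser.
  const fromHtml = htmlRoot(parser.parseFromString(text, "text/html"));
  if (fromHtml) return fromHtml;

  throw new Error(`Unable to parse as XML or HTML: ${text}`);
}
```

Each failure case falls through to the next attempt, so nothing nests; the single `throw` at the end is the only error path.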
```diff
@@ -708,10 +708,10 @@ Serializer.Fields.Variable.Types = new SerializerTestCase('Types',
 Serializer.Fields.Variable.Tabs = new SerializerTestCase('Tabs',
 '<xml xmlns="https://developers.google.com/blockly/xml">' +
 '<variables>' +
-'<variable id="aaaaaaaaaaaaaaaaaaaa">line1 line2 line3</variable>' +
+'<variable id="aaaaaaaaaaaaaaaaaaaa">line1&#x9line2&#x9line3</variable>' +
```
Note that these tests still use hex character references.
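Hex and decimal references denote the same character (`&#x9;` and `&#9;` are both a tab), so one option, if the expected strings should not depend on which radix the serializer emits, is to decode numeric references on both sides before comparing. A sketch with a hypothetical helper name:

```typescript
// XML numeric character references: decimal (&#9;) or hexadecimal (&#x9;).
// XML only allows a lowercase "x", and the terminating ";" is mandatory.
const NUMERIC_CHAR_REF = /&#(?:x([0-9a-fA-F]+)|([0-9]+));/g;

// Hypothetical test helper: replaces every numeric reference with the character
// it denotes, so strings that differ only in reference radix compare equal.
// String.fromCodePoint throws a RangeError for values above 0x10FFFF.
function decodeNumericCharRefs(text: string): string {
  return text.replace(
    NUMERIC_CHAR_REF,
    (_match: string, hex: string | undefined, dec: string | undefined) =>
      String.fromCodePoint(
        hex !== undefined ? parseInt(hex, 16) : parseInt(dec as string, 10),
      ),
  );
}
```

A test could then assert `decodeNumericCharRefs(actual) === decodeNumericCharRefs(expected)` instead of comparing the raw strings.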
@cpcallen OK, I think this is actually good for another look!
LGTM except for some nit-picking about the tests.
In addition to the specifics below, I think it would be wise to check all of the invalid characters in the tests (especially `\x00`).
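A table-driven way to cover every invalid control character, `\x00` included, is to generate the cases rather than list them by hand. This hypothetical helper covers only the C0 range; XML 1.0 also excludes the surrogates U+D800 to U+DFFF and U+FFFE/U+FFFF.

```typescript
interface ControlCharCase {
  label: string; // printable name for the test title, e.g. "\x00"
  text: string; // the raw control character itself
}

// Hypothetical helper: every C0 control character (U+0000 to U+001F) that
// XML 1.0 forbids, i.e. all of them except tab, line feed and carriage return.
function invalidC0Cases(): ControlCharCase[] {
  const cases: ControlCharCase[] = [];
  for (let cp = 0x00; cp <= 0x1f; cp++) {
    if (cp === 0x09 || cp === 0x0a || cp === 0x0d) continue; // allowed in XML 1.0
    cases.push({
      label: "\\x" + ("0" + cp.toString(16)).slice(-2),
      text: String.fromCharCode(cp),
    });
  }
  return cases;
}
```

Each of the 29 cases could then drive one serialize-then-parse round-trip test, with `label` in the test name.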
The basics
`npm run format` and `npm run lint`
The details
Resolves
Fixes #4590
Proposed Changes
Escapes non-printable characters properly when serializing to XML, and makes a best-effort attempt to parse unescaped characters when deserializing.
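A minimal sketch of the serializing half: post-process the serializer's output and replace control characters with numeric character references. The character set is an assumption (all C0 controls except line feed and carriage return, with tab included as in the `Tabs` test expectation), decimal references are used although hex would be equally valid, and the helper name is hypothetical. It relies on the serializer only emitting such characters inside text and attribute values. References to characters outside XML 1.0's `Char` production (such as `&#1;`) are still not well-formed XML 1.0, so a strict XML parser may reject them even after escaping.

```typescript
// Assumed set: C0 controls except LF (\x0A) and CR (\x0D). Tab (\x09) is legal
// XML but is escaped as well; a literal tab in an attribute value would be
// normalized to a space by the parser, whereas &#9; survives.
const CONTROL_CHARS = /[\x00-\x09\x0B\x0C\x0E-\x1F]/g;

// Hypothetical helper: escape control characters in already-serialized XML.
function escapeControlChars(serialized: string): string {
  return serialized.replace(CONTROL_CHARS, (ch) => `&#${ch.charCodeAt(0)};`);
}
```

For example, `escapeControlChars("<a>x\u0001</a>")` returns `<a>x&#1;</a>`.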
Reason for Changes
App Inventor needs this fix in core before they can update.
Test Coverage
Unit tests for serializing and deserializing.
Documentation
N/A
Additional Information
N/A