Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: non-printable characters in XML #6952

Merged
merged 11 commits into from
Apr 17, 2023

Conversation

BeksOmega
Copy link
Collaborator

The basics

  • I branched from develop
  • My pull request is against develop
  • My code follows the style guide
  • I ran npm run format and npm run lint

The details

Resolves

Fixes #4590

Proposed Changes

Makes it so that non-printable characters are properly escaped when serializing to XML, and that we do best-effort parsing when deserializing unescaped characters.

Reason for Changes

App Inventor needs this to be pulled into core for them to be able to update.

Test Coverage

Unit tests for serializing and deserializing.

Documentation

N/A

Additional Information

N/A

@BeksOmega BeksOmega requested a review from a team as a code owner April 4, 2023 18:05
@BeksOmega BeksOmega requested a review from cpcallen April 4, 2023 18:05
@github-actions github-actions bot added the PR: fix Fixes a bug label Apr 4, 2023
Copy link
Contributor

@cpcallen cpcallen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have so many questions about this, starting with: is XMLSerialzer really specced to return invalid XML which can't be parsed with DOMParser???

core/utils/xml.ts Outdated Show resolved Hide resolved
tests/mocha/serializer_test.js Outdated Show resolved Hide resolved
core/utils/xml.ts Show resolved Hide resolved
core/utils/xml.ts Outdated Show resolved Hide resolved
core/utils/xml.ts Outdated Show resolved Hide resolved
@BeksOmega
Copy link
Collaborator Author

I have so many questions about this, starting with: is XMLSerialzer really specced to return invalid XML which can't be parsed with DOMParser???

In the previous PR fixing this, Neil had the same question: #4945 (review)

Copy link
Contributor

@cpcallen cpcallen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional comments about textToDom, after discussion with Neil (but don't blame him if you don't like them!):

  • AFAICT there's no reason to ever create more than one DOMParser (or XMLSerializer) instance, so I suggest having the module create a single module-local one and use it everywhere.
  • textToDomDocument is only used by textToDom in this module (and nowhere else), so I suggest it be deprecated and have textToDom always use (the) DOMParser (instance) directly.
  • I'm unclear about why textToDom tries parsing as HTML if parsing as XML fails. That seems risky, and at least deserves an inline comment if not a note in the JSDoc.
  • Normally I prefer handling errors at the top with an if (fail) return;, but in this case the error handlers end up getting nested, so I think it would be clearer to rewrite as:
    try xml
    if (xml win) return xml
    try html
    if (html win) return html
    throw 'I give up';

core/utils/xml.ts Outdated Show resolved Hide resolved
@@ -708,10 +708,10 @@ Serializer.Fields.Variable.Types = new SerializerTestCase('Types',
Serializer.Fields.Variable.Tabs = new SerializerTestCase('Tabs',
'<xml xmlns="https://developers.google.com/blockly/xml">' +
'<variables>' +
'<variable id="aaaaaaaaaaaaaaaaaaaa">line1 line2 line3</variable>' +
'<variable id="aaaaaaaaaaaaaaaaaaaa">line1&amp;#x9line2&amp;#x9line3</variable>' +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that these tests still use hex.

@BeksOmega
Copy link
Collaborator Author

@cpcallen Ok I think this is actually good for another look!

Copy link
Contributor

@cpcallen cpcallen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for some nit-picking about the tests.

In addition to the specifics below, I think it would be wise to check all of the invalid characters in the tests (especially \x00).

core/utils/xml.ts Outdated Show resolved Hide resolved
tests/mocha/xml_test.js Outdated Show resolved Hide resolved
tests/mocha/xml_test.js Outdated Show resolved Hide resolved
@BeksOmega BeksOmega merged commit edc5843 into google:develop Apr 17, 2023
@BeksOmega BeksOmega deleted the fix/non-print-chars branch April 18, 2023 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: fix Fixes a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Blockly does not serialize non-printable characters correctly
3 participants