fix: non-printable characters in XML #6952

BeksOmega · 2023-04-04T18:05:37Z

The basics

I branched from develop
My pull request is against develop
My code follows the style guide
I ran npm run format and npm run lint

The details

Resolves

Proposed Changes

Escapes non-printable characters when serializing to XML, and falls back to best-effort parsing when deserializing XML that contains unescaped ones.
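
A minimal sketch of the serialization half, assuming the escaping runs as a post-processing pass over the already-serialized string. The names `CONTROL_CHARS` and `escapeControlChars`, and the exact character range, are illustrative assumptions rather than code taken from this PR.

```typescript
// Hypothetical range: C0 control characters except LF (0x0A) and CR (0x0D).
// The exact set escaped by the PR is an assumption here.
const CONTROL_CHARS = /[\x00-\x09\x0B\x0C\x0E-\x1F]/g;

/** Replaces each matched control character with a decimal character reference. */
function escapeControlChars(serialized: string): string {
  return serialized.replace(
    CONTROL_CHARS,
    (ch) => `&#${ch.charCodeAt(0)};`,
  );
}

const escaped = escapeControlChars('<variable>line1\tline2</variable>');
console.log(escaped); // <variable>line1&#9;line2</variable>
```

Doing this on the string rather than on the DOM keeps the sketch independent of any particular serializer; whether a given XML parser accepts every resulting reference is a separate question.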

Reason for Changes

App Inventor needs this to be pulled into core for them to be able to update.

Test Coverage

Unit tests for serializing and deserializing.

Documentation

N/A

Additional Information

N/A

cpcallen

I have so many questions about this, starting with: is XMLSerializer really specced to return invalid XML which can't be parsed with DOMParser???

core/utils/xml.ts

tests/mocha/serializer_test.js

core/utils/xml.ts

BeksOmega · 2023-04-06T15:45:52Z

I have so many questions about this, starting with: is XMLSerializer really specced to return invalid XML which can't be parsed with DOMParser???

In the previous PR fixing this, Neil had the same question: #4945 (review)

cpcallen

Additional comments about textToDom, after discussion with Neil (but don't blame him if you don't like them!):

AFAICT there's no reason to ever create more than one DOMParser (or XMLSerializer) instance, so I suggest having the module create a single module-local one and use it everywhere.
textToDomDocument is only used by textToDom in this module (and nowhere else), so I suggest it be deprecated and have textToDom always use (the) DOMParser (instance) directly.
I'm unclear about why textToDom tries parsing as HTML if parsing as XML fails. That seems risky, and at least deserves an inline comment if not a note in the JSDoc.
Normally I prefer handling errors at the top with an if (fail) return;, but in this case the error handlers end up getting nested, so I think it would be clearer to rewrite as:
```
try xml
if (xml win) return xml
try html
if (html win) return html
throw 'I give up';
```
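
The flat shape suggested above could look roughly like the following in TypeScript. This is a sketch under assumptions: `parseXmlThenHtml` and the `*Like` interfaces are hypothetical names introduced so the example runs without a browser, and the rule for accepting the HTML result (root element still named `xml`) is a guess at what an "html win" means.

```typescript
// Structural stand-ins for the DOM types so the sketch does not depend on a
// browser; a real DOMParser and Document are expected to fit these shapes.
interface NodeLike {
  nodeName: string;
}
interface DocLike {
  documentElement: NodeLike | null;
  body: {firstChild: NodeLike | null} | null;
  getElementsByTagName(name: string): {length: number};
}
interface ParserLike {
  parseFromString(text: string, mimeType: string): DocLike;
}

/** Hypothetical flat rewrite: try XML, then HTML, then give up. */
function parseXmlThenHtml(parser: ParserLike, text: string): NodeLike {
  // Strict attempt. A failed XML parse is reported via a <parsererror> element.
  const xmlDoc = parser.parseFromString(text, 'text/xml');
  if (
    xmlDoc.documentElement &&
    xmlDoc.getElementsByTagName('parsererror').length === 0
  ) {
    return xmlDoc.documentElement;
  }

  // Lenient attempt. Accept the result only if the first node is still <xml>
  // (an assumption about what counts as an "HTML win").
  const htmlDoc = parser.parseFromString(text, 'text/html');
  const first = htmlDoc.body ? htmlDoc.body.firstChild : null;
  if (first && first.nodeName.toLowerCase() === 'xml') {
    return first;
  }

  throw new Error(`Unable to parse: ${text}`);
}

// In a browser, one shared instance could be created once at module level:
//   const domParser = new DOMParser();
//   const root = parseXmlThenHtml(domParser, '<xml></xml>');
```

Passing the parser in as an argument is only a convenience for the sketch; with a single module-local instance, as suggested in the first point, the function would simply close over it.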

core/utils/xml.ts

NeilFraser · 2023-04-06T16:45:50Z

tests/mocha/serializer_test.js

@@ -708,10 +708,10 @@ Serializer.Fields.Variable.Types = new SerializerTestCase('Types',
 Serializer.Fields.Variable.Tabs = new SerializerTestCase('Tabs',
    '<xml xmlns="https://developers.google.com/blockly/xml">' +
    '<variables>' +
-    '<variable id="aaaaaaaaaaaaaaaaaaaa">line1	line2	line3</variable>' +
+    '<variable id="aaaaaaaaaaaaaaaaaaaa">line1&amp;#x9line2&amp;#x9line3</variable>' +


Note that these tests still use hex.
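
For context, XML allows a character to be written either as a decimal reference (`&#9;`) or as a hexadecimal one (`&#x9;`), and both denote the same code point. The sketch below decodes either form; `decodeCharRefs` is a hypothetical helper written for illustration and is not part of Blockly.

```typescript
/** Decodes decimal (&#9;) and hexadecimal (&#x9;) character references. */
function decodeCharRefs(text: string): string {
  return text.replace(
    /&#(x[0-9a-fA-F]+|[0-9]+);/g,
    (_match: string, body: string) => {
      const code =
        body.charAt(0) === 'x'
          ? parseInt(body.slice(1), 16)
          : parseInt(body, 10);
      // Sufficient for control characters; code points above U+FFFF are not
      // handled by fromCharCode.
      return String.fromCharCode(code);
    },
  );
}

const fromHex = decodeCharRefs('line1&#x9;line2');
const fromDecimal = decodeCharRefs('line1&#9;line2');
console.log(fromHex === fromDecimal); // true
```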

core/utils/xml.ts

BeksOmega · 2023-04-11T15:49:32Z

@cpcallen Ok I think this is actually good for another look!

cpcallen

LGTM except for some nit-picking about the tests.

In addition to the specifics below, I think it would be wise to check all of the invalid characters in the tests (especially \x00).
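
One way to cover every character rather than a hand-picked few is to generate the cases. The sketch below walks the whole C0 range, including `\x00`, and checks a hypothetical `escapeControlChars` helper against each code point. The helper and the choice to leave line feed and carriage return unescaped are assumptions for illustration; a real test would exercise the module's own serialize and parse functions instead.

```typescript
// Hypothetical escaper, standing in for the function under test.
function escapeControlChars(text: string): string {
  return text.replace(
    /[\x00-\x09\x0B\x0C\x0E-\x1F]/g,
    (ch) => `&#${ch.charCodeAt(0)};`,
  );
}

// Every C0 code point, including \x00; LF and CR are assumed to pass through.
const UNESCAPED = [0x0a, 0x0d];
const failures: number[] = [];
for (let code = 0x00; code <= 0x1f; code++) {
  const raw = String.fromCharCode(code);
  const expected = UNESCAPED.indexOf(code) !== -1 ? raw : `&#${code};`;
  if (escapeControlChars(`a${raw}b`) !== `a${expected}b`) {
    failures.push(code);
  }
}
console.log(failures.length); // 0
```

Collecting the failing code points instead of stopping at the first one makes it easier to see whether a gap is a single character or a whole sub-range.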

core/utils/xml.ts

tests/mocha/xml_test.js

fix: non-printable characters in XMl

b3d6778

BeksOmega requested a review from a team as a code owner April 4, 2023 18:05

BeksOmega requested a review from cpcallen April 4, 2023 18:05

github-actions bot assigned cpcallen Apr 4, 2023

github-actions bot added the PR: fix Fixes a bug label Apr 4, 2023

cpcallen reviewed Apr 6, 2023

NeilFraser reviewed Apr 6, 2023

NeilFraser reviewed Apr 6, 2023

BeksOmega added 2 commits April 6, 2023 16:02

fix: PR comments

ebfe2a7

chore: format

a491457

cpcallen requested changes Apr 6, 2023

cpcallen reviewed Apr 6, 2023

BeksOmega added 2 commits April 6, 2023 16:36

chore: move to module-level parser and serializer

c188335

chore: reorganize textToDom

6454eca

NeilFraser reviewed Apr 6, 2023

cpcallen reviewed Apr 6, 2023

BeksOmega added 3 commits April 10, 2023 22:41

chore: add dummy implementations of domParser and xmlSerializer

0b27ca9

chore: properly check classes before constructing

c1bd1d8

chore: fix tests

f837420

cpcallen requested changes Apr 17, 2023

BeksOmega added 2 commits April 17, 2023 16:29

chore: PR comments

ab40de7

chore: remove null char from tests

98b6c8b

cpcallen approved these changes Apr 17, 2023

chore: docs!

4666d01

BeksOmega merged commit edc5843 into google:develop Apr 17, 2023

BeksOmega deleted the fix/non-print-chars branch April 18, 2023 20:58
