Start CDATA section only after uppercase `<![CDATA[` #781

Mingun · 2024-07-05T14:21:03Z

Implement more strict check for the start of CDATA section. Incorrect case is a syntax error. In that sense this PR is differs from #780 where attributes in the end tag is validation error. So we should not be case tolerant in this case.

(Review in whitespace changes ignored mode)

failures (6): syntax::cdata::lowercase::ns_reader::async_tokio syntax::cdata::lowercase::ns_reader::borrowed syntax::cdata::lowercase::ns_reader::buffered syntax::cdata::lowercase::reader::async_tokio syntax::cdata::lowercase::reader::borrowed syntax::cdata::lowercase::reader::buffered

… specification See https://www.w3.org/TR/xml11/#NT-doctypedecl

codecov-commenter · 2024-07-05T14:31:16Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 60.16%. Comparing base (7558577) to head (b71cf7c).
Report is 74 commits behind head on master.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #781      +/-   ##
==========================================
- Coverage   61.81%   60.16%   -1.66%     
==========================================
  Files          41       41              
  Lines       16798    16138     -660     
==========================================
- Hits        10384     9709     -675     
- Misses       6414     6429      +15

Flag	Coverage Δ
unittests	`60.16% <100.00%> (-1.66%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

dralley · 2024-07-05T22:46:14Z

src/reader/state.rs

                debug_assert!(buf.ends_with(b"]]"));
                Ok(Event::CData(BytesCData::wrap(
                    // Cut of `![CDATA[` and `]]` from start and end
                    &buf[8..len - 2],
                    self.decoder(),
                )))
            }
+            // XML requires uppercase only, but we will check that on validation stage:


Is there a particular reason to keep this difference in behavior intact in this PR?

HTML allows using mixed case for DOCTYPE and handling of this was intentionally added to quick-xml in 0.23.0 and related #230. In one of previous PRs you said that you prefer to keep ability to parse HTML documents, so I leaved this ability.

CDATA sections, however, are always started from uppercase variant, event in HTML. Corresponding links a couple of lines below in the comment.

I planned to work on validation after merging #766 (because it changes possible content of Text events and we will know something about content, and may do not validate that).

After validation PR will be merged, the user will be able to validate each event manually after reading it if it requires strict parser and skip validation if that is not required. The API would be:

match reader.read_event() { Text(e) => { // Get iterator over all validation problems and stop at first // You also may skip problems that you want to allow if e.validate_iter().any() { return Err(...); } // Shortcut for reporting the first problem as a `Result` e.validate()?; } // do the same for other events }

Move bang type checks outside of loop

8741abc

(Review in whitespace changes ignored mode)

Mingun requested a review from dralley July 5, 2024 14:21

Mingun added 3 commits July 5, 2024 19:22

Only uppercase <![CDATA[ starts CDATA section

6dec2b2

Use only uppercase DOCTYPE in tests where possible as required by the…

b71cf7c

… specification See https://www.w3.org/TR/xml11/#NT-doctypedecl

Mingun force-pushed the only-upper-cdata branch from 1864374 to b71cf7c Compare July 5, 2024 14:22

Mingun added the bug label Jul 5, 2024

dralley reviewed Jul 5, 2024

View reviewed changes

dralley approved these changes Jul 5, 2024

View reviewed changes

Mingun merged commit 0960333 into tafia:master Jul 6, 2024
7 checks passed

Mingun deleted the only-upper-cdata branch July 6, 2024 15:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Start CDATA section only after uppercase `<![CDATA[` #781

Start CDATA section only after uppercase `<![CDATA[` #781

Mingun commented Jul 5, 2024

codecov-commenter commented Jul 5, 2024

dralley Jul 5, 2024

Mingun Jul 6, 2024

Start CDATA section only after uppercase <![CDATA[ #781

Start CDATA section only after uppercase <![CDATA[ #781

Conversation

Mingun commented Jul 5, 2024

codecov-commenter commented Jul 5, 2024

Codecov Report

dralley Jul 5, 2024

Choose a reason for hiding this comment

Mingun Jul 6, 2024

Choose a reason for hiding this comment

Start CDATA section only after uppercase `<![CDATA[` #781

Start CDATA section only after uppercase `<![CDATA[` #781