-
Notifications
You must be signed in to change notification settings - Fork 238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix DocType closing bracket recognition in buffered reader #802
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #802 +/- ##
==========================================
- Coverage 61.81% 60.17% -1.65%
==========================================
Files 41 41
Lines 16798 15958 -840
==========================================
- Hits 10384 9603 -781
+ Misses 6414 6355 -59
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Modified the code so that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That PR will not fix the problem entirely, because correct DTD can contain unbalanced number of <
and >
, but I think that is rare case, so until proper fix will be implemented, a partial one also acceptable. The implementation, however, still may be improved. The problem here that we do not store state between invocations of BangType::parse
. So the solution is to store balance inside BangType::DocType
, then, I think, it will be unnecessary to calculate it over buf
(because everything in buf
previously was in chunk
).
I also see the problem in the external for i in memchr::memchr_iter(b'>', chunk)
loop. It will skip <
s inside chunk
s that does not contain >
, so the balance will be calculated incorrectly. I think, if create a test carefully, it will catch such situation.
So I ask you to do the following changes:
- convert
BangType::DocType
->BangType::DocType(i32)
- start with balance == 1 (it is
<
in<!DOCTYPE
) here
Line 1031 in 2f3824a
Some(b'D') | Some(b'd') => Self::DocType, - when calculate balance take into account saved balance
- emit event only when
BangType::DocType(0)
here
Line 147 in 2f3824a
BangType::DocType if uncased_starts_with(buf, b"!DOCTYPE") => { - add tests for the mentioned case (chunk with
<
+ chunk with>
which closes<
from the first chunk). It is enough just craft such a string and use appropriate buffer size to slice string to desired chunks - also add regression test from Unexpected Bang Since 0.23.1 #590 (comment)
- try to check if this fixes Deserialization of a doctype with very long content fails #533. I think it should because it seems to be a duplicate of these issues, but need to check
Thanks for the detailed feedback! I adjusted the PR.
As the balance is calculated in up to chunk excluding current found
Are regression tests for issues 533, 590, 801 enough? Or are additional tests desired?
This PR does indeed fixes the issue. Added a regression test for the case, and modified the initial post so the issue will be closed when this PR is merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are regression tests for issues 533, 590, 801 enough? Or are additional tests desired?
I'm not sure that tests still cover the case when we get four chunks with roughly such content:
<!DOCTYPE
<
without any>
>
, which would close<
from point 2, not the doctype>
which should close doctype
If that case already covered, fine, although I would prefer to have test which explicitly guaranties such conditions (can tweak one of added regression tests for that).
And please add changelog entry to Changelog.md
that people know that bug (#533) fixed. I'll close other as duplicate.
You also can squash all your changes if you wish |
Done! I adjusted regression test for issue801 so that all angle brackets are explicitly in different buffer (by lowering buffer size to 2 bytes). I think it might be better (and easier with GitHub UI) to squash merge on your end, so others can still follow the conversation in this PR in the future. |
d66842a
to
3ebe221
Compare
Thanks! |
Fixes #533, fixes #590, fixes #801