Fix DTD parsing issue resulting in ExpectedClosingQuote error.
#60
src/parser.rs
/* without the optional intSubset */
fn parse_document_type_declaration<'a>(pm: &mut XmlMaster<'a>, xml: StringPoint<'a>) -> XmlProgress<'a, Token<'a>> {
    let (xml, _) = try_parse!(xml.expect_literal("<!DOCTYPE"));
    let (xml, _) = try_parse!(xml.expect_space());
    let (xml, _type_name) = try_parse!(xml.consume_name().map_err(|_| SpecificError::ExpectedDocumentTypeName));
    let (xml, _external_id) = try_parse!(parse_external_id(pm, xml));
The corresponding grammar here is
'<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
This means that 0, 1, or 2 of the external ID and an internal subset can be present. The current code only supports exactly one or the other, closer to this grammar:
'<!DOCTYPE' S Name (ext | int) '>'
This PR is still an improvement though! I see two paths:
- If you'd be up for adding more support, then we can add tests for the 0 and 2 cases.
- If you'd rather stop here, I'd prefer a comment be added to this newly-added alternate stating that the grammar isn't yet fully correct.
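To make the four combinations in the grammar concrete, here is a minimal standalone sketch that works on plain `&str` slices. It is not the crate's parser: `Doctype`, `parse_doctype`, `take_external_id`, and `take_quoted` are hypothetical names invented for this illustration, the `Name` rule is simplified to "everything up to whitespace, `[`, or `>`", and the internal subset is assumed not to contain a `]` inside a quoted string.

```rust
/// Pieces found in a DOCTYPE declaration (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct Doctype<'a> {
    name: &'a str,
    external_id: Option<&'a str>,
    int_subset: Option<&'a str>,
}

/// Reads a single- or double-quoted literal; returns (contents, remaining input).
fn take_quoted(s: &str) -> Option<(&str, &str)> {
    let quote = s.chars().next().filter(|c| *c == '"' || *c == '\'')?;
    let body = &s[1..];
    let end = body.find(quote)?;
    Some((&body[..end], &body[end + 1..]))
}

/// Reads `SYSTEM S literal` or `PUBLIC S literal S literal`;
/// returns (full external ID text, remaining input).
fn take_external_id(s: &str) -> Option<(&str, &str)> {
    let (keyword, literals) = if s.starts_with("SYSTEM") {
        ("SYSTEM", 1)
    } else if s.starts_with("PUBLIC") {
        ("PUBLIC", 2)
    } else {
        return None;
    };
    let mut rest = &s[keyword.len()..];
    for _ in 0..literals {
        let trimmed = rest.trim_start();
        if trimmed.len() == rest.len() {
            return None; // whitespace is required before each literal
        }
        let (_, after) = take_quoted(trimmed)?;
        rest = after;
    }
    Some((&s[..s.len() - rest.len()], rest))
}

/// '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
fn parse_doctype(input: &str) -> Option<Doctype<'_>> {
    let rest = input.strip_prefix("<!DOCTYPE")?;
    let trimmed = rest.trim_start();
    if trimmed.len() == rest.len() {
        return None; // S is mandatory after the keyword
    }
    // Simplified Name: stop at whitespace, '[' or '>'.
    let name_end = trimmed.find(|c: char| c.is_whitespace() || c == '[' || c == '>')?;
    if name_end == 0 {
        return None;
    }
    let (name, rest) = trimmed.split_at(name_end);

    // (S ExternalID)? : only taken when whitespace AND a keyword follow.
    let after_ws = rest.trim_start();
    let had_space = after_ws.len() != rest.len();
    let (external_id, rest) =
        if had_space && (after_ws.starts_with("SYSTEM") || after_ws.starts_with("PUBLIC")) {
            let (id, remaining) = take_external_id(after_ws)?;
            (Some(id), remaining)
        } else {
            (None, rest)
        };

    // S? ('[' intSubset ']' S?)?
    let rest = rest.trim_start();
    let (int_subset, rest) = match rest.strip_prefix('[') {
        Some(body) => {
            let end = body.find(']')?;
            (Some(&body[..end]), body[end + 1..].trim_start())
        }
        None => (None, rest),
    };

    // '>' must close the declaration with nothing after it.
    if rest == ">" {
        Some(Doctype { name, external_id, int_subset })
    } else {
        None
    }
}
```

The point of the sketch is that the external ID and the internal subset are two independent optional steps in sequence, rather than one either/or branch, which is what lets the 0 and 2 cases fall out naturally.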
Let me take another quick swing at it. I misunderstood the spec thinking you could have at most one or the other. Interesting.
I've been flip-flopping a bit on terminology. Moving towards using "int subset" rather than "internal dtd." Also added some extra tests to try and cover the various combinations (including explicit checks for whitespace handling towards the end of the node). I think this is much closer to the spec while still skipping the modeling of the individual elements inside the int subset.
Patched this fork in today at work, and it seems to be working well for us ;)
Certain characters (such as `/`) in the document type declaration (DTD) could cause the parser to fail. This updates `parse_external_id()` to use a slightly more permissive function to consume the attribute string all the way to the closing quote.
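As a rough illustration of the failure mode, the sketch below contrasts a strict consumer with a permissive one. Both functions, the `QuoteError` enum, and the strict character set are assumptions made up for this example; they do not reproduce the actual functions in `src/parser.rs`, only the general idea of scanning to the closing quote instead of stopping at the first unexpected character.

```rust
#[derive(Debug, PartialEq)]
enum QuoteError {
    ExpectedOpeningQuote,
    ExpectedClosingQuote,
}

/// Splits off the opening quote character, which must be `"` or `'`.
fn split_opening_quote(input: &str) -> Result<(char, &str), QuoteError> {
    match input.chars().next() {
        Some(q) if q == '"' || q == '\'' => Ok((q, &input[1..])),
        _ => Err(QuoteError::ExpectedOpeningQuote),
    }
}

/// Strict variant: the body may only contain a small, name-like character set
/// (chosen arbitrarily for this sketch), so a `/` ends the body early and the
/// closing-quote check then fails.
fn consume_quoted_strict(input: &str) -> Result<(&str, &str), QuoteError> {
    let (quote, body) = split_opening_quote(input)?;
    let end = body
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '.' || c == '-' || c == '_'))
        .unwrap_or(body.len());
    match body[end..].strip_prefix(quote) {
        Some(rest) => Ok((&body[..end], rest)),
        None => Err(QuoteError::ExpectedClosingQuote),
    }
}

/// Permissive variant: accept every character up to the matching closing quote.
fn consume_quoted_permissive(input: &str) -> Result<(&str, &str), QuoteError> {
    let (quote, body) = split_opening_quote(input)?;
    match body.find(quote) {
        Some(end) => Ok((&body[..end], &body[end + quote.len_utf8()..])),
        None => Err(QuoteError::ExpectedClosingQuote),
    }
}
```

With an input such as `"a/b.dtd">`, the strict variant stops at the `/` and reports `ExpectedClosingQuote`, while the permissive variant returns the whole literal and leaves `>` as the remaining input.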
As a heads-up, I made some minor changes to your commits: trailing |
Of course. I'm just a guest here. This is your house!
Certain characters (such as `/`) in the document type declaration (dtd) could cause the parser to fail. This updates `sxd_document::parser::parse_external_id()` to use a slightly different (more permissive) function to consume the attribute string.
refs #50, #59