Add at least rudimentary DTD handling #50
Mentoring Instructions
The DTD grammar and tokens are defined in the XML spec. Work for this will probably largely live in Line 492 in 5e2253d
For now, we aren't really worried about amazing performance or even collecting the data; all we need to make sure of is that the parser can accept the input data. A small example to add to the tests:
<?xml version="1.0"?>
<!DOCTYPE cXML SYSTEM "http://xml.cxml.org/schemas/cXML/1.2.014/cXML.dtd">
<cXML />
Strangely, it kind of seems like this much of it should already be supported; I'm guessing that there's some small mistake in what the parser accepts.
OK, so I've been tinkering and learning. So far, I have found that the presence of a
I traced the failure back to Lines 385 to 387 in 5e2253d
I'm assuming that the 2nd
I don't really see a reason for this, but at this superficial level I suspect the bug is actually in the
It's actually because the grammar for

commit 7e6b99340b40a06ec938c9d12b96d103c4e4a5ac
Author: Jake Goulding <[email protected]>
Date:   Wed Mar 7 19:31:17 2018 -0500

    DTD SystemLiterals can contain any character

diff --git a/src/parser.rs b/src/parser.rs
index 995ba0d..b645f9c 100644
--- a/src/parser.rs
+++ b/src/parser.rs
@@ -481,8 +481,10 @@ fn parse_external_id<'a>(pm: &mut XmlMaster<'a>, xml: StringPoint<'a>)
     let (xml, _) = try_parse!(xml.expect_literal("SYSTEM"));
     let (xml, _) = try_parse!(xml.expect_space());
     let (xml, external_id) = try_parse!(
-        parse_quoted_value(pm, xml, |_, xml, _| xml.consume_name().map_err(|_| SpecificError::ExpectedSystemLiteral))
-    );
+        parse_quoted_value(pm, xml, |_, xml, quote| {
+            xml.consume_attribute_value(quote).map_err(|_| SpecificError::ExpectedSystemLiteral)
+        })
+    );

     success(external_id, xml)
 }
@@ -1311,11 +1313,11 @@ mod test {

     #[test]
     fn a_prolog_with_a_document_type_declaration() {
-        let package = quick_parse("<?xml version='1.0'?><!DOCTYPE doc SYSTEM \"doc.dtd\"><hello/>");
+        let package = quick_parse(r#"<?xml version="1.0"?><!DOCTYPE cXML SYSTEM "http://xml.cxml.org/schemas/cXML/1.2.014/cXML.dtd"><cXML />"#);
         let doc = package.as_document();
         let top = top(&doc);

-        assert_qname_eq!(top.name(), "hello");
+        assert_qname_eq!(top.name(), "cXML");
     }

     #[test]
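To illustrate why swapping the name consumer for a quote-delimited one matters: the XML spec defines SystemLiteral as a quote, any run of characters other than that quote, and the same quote again, whereas a Name cannot contain `/`. The sketch below is standalone and does not use sxd-document; `consume_name_like` and `parse_system_literal` are hypothetical helpers made up for this illustration, and the name check is simplified to ASCII.

```rust
/// Hypothetical helper: consume characters that may appear in an XML Name
/// (simplified to ASCII). Returns the matched prefix and the remainder.
fn consume_name_like(input: &str) -> (&str, &str) {
    let end = input
        .find(|c: char| !(c.is_ascii_alphanumeric() || matches!(c, ':' | '_' | '-' | '.')))
        .unwrap_or(input.len());
    input.split_at(end)
}

/// Hypothetical helper: parse a SystemLiteral, i.e. an opening quote, any
/// characters other than that quote, and the matching closing quote.
/// Returns the literal's contents and the remainder after the closing quote.
fn parse_system_literal(input: &str) -> Option<(&str, &str)> {
    let quote = input.chars().next().filter(|c| *c == '"' || *c == '\'')?;
    let body = &input[1..]; // both quote characters are one byte long
    let end = body.find(quote)?;
    Some((&body[..end], &body[end + 1..]))
}

fn main() {
    let input = r#""http://xml.cxml.org/schemas/cXML/1.2.014/cXML.dtd">"#;

    // A Name-style consumer stops at the first '/', so a parser that then
    // expects the closing quote fails on a URL.
    let (prefix, _) = consume_name_like(&input[1..]);
    println!("name-like prefix: {:?}", prefix);

    // The SystemLiteral rule reads through to the matching quote.
    match parse_system_literal(input) {
        Some((value, rest)) => println!("literal: {:?}, rest: {:?}", value, rest),
        None => println!("not a system literal"),
    }
}
```

The old test only used `doc.dtd`, which happens to be a valid Name, so the mismatch stayed hidden until a URL was used as the system identifier.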
Oh, mm. 😊 I guess I was about to go on a wild goose chase.
Taking almost verbatim what was shown here, the test is passing. In order to not be a total plagiarist, I'm going to try and work up a solution for the OP's case. Worth a shot at least!
I've added the example from @frp as a test (which is now passing). Not sure what else I can do outside of addressing any nitpicks.
sxd-document does not work if the document has an internal DTD.
Trying to parse the example document (taken from here):
I get the following error:
(37, [Expected("SYSTEM")])
37 is the position of the first [.
Adding full parsing of the internal DTD might be quite a big deal, but for my purposes (parsing JMdict) just ignoring the DTD content would be fine.
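As a rough idea of what "just ignoring the DTD content" could look like, here is a minimal standalone sketch that finds the end of a DOCTYPE declaration while skipping over an internal subset. It is not how sxd-document does it; `skip_doctype` is a hypothetical function, the sample document is made up, and the scan only accounts for quoted literals and comments (processing instructions inside the subset are not handled).

```rust
/// Hypothetical helper: given input that starts with `<!DOCTYPE`, return the
/// byte offset just past the `>` that closes the declaration, skipping any
/// internal subset between `[` and `]`. Returns `None` if the input is not a
/// DOCTYPE or the declaration is unterminated.
fn skip_doctype(input: &str) -> Option<usize> {
    if !input.starts_with("<!DOCTYPE") {
        return None;
    }
    let mut in_subset = false;
    let mut i = "<!DOCTYPE".len();
    while i < input.len() {
        let rest = &input[i..];
        if in_subset && rest.starts_with("<!--") {
            // Comments may contain quotes and brackets; skip them whole.
            i += 4 + rest[4..].find("-->")? + 3;
            continue;
        }
        let c = rest.chars().next()?;
        match c {
            '"' | '\'' => {
                // Quoted literal: jump past the matching closing quote.
                let close = rest[1..].find(c)?;
                i += 1 + close + 1;
                continue;
            }
            '[' if !in_subset => in_subset = true,
            ']' if in_subset => in_subset = false,
            // Only a '>' outside the internal subset ends the declaration.
            '>' if !in_subset => return Some(i + 1),
            _ => {}
        }
        i += c.len_utf8();
    }
    None
}

fn main() {
    // Made-up document with an internal subset, for illustration only.
    let doc = "<!DOCTYPE note [\n<!-- it's a comment with ] and > -->\n<!ELEMENT note (#PCDATA)>\n<!ENTITY cmp \"a > b\">\n]>\n<note/>";
    match skip_doctype(doc) {
        Some(end) => println!("after DOCTYPE: {:?}", doc[end..].trim_start()),
        None => println!("no complete DOCTYPE found"),
    }
}
```

Skipping the subset this way discards entity declarations, so any entity references in the document body would remain unresolved; that trade-off is only acceptable when the DTD content is not needed.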