Improve handler #111
Comments
They are -- the HTML5 parser only concerns itself with parsing to construct a DOM. |
Are you going to fix this situation? |
Does it need to be fixed? What's the use-case? |
Yes! In the cases I've shown above, |
SAXParser reports element start and element end events, not start tags and end tags. That's all.
If you really need low-level parsing information, you can use the Tokenizer.
There is a limited set of void elements, so serializing is easy. http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#serialising-html-fragments |
Example of producing HTML from SAX events:
var SAXParser = require('html5').SAXParser;
var HtmlSerializer = require('./HtmlSerializer').HtmlSerializer;
var outStream = require('fs').createWriteStream("out.html");
var parser = new SAXParser();
var serializer = new HtmlSerializer(outStream);
parser.contentHandler = parser.lexicalHandler = serializer;
parser.parse('...'); |
But how can I tell whether a tag is self-closing? |
Just check whether its name matches one of the void elements: area, base, basefont, bgsound, br, col, embed, frame, hr, img, input, keygen, link, menuitem, meta, param, source, track or wbr. |
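A minimal sketch of how that name check could sit inside a serializer like the HtmlSerializer used in the example above. The handler method names and their arguments (startElement(name, attributes), endElement(name), characters(text)) are assumptions made for illustration, not the html5 module's documented interface, and the escaping is simplified (no special handling of non-breaking spaces or of raw-text elements such as script and style).

```javascript
// Hypothetical sketch: method names and argument shapes are assumptions,
// not the html5 module's actual contentHandler interface.
var VOID_ELEMENTS = ['area', 'base', 'basefont', 'bgsound', 'br', 'col', 'embed',
    'frame', 'hr', 'img', 'input', 'keygen', 'link', 'menuitem', 'meta', 'param',
    'source', 'track', 'wbr'];

function isVoidElement(name) {
    return VOID_ELEMENTS.indexOf(String(name).toLowerCase()) !== -1;
}

function escapeText(text) {
    return String(text).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function escapeAttribute(value) {
    return String(value).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

// `out` is anything with a write(string) method, e.g. a writable stream.
function HtmlSerializer(out) {
    this.out = out;
}

HtmlSerializer.prototype.startElement = function (name, attributes) {
    var html = '<' + name;
    (attributes || []).forEach(function (attr) {
        html += ' ' + attr.name + '="' + escapeAttribute(attr.value) + '"';
    });
    this.out.write(html + '>');
};

HtmlSerializer.prototype.endElement = function (name) {
    // Void elements never get an end tag in serialized HTML.
    if (!isVoidElement(name)) {
        this.out.write('</' + name + '>');
    }
};

HtmlSerializer.prototype.characters = function (text) {
    this.out.write(escapeText(text));
};
```

Because the decision is made purely from the element name, the output is the same whether the input was written with or without a trailing slash, which matches the behaviour described in this thread.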
But what if someone deliberately wants to parse invalid input?
|
Thank you for the list of self-closing tags! |
According to the spec, it will be interpreted as
See http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#serialising-html-fragments |
But I can try to parse this situation:
It will be for browser - In both cases I will receive the same result, although the two inputs were not the same. Maybe a parameter should be added to one of your contentHandler's methods; it will be |
Parser will ignore
Yes, invalid markup will be repaired, as I already said. Even valid input markup may not match the serialized output. Could you explain how you want to use the parser? You probably need another tool. |
For example, I want to check the validity of the input or, as in my case, I want to compare to |
This parser is used in http://ace.c9.io/build/kitchen-sink.html (select HTML mode) for syntax checking.
parser.errorHandler = {
    error: function(message, location, code) {
        // Parse error
    }
};
Not sure what you mean. I guess you will have to write your own parser. |
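For the validity-checking use case, a rough sketch of collecting parse errors through the errorHandler hook shown above. The createErrorCollector helper is hypothetical and not part of the html5 module; only the error(message, location, code) signature comes from this thread, and the shape of location is left untouched because it is not described here.

```javascript
// Hypothetical helper, not part of the html5 module.
function createErrorCollector() {
    var errors = [];
    return {
        errors: errors,
        handler: {
            // Same signature as the errorHandler.error callback above.
            error: function (message, location, code) {
                errors.push({ message: message, location: location, code: code });
            }
        }
    };
}

// Possible usage, assuming `parser` is a SAXParser instance and `input` is a string:
// var collector = createErrorCollector();
// parser.errorHandler = collector.handler;
// parser.parse(input);
// var isValid = collector.errors.length === 0;
```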
Besides, what about this situation:
and
?
It seems the contentHandler parses them in exactly the same way! Yes, they are identical for a browser, but from the point of view of parsing they are not identical, are they?