HTML entity for carriage return is being encoded #808
Comments
How are you converting? If you could write one up, a simple test-case program would be ideal so I can easily reproduce this to see what is going wrong. My gut instinct is that the FWIW, if you are doing this manually and the above diagnosis sounds correct, then you want to use |
Here's a small test program to reproduce the bug: HTMLEntityTest.zip I hope this helps! |
Okay, so the problem is that, according to the W3C HTML5 specification, the character reference for carriage return is a parse error. See: https://dev.w3.org/html5/spec-LC/tokenization.html#consume-a-character-reference
When MimeKit's HtmlTokenizer encounters a parse error, it emits the raw entity instead of decoding it. When the HtmlToHtml converter then writes out the tokens it receives, the character data is re-encoded, which is what creates this issue. I think the solution is to tell the tokenizer not to decode character references, which will resolve this. |
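The double encoding described above can be illustrated with a small, library-independent sketch. This is not MimeKit code: the function names are hypothetical, it handles only decimal numeric references, and it assumes `&#13;` as one possible spelling of the carriage return reference. It only models the sequence "leave the reference raw on a parse error, then encode on output".

```python
import html
import re

# Assumption for illustration: code points whose numeric references the
# tokenizer treats as parse errors and therefore leaves undecoded.
PARSE_ERROR_CODEPOINTS = {0x0D}

_NUMERIC_REF = re.compile(r"&#(\d+);")


def decode_refs(text):
    """Decode decimal references, leaving parse-error references as raw text."""
    def replace(match):
        codepoint = int(match.group(1))
        if codepoint in PARSE_ERROR_CODEPOINTS:
            return match.group(0)  # emitted raw, as described in the comment above
        return chr(codepoint)
    return _NUMERIC_REF.sub(replace, text)


def write_text(text):
    """Encode character data on output, as a writer would."""
    return html.escape(text, quote=False)


def convert_decoding(source):
    """Decode, then re-encode: the raw reference gets its '&' escaped."""
    return write_text(decode_refs(source))


def convert_without_decoding(source):
    """Sketch of the proposed fix: references are never decoded, so they
    are written back unchanged instead of being encoded a second time."""
    return source


print(convert_decoding("a&#13;b"))          # a&amp;#13;b
print(convert_without_decoding("a&#13;b"))  # a&#13;b
```

In this model, references that decode cleanly (for example `&#65;`) round-trip without trouble, which matches the report that other entities are unaffected.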
MimeKit v3.4.0 has been released with this fix. |
Describe the bug
When you convert a mimetext containing the HTML entity for carriage return, the entity is encoded an additional time, so it appears double-encoded in the output.
Other HTML entities do not seem to be affected.
Platform (please complete the following information):
To Reproduce
Convert the following mimetext to html:
This will result in the following html:
Expected behavior
Output should be