TNEF: Save original codepage for future reference #357

Closed
wants to merge 1 commit into from


@andrvo andrvo commented Nov 17, 2017

Office 365 sends rather odd TNEF messages. It often sets the OemCodepage attribute to 1251 while specifying the real charset of the HTML body in a meta tag.
This is not MimeKit's problem, of course, but life would be much easier if the ConvertToMessage() method indicated which codepage it used to extract a particular binary property (here, the HTML body of the message).

@jstedfast
Owner

What do you need the charset for? The message body? If so, adding it as a message header is the wrong way to go about solving this. It would be better to specify it directly on the individual MIME part.

@andrvo
Author

andrvo commented Nov 17, 2017

It's a bit more complicated.

If I try to parse a TNEF message from Office 365 written in Ukrainian, I see the following:

MimeKit extracts the TNEF HTML body using the ReadValueAsString() method (TnefPart.cs, line 171). The value is Binary, so ReadString() is called, and DecodeAnsiString() inside it uses the reader.MessageCodepage value to convert the raw bytes to a string.

That is correct, exactly as TNEF specification requires.

But once I present this HTML to the user, they see garbage: the Cyrillic characters are broken.

So I open the HTML and look inside. It contains a meta tag with a different charset:
<meta http-equiv="Content-Type" content="text/html; charset=koi8-u">

So I have to do the following (tp is TextPart):

// Charset declared in the HTML meta tag.
var transEncoding = Encoding.GetEncoding("koi8-u");

// Codepage MimeKit used to decode the body; keep 1252 if the header is missing or not a number.
int tnefCodePage = 1252;
if (tnefMsg.Headers.Contains("X-OriginalCodepage")
    && int.TryParse(tnefMsg.Headers["X-OriginalCodepage"], out int parsed))
{
    tnefCodePage = parsed;
}

var tnefEncoding = Encoding.GetEncoding(tnefCodePage);

// Recover the raw bytes, then decode them with the charset from the meta tag.
var srcBytes = tnefEncoding.GetBytes(tp.Text);
string transcoded = transEncoding.GetString(srcBytes);

After that, the transcoded string contains correct HTML that can be presented to the recipient. The same trick is needed for Danish and Hebrew, so I conclude this is a quirk of Office 365, since both Exchange and Outlook know how to deal with it.
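
The workaround boils down to two steps: find the charset that the HTML declares for itself, then undo the wrong decode and redo it with the declared charset. A minimal, library-independent sketch of that idea follows. The names `HtmlCharsetRepair`, `SniffMetaCharset` and `Redecode` are hypothetical, the regex only covers the simple meta tag forms, and it assumes the codepage that was wrongly used maps every byte of the body to a character, so the round trip loses nothing. The provider registration is only needed on runtimes where legacy codepages such as koi8-u are not available by default.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of MimeKit.
public static class HtmlCharsetRepair
{
    // Matches charset=... inside a <meta> tag, quoted or not.
    static readonly Regex MetaCharset = new Regex(
        @"<meta\b[^>]*\bcharset\s*=\s*[""']?\s*([A-Za-z0-9_.:\-]+)",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    static HtmlCharsetRepair()
    {
        // Makes legacy codepages (1251, koi8-u, ...) resolvable on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Returns the charset name declared in the first matching meta tag, or null.
    public static string SniffMetaCharset(string html)
    {
        Match m = MetaCharset.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Re-decodes html that was decoded with usedEncoding although it
    // declares another charset. Returns the input unchanged when no
    // usable or different charset is declared.
    public static string Redecode(string html, Encoding usedEncoding)
    {
        string declared = SniffMetaCharset(html);
        if (declared == null)
            return html;

        Encoding declaredEncoding;
        try
        {
            declaredEncoding = Encoding.GetEncoding(declared);
        }
        catch (ArgumentException)
        {
            return html; // unknown charset name: leave the text alone
        }

        if (declaredEncoding.CodePage == usedEncoding.CodePage)
            return html;

        byte[] raw = usedEncoding.GetBytes(html);   // undo the wrong decode
        return declaredEncoding.GetString(raw);     // decode as declared
    }
}
```

With the case described here, the call would look like `HtmlCharsetRepair.Redecode(tp.Text, tnefEncoding)`. Sniffing works on the garbled text because the ASCII range, which includes the meta tag itself, is the same in both codepages.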

And if I understand the specification correctly, OemCodepage attribute is message-wide, in terms of TNEF.

@jstedfast
Owner

Do you think it would be possible to create one of these tnef attachments (with safe to publish publicly content) so that I can add it to my unit tests as well as playing around with it to try and find a nice solution?

I'm thinking the nicest solution, assuming I can both make it work and if it makes sense (which it sounds like it does?), is to automatically tag the TextPart with the OemCharset encoding for you, so that when you get the .Text property, it's already converted for you.

@jstedfast
Owner

jstedfast commented Nov 18, 2017

Can you try the patch that I just committed above?

That will set the TextPart.ContentType.Charset property.

Also note that since a TextPart subclasses MimePart, you can also access the content of the TextPart using the ContentObject property which will give you a stream.

Currently the stream used with the IContentObject is a MemoryStream (I suspect I will keep using this type of stream, since it is likely to remain the most efficient one for this, but I don't want to guarantee it, so just be careful). With a MemoryStream you can use GetBuffer() and/or ToArray() to get the raw bytes, which avoids the tnefEncoding.GetBytes() call.
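
One detail worth keeping in mind with GetBuffer(): it returns the whole underlying array, which can be longer than the data actually written, so the stream length has to bound the decode. A small sketch of decoding raw bytes straight from a MemoryStream, using only the base class library (`RawBodyDecoder` is a hypothetical name, and the stream here merely stands in for whatever the content object exposes):

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of MimeKit.
public static class RawBodyDecoder
{
    // Decodes the full contents of a MemoryStream with the given encoding,
    // without copying when the underlying buffer is accessible.
    public static string Decode(MemoryStream content, Encoding encoding)
    {
        // TryGetBuffer yields only the written range of the buffer,
        // unlike GetBuffer(), which also exposes unused capacity.
        if (content.TryGetBuffer(out ArraySegment<byte> segment))
            return encoding.GetString(segment.Array, segment.Offset, segment.Count);

        // Buffer not exposed (e.g. stream built over a caller-owned array):
        // fall back to ToArray(), which copies exactly the written bytes.
        return encoding.GetString(content.ToArray());
    }
}
```

Passing the encoding named in the HTML meta tag gives the correctly decoded body in one step, with no intermediate string in the wrong codepage.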

@jstedfast
Owner

You could also look into using my MimeKit.IO.Filters.CharsetFilter to convert between charsets using a stream interface by attaching that filter to a MimeKit.IO.FilteredStream.

Once you start playing with my FilteredStream and the various filters I've written, you will fall in love. They are very addicting ;-)
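
As a rough illustration of the filter approach, the sketch below pushes raw bytes through a CharsetFilter attached to a FilteredStream and collects UTF-8 output. It assumes the MimeKit package is referenced; `CharsetTranscoder` and `ToUtf8` are hypothetical names, and the provider registration is an assumption for runtimes where legacy codepages are not available by default.

```csharp
using System.IO;
using System.Text;
using MimeKit.IO;
using MimeKit.IO.Filters;

// Hypothetical helper built on MimeKit's stream filters.
public static class CharsetTranscoder
{
    static CharsetTranscoder()
    {
        // Assumption: needed so that names like "koi8-u" resolve on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Converts raw bytes from sourceCharset (e.g. "koi8-u") to UTF-8.
    public static byte[] ToUtf8(byte[] raw, string sourceCharset)
    {
        using (var output = new MemoryStream())
        using (var filtered = new FilteredStream(output))
        {
            // Everything written to 'filtered' is converted before it reaches 'output'.
            filtered.Add(new CharsetFilter(sourceCharset, "utf-8"));
            filtered.Write(raw, 0, raw.Length);
            filtered.Flush(); // push out anything the filter is still holding

            return output.ToArray();
        }
    }
}
```

Because the conversion happens while streaming, a large body never has to exist as one string in the wrong codepage.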

@andrvo
Author

andrvo commented Nov 18, 2017

The patch looks good, thanks! I haven't tried it yet, but it will certainly work in my case. So far I'm working on the concept of the solution; performance issues will come a bit later. But they will come :)

I'll try to get some test message on Monday.

@jstedfast
Owner

Ok, cool. I'll close this as fixed then. If you can get me a sample tnef attachment that I can use for testing, that would be awesome. Feel free to send that to [email protected]

@jstedfast jstedfast closed this Nov 18, 2017
jstedfast added a commit that referenced this pull request Nov 20, 2017