Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple bugs in TnefPart.cs #538

Closed
KelvinKingGitHub opened this issue Feb 27, 2020 · 46 comments
Closed

Multiple bugs in TnefPart.cs #538

KelvinKingGitHub opened this issue Feb 27, 2020 · 46 comments
Labels
bug Something isn't working

Comments

@KelvinKingGitHub
Copy link

KelvinKingGitHub commented Feb 27, 2020

I've been working a lot lately with winmail.dat files and discovered multiple bugs in some of the methods in TnefPart.cs:

First function with bugs is ExtractRecipientTable (please see my comments inline):

   static void ExtractRecipientTable (TnefReader reader, MimeMessage message)
	    {
		var prop = reader.TnefPropertyReader;

		// Note: The RecipientTable uses rows of properties...
		while (prop.ReadNextRow ()) {
			InternetAddressList list = null; // There is a logic bug here where list is initialized to null every time reading a new row. 
However, the assumption is that there is a RecipientType property for each new row and the list is 
initialized accordingly. However, this is not the case in the Tnef files I'm processing. 
Hence, list is null most of the time and a new MailboxAddress is not created and added to the 
MimeMessage. The solution I made was to move the initialization of the list outside the while 
loop. 
			string name = null, addr = null;

			while (prop.ReadNextProperty ()) {
				switch (prop.PropertyTag.Id) {
				case TnefPropertyId.RecipientType:
					int recipientType = prop.ReadValueAsInt32 ();
					switch (recipientType) {
					case 1: list = message.To; break;
					case 2: list = message.Cc; break;
					case 3: list = message.Bcc; break;
					}
					break;
				case TnefPropertyId.TransmitableDisplayName:
					if (string.IsNullOrEmpty (name))
						name = prop.ReadValueAsString ();
					break;
				case TnefPropertyId.DisplayName:
					name = prop.ReadValueAsString ();
					break;
				case TnefPropertyId.EmailAddress:
					if (string.IsNullOrEmpty (addr))
						addr = prop.ReadValueAsString ();
					break;
				case TnefPropertyId.SmtpAddress:
					// The SmtpAddress, if it exists, should take precedence over the EmailAddress
					// (since the SmtpAddress is meant to be used in the RCPT TO command).
					addr = prop.ReadValueAsString ();
					break;
				}
			}
			if (list != null && !string.IsNullOrEmpty (addr))
				list.Add (new MailboxAddress (name, addr));
		}
	}`

Another bug in the implementation of ExtractRecipientTable which I have not been able to identify the cause yet is that it is unable to read correctly the RecipientType for both Cc and Bcc. I've also noticed that there are Null property values, this might have something to do with it... I'm still investigating.

Related to this bug is also another bug where prop.ReadNextRow () is unable to retrieve the actual following row in the recipient table. One fix which I discovered to work is to break out of the while loop immediately after the MailboxAddress is created:

  if (list != null && !string.IsNullOrEmpty (addr))
  {
	  list.Add (new MailboxAddress (name, addr));
              break;
  }

It seems like there is an issue of calling prop.ReadNextProperty() to process the rest of the remaining properties in the row...

Another issue which I noticed was after this function is called, the AttributeTag get's corrupted. In the case of the function ExtractTnefMessage. Please see my commends in-line.

            static MimeMessage ExtractTnefMessage (TnefReader reader)
	{
		var builder = new BodyBuilder ();
		var message = new MimeMessage ();

		message.Headers.Remove (HeaderId.Date);

		while (reader.ReadNextAttribute ()) {
			if (reader.AttributeLevel == TnefAttributeLevel.Attachment)
				break;

			var prop = reader.TnefPropertyReader;

			switch (reader.AttributeTag) {
			case TnefAttributeTag.RecipientTable:
				ExtractRecipientTable (reader, message);
    // After call to ExtractRecipientTable, reader.AttributeTag is corrupted. And hence 
 ReadNextAttribute doesn't contain the MapiProperties Attribute any more. 
				break;
			case TnefAttributeTag.MapiProperties:
				ExtractMapiProperties (reader, message, builder); // This function does not get executed at all. 
				break;
			case TnefAttributeTag.DateSent:
				message.Date = prop.ReadValueAsDateTime ();
				break;
			case TnefAttributeTag.Body:
				builder.TextBody = prop.ReadValueAsString ();
				break;
			}
		}

		if (reader.AttributeLevel == TnefAttributeLevel.Attachment)
			ExtractAttachments (reader, builder);

		message.Body = builder.ToMessageBody ();

		return message;
	}`

Platform (please complete the following information):

  • OS: Windows
  • .NET Framework: .NET 4.5
  • MimeKit Version: 2.5.1
@KelvinKingGitHub
Copy link
Author

I somehow managed to stumble onto a solution to earlier issue where I wasn't getting the RecipientType for every row in the Recipient Table. Somehow, when I included a case for RowId and read the property, I was able to retrieve each row correctly and get the corresponding RecipientType. So now, code looks like this for ExtractRecipientTable:

static void ExtractRecipientTable (TnefReader reader, MimeMessage message)
	{
		var prop = reader.TnefPropertyReader;
                    int rowid;
		// Note: The RecipientTable uses rows of properties...
		while (prop.ReadNextRow ()) {
			InternetAddressList list = null;
			string name = null, addr = null;

			while (prop.ReadNextProperty ()) {
				switch (prop.PropertyTag.Id) {
				case TnefPropertyId.RecipientType:
					int recipientType = prop.ReadValueAsInt32 ();
					switch (recipientType) {
					case 1: list = message.To; break;
					case 2: list = message.Cc; break;
					case 3: list = message.Bcc; break;
					}
					break;
				case TnefPropertyId.TransmitableDisplayName:
					if (string.IsNullOrEmpty (name))
						name = prop.ReadValueAsString ();
					break;
				case TnefPropertyId.DisplayName:
					name = prop.ReadValueAsString ();
					break;
				case TnefPropertyId.EmailAddress:
					if (string.IsNullOrEmpty (addr))
						addr = prop.ReadValueAsString ();
					break;
				case TnefPropertyId.SmtpAddress:
					// The SmtpAddress, if it exists, should take precedence over the EmailAddress
					// (since the SmtpAddress is meant to be used in the RCPT TO command).
					addr = prop.ReadValueAsString ();
					break;
                                case TnefPropertyId.Rowid: 
                                      // Why this works, I don't know...
                                      rowid = prop.ReadValueAsInt32();
                                      break;
				}
			}

			if (list != null && !string.IsNullOrEmpty (addr))
				list.Add (new MailboxAddress (name, addr));
		}
	} 

jstedfast added a commit that referenced this issue Feb 27, 2020
@jstedfast
Copy link
Owner

The reason adding prop.ReadValueAsInt32() works is because there is presumably a bug in prop.ReadNextProperty () where it doesn't advance far enough or advances too far or something.

TNEF is, unfortunately, very difficult to debug w/o test cases which are hard to come across :(

I've committed the RowId fix. Did that solve all of your issues? It sounds like that also fixed your MapiProperties issue, right? Since presumably the corruption issue is gone now.

@KelvinKingGitHub
Copy link
Author

KelvinKingGitHub commented Feb 28, 2020

Hi Jeff,

That fixed the issue with reading every row in the recipient table correctly. However, it still didn't fix the issue where the MapiProperties was corrupted after that... As a work around, I'm calling both ExtractRecipientTable and ExtractMapiProperties seperately (i.e. I initialize a new TnefReader for each operation), it seems like that is the only reliable way for now... Reusing the TnefReader seems to be unreliable. Is there anyway to do a "reset" on the reader?

I've encountered another issue with trying to extract the sender email address from the MapiProperties. When I print out the SenderEmailAddress property, I do not get a string that contains a valid email address. My output is like this:

/O=ORG/OU=EXCHANGE (ABCD1234)/CN=RECIPIENTS/CN=1234567890-SOME NAME

Looking at the file in a hex editor, I'm actually able to see there is a valid email address somewhere below the Body property, but I can't identify which property that it.

On the same line as the sender email address I see some text with the words "Client=MSExchangeRPC". Would you be able to identify which property that is?

Thanks!

@KelvinKingGitHub
Copy link
Author

Update:

I noticed that some properties were not recognized (or perhaps also corrupted?) so I did a default case in the switch statement to print out the raw form of the property:

default: 
  byte[] rawValue = new byte[prop.RawValueLength];
  prop.ReadRawValue(rawValue, 0, rawValue.Length);
  printRawValueToString(rawValue); // just a simple printing function using 
  // Encoding.Default.GetString()
  break;

This is what I discovered:

Property: 23818
[email protected]
Property: 23819
[email protected] (the property I'm looking for...)
...
Property: -32742
Client=MSExchangeRPC (my place marker)

Do you think this is an issue of corrupted properties? Or just some properties that are not defined in MimeKit?

Thanks!

@jstedfast
Copy link
Owner

The /O=ORG/... is some sort of X.400 identifier, I think. My understanding is that Exchange (and maybe Outlook) were originally built on top of the X.400 standard and not the IETF messaging standards (e.g. MIME etc), so they didn't originally use what we think of as email addresses now-a-days.

Of course, I could be wrong in my understanding, so feel free to fact check me on that ;-)

It looks like I don't have a Reset() method on TnefReader, but perhaps I could add one... that said, ideally you wouldn't have to reset, so maybe I'll hold off on adding that until we can figure out the real bug you are hitting.

Is there any chance you could send me the winmail.dat file(s) that you are struggling with?

Ideally they'd be suitable for me to add to my test suite to prevent bugs like this from cropping up in the future, but I understand if you need them to stay private. I also understand if there's no way you'd even allow me to see them.

Back to your problem, though, based on the API that I cloned, it appeared that some properties can have multi-values. Perhaps that is what you are encountering? I don't think I ever managed to find an example winmail.dat file with such a case, though.

All that said, it would be interesting if you could figure out if MimeKit's TnefReader/TnefPropertyReader were reading beyond the end of a property or whether it wasn't reading enough - either way would cause the next property to seem corrupted to the reader.

What you could try doing us printing out the TnefReader.StreamOffset at the start of each attribute tag.

Console.WriteLine("Before reading any TnefAttributeTags, offset = {0}", reader.StreamOffset);
while (reader.ReadNextAttribute ()) {
	if (reader.AttributeLevel == TnefAttributeLevel.Attachment)
		break;

	var prop = reader.TnefPropertyReader;

	Console.WriteLine("TnefAttributeTag.{0} properties start at offset {1}", reader.AttributeTag, reader.StreamOffset);
	switch (reader.AttributeTag) {
	case TnefAttributeTag.RecipientTable:
		ExtractRecipientTable (reader, message);
		break;
	case TnefAttributeTag.MapiProperties:
		ExtractMapiProperties (reader, message, builder);
		break;
	case TnefAttributeTag.DateSent:
		message.Date = prop.ReadValueAsDateTime ();
		break;
	case TnefAttributeTag.Body:
		builder.TextBody = prop.ReadValueAsString ();
		break;
	}
	Console.WriteLine("TnefAttributeTag.{0} properties end at offset {1}", reader.AttributeTag, reader.StreamOffset);
}

My guess is that in your first pass, immediately after RecipientTable, you'll get a nonsensical TnefAttributeTag (due to reading from a mis-asligned stream offset).

I want to compare those offsets in the output from the above loop to the offsets of your second pass to see if RecipientTable is over-reading or under-reading.

@jstedfast
Copy link
Owner

Regarding your update, it could be that the properties are corrupted due to a mis-aligned stream read as well, that would not be surprising.

It's also possible that those property values are valid IDs that I just don't have a mapping for.

Most of the mappings I've found have been through docs and looking at other tnef readers in java, c, etc - none of which had a complete set. Each project seemed to have a different subset of TNEF property identifiers (with quite a bit of overlap, but mostly only the core properties).

@jstedfast
Copy link
Owner

Heading to bed for now, but if you've got any winmail.dat samples you can send me, that would help tremendously.

@LeeBear35
Copy link

I have been reading and you are encountering x.400 addressing, which has been an internal addressing used by Exchange. What you may find is that there are no IETF addresses on the email if it did not have to bridge from the internal Exchange network to an external mail host.

You will also find that the messages or more often the message stores suffer corruption and post a problem.

Just my two cents

@jstedfast
Copy link
Owner

@KelvinKingGitHub any news? I'll have a bunch of free time on Sunday to work on this if you've got some sample files to send me.

@KelvinKingGitHub
Copy link
Author

KelvinKingGitHub commented Mar 1, 2020

@jstedfast I sent an email to your Microsoft address. Did you receive it? I wasn’t sure where else to send.

@jstedfast
Copy link
Owner

@KelvinKingGitHub when did you send it? I don't seem to have it but if it was a few days ago, I might have emptied my Spam folder without realizing your email was mistakenly sitting in there (whatever spam filtering rules Microsoft uses, it seems to filter legit messages into my Spam folder - I usually try to go thru it every few days and catch those messages, but I may have missed yours?)

@KelvinKingGitHub
Copy link
Author

@jstedfast I sent it on Fri I think... No worries, I'll send you another email now.

@jstedfast
Copy link
Owner

@KelvinKingGitHub ok, I'll keep an eye out for it.

@jstedfast
Copy link
Owner

BTW, make sure to spell my email alias correctly - it's [email protected] (note that it's not [email protected] which it would seem like it should be).

The default rules for creating a Microsoft email alias seems to be first 2 characters of a person's first name and the first 6 characters of their last name.

@KelvinKingGitHub
Copy link
Author

@jstedfast Ok, I just sent the email I was just trying to recall what I said in my previous note. Let me know if you received it, or if your spam filter removed the attachment.

@jstedfast
Copy link
Owner

@KelvinKingGitHub I think I just saw your message in my Inbox but as soon as I clicked it in Outlook (for Mac), it disappeared. I wonder if Outlook auto-decoded the winmail.dat or something into whatever the item was and added it to my calendar/tasks/something?

@KelvinKingGitHub
Copy link
Author

@jstedfast Would zipping it up work ?

@jstedfast
Copy link
Owner

Yea, that might be the way to go - I can't find it anywhere in Outlook anymore. I wonder if this is what happened to your email sent on Friday as well.

@KelvinKingGitHub
Copy link
Author

@jstedfast Possible... I noticed that each mail client has different ways of handling winmail.dats. Did it work now ?

@jstedfast
Copy link
Owner

Yep, I've got it now! Thanks! I'll take a look into this today and see if I can figure out the problem(s).

@KelvinKingGitHub
Copy link
Author

Cheers!

@jstedfast
Copy link
Owner

Okay, so I did find/fix some bugs in the TnefPropertyReader logic, but I don't think it has made much of a difference.

I didn't find these properties, though:

Property: 23818
[email protected]
Property: 23819
[email protected] (the property I'm looking for...)
...
Property: -32742
Client=MSExchangeRPC (my place marker)

However, I suspect those were probably in another winmail.dat file and not the one you sent me as a test case.

I did find the following properties that I thought were interesting:

Recipient Table:

  • 24566 (0x00005FF6) => seems to contain a "display name" for your email address (or, probably more accurately, the "recipient address" since you sent this to yourself and it's in the Recipient Table).

MAPI Properties Table:

  • 16378 (0x00003FFA) => contains your email address so probably maps to the sender email address in "SMTP" form rather than the "Exchange" form (or X.400 form). In fact, the AddrType property had a value of "EX" so I'm going to assume that's short for "Exchange" :-)
  • -32589 => contains "16.0" which is probably the Outlook version?

I tried searching for 0x00003FFA in https://interoperability.blob.core.windows.net/files/MS-OXPROPS/%5bMS-OXPROPS%5d-190319.pdf and found that 16378 (aka 0x3FFA) seems to map to PidTagLastModifierName.

Based on the same document, 24566 (aka 0x5FF6) seems to map to PidTagRecipientDisplayName, so I'll add a mapping for this one.

@jstedfast
Copy link
Owner

23818 (0x5D0A) and 23819 (0x5D0B) don't seem to map to anything, but 23809 (0x5D01) would map to SenderSmtpAddress which would be exactly what we want.

jstedfast added a commit that referenced this issue Mar 1, 2020
When advancing to the next row or property, make sure to properly
skip over the remaining properties or values.

Partial fix for issue #538
@jstedfast
Copy link
Owner

The above fix should address the need to read the Rowid property value and I added the mapping for PR_RECIPIENT_DISPLAY_NAME.

@KelvinKingGitHub
Copy link
Author

KelvinKingGitHub commented Mar 2, 2020

Thanks Jeff. You are correct, the Property: 23818 and 23819 tags were in another email. I was unable to reproduce that in the sample However, 16378 does consistently appear in all my winmail.dat files. So perhaps that's more reliable.

23809 doesn't seem to appear in any of my winmail.dat files. So unfortunately, I can't use that...

I'll pull the latest code and test it out. Did it also resolve the issue where the TnefAttributeTag was getting corrupted after reading the Recipient Table?

@KelvinKingGitHub
Copy link
Author

Hi Jeff,

Regarding the issue with the TnefAttributeTag - it's my bad... I was simply doing a ReadNextAttribute() within my function, so essentially I was skipping the Attribute altogether.

@jstedfast
Copy link
Owner

@KelvinKingGitHub ah, okay, thanks for that update. I was not able to reproduce the TnefAttributeTag corruption bug you had mentioned but thought that maybe it was only a problem with some of the other test cases you had.

I added an enum for the 16378 property (TnefPropertyId.LastModifierName), so you can at least grab that now. That said, based on the docs, that's just the "name" of whoever last modified the winmail.dat content. It seems likely that in most cases that will be the creator, but the way the docs read, what the name value can be expected to be is vague and doesn't seem like it's expected to necessarily be an email address.

@jstedfast
Copy link
Owner

BTW, as you continue to work through your TNEF samples, if you find any property id's that you'd like me to add mappings for, let me know and I'll add them.

One thing I noticed in the docs is that there is an SenderSmtpAddress property that can be stored in an AttributeTag. I wonder if any of your samples have any AttributeTags other than the RecipientTable and the MapiProperties?

@KelvinKingGitHub
Copy link
Author

Thanks Jeff. I don't recall seeing SenderSmtpAddress attribute tag in any of my winmail.dat files. But I'll do another check and update you on that.

In the sample I sent you, were you able to extract the attachment within it? I mentioned in my email that I think there might be a bug in the ExtractAttachments also...

@jstedfast
Copy link
Owner

@KelvinKingGitHub I don't have a mapping for the SenderSmtpAddress AttributeTag, so I think it'll appear as an unknown AttributeTag. I was just wondering if any of your samples had an Attribute tag other than RecipientTable and MapiProperties.

Hope that clears up any confusion.

What was the attachment in the winmail.dat that you sent me? I was able to extract a plain-text body and some RTF data, but didn't see anything else.

@KelvinKingGitHub
Copy link
Author

The attachment was a msg that was inline in the body of the email. It appears as an EmbeddedMeasage in the AttachMethod properties. There might be a bug that is preventing it from being extracted correctly.

@jstedfast
Copy link
Owner

Ah, I'll try to take a look tonight. I don't think I had any samples with an EmbeddedMessage before so I probably have some bugs there...

jstedfast added a commit that referenced this issue Mar 3, 2020
jstedfast added a commit that referenced this issue Mar 3, 2020
@jstedfast
Copy link
Owner

I thought I committed the PidTagLastModifierName patch the other day, but just noticed that I hadn't

jstedfast added a commit that referenced this issue Mar 5, 2020
Partial fix for issue #538
@jstedfast
Copy link
Owner

Added mappings for some PR_ATTACHMENT_XXXX property tags.

I also stepped thru the current code and it is extracting an "embedded message" attachment but it was returning a regular MimePart instead of a TnefPart because he AttachMethod property doesn't appear until after getting the AttachData.

I'm cooking up a fix for that as well.

Is that the last bug that you are aware of?

jstedfast added a commit that referenced this issue Mar 5, 2020
Sometimes AttachData can become before AttachMethod, so properly
handle this case as well.

Another partial fix for issue #538
@KelvinKingGitHub
Copy link
Author

Hi Jeff,

Yup, that's the bug I was referring to. In my case, the AttachMethod comes after AttachData. So the logic couldn't handle this situation... I fixed it on my side by basically duplicating the code into the AttachData switch case as well, and it seemed to work...

@KelvinKingGitHub
Copy link
Author

However, now this line: builder.Attachments.Add (attachment); will have a logic error.

Because it appears now 2 times in both the case: AttachData and case: AttachMethod, the TnefPart will get added twice to the attachments list. Once when it's treated as a "normal" attachment, and second time when it's recognized as an EmbeddedMessage...

@jstedfast
Copy link
Owner

Right, all I did in my fix that I just committed was to remove the MimePart attachment, convert it into a TnefPart, and then re-add it.

@jstedfast
Copy link
Owner

Can you test out my fixes and see if that solves all of your issues? I'm about to head to bed, but I'll read your comments in the morning and see if I can squeeze in some more fixes tomorrow if they are needed.

@KelvinKingGitHub
Copy link
Author

Thanks Jeff, I'll test it out.

I also found that it was necessary to remove the first 16 bytes of the attachData when setting the TnefPart content. Here's my code:

var stream = new MemoryStream();
stream.Write(attachData, 16, attachData.Length-16);
stream.Position=0;
filter.Flush(attachData, 16, attachData.Length-16, out outIndex, out outLength);
tnefPart.ContentTransferEncoding = filter.GetBestEncoding (EncodingConstraint.SevenBit);
tnefPart.Content = new MimeContent(stream);
builder.Attachments.Add(tnefPart);

@jstedfast
Copy link
Owner

Ah, interesting... that might make sense because it's probably some type of "object" (where those first 16 bytes may be a GUID or something) when the type is a EmbeddedMessage.

Thanks for that info, I didn't even think of checking that.

@KelvinKingGitHub
Copy link
Author

I got that clue because you were doing it in the GetEmbeddedMessageReader() from TnefPropertyReader.cs. So it kinda made sense for me to do that same for the AttachMethod code as well.

jstedfast added a commit that referenced this issue Mar 6, 2020
@jstedfast
Copy link
Owner

BTW, would it be ok for me to include the tnef file you emailed me in the unit tests?

@KelvinKingGitHub
Copy link
Author

Sure. As long as the winmail.dat file itself is not made public.

@jstedfast
Copy link
Owner

jstedfast commented Mar 7, 2020

Ah, that's the file that I was asking about ;-)

Don't worry, it's not a big deal. Most of the code changes are covered by existing files already.

I actually discovered that fixing these bugs exposed a bug in my unit tests for a file I created myself using Outlook to create some calendar events I guess. I forgot I did that. Hah!

@jstedfast jstedfast added the bug Something isn't working label Mar 31, 2020
@no-identd
Copy link

@jstedfast I just ran into this old ticket while googling around for some .CFG form inanity in Outlook and I figured it might interest you that Microsoft has explained what these two props do over here:

https://social.microsoft.com/Forums/mvpforum/en-SG/8e15ac6d-0404-41c0-9af7-26a06ca797bf/meaning-of-mapi-identifiers-0x5d0a-and-0x5d0b?forum=os_exchangeprotocols

HEX 0x5D0A (DEC 23818) = Creator SMTP Address
HEX 0x5D0B (DEC 23819) = Last Modifier SMTP Address

@jstedfast
Copy link
Owner

Thanks, I just added those and the PR_SENDER_SMTP_ADDRESS to MimeKit's TnefPropertyId enum.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants