CSV export with Message Body #10

Open
halueda opened this issue Apr 18, 2019 · 10 comments

Comments

@halueda
Contributor

halueda commented Apr 18, 2019

Currently only the properties of messages are exported, not the message body.
UserEntryId is a part of the body, but the length of its contents is very limited.

It would be very useful if Body and/or NativeBody were exported.

halueda added a commit to halueda/XstReader that referenced this issue Apr 18, 2019
Add Body and NativeBody to dictionary to export them.
@Dijji
Owner

Dijji commented May 8, 2019

Generally, I agree that this would be useful. The problem is the format. Bodies come in three main formats: plain text; HTML; and RTF (Word document format). What should I export them as? I had a look around, but couldn't come up with anything compelling in the way of a common format.

Another consideration in the HTML case - what should I do with embedded graphics, which Outlook transports as hidden attachments?

Dijji

@halueda
Contributor Author

halueda commented May 9, 2019

I think these three formats should be put in different columns in the CSV file.
As for the format, I'm thinking:

  • plain text: encode as UTF-8 text
  • HTML: this is rather difficult because it is originally binary data, byte[]. I propose exporting two columns:
    - UTF-8 converted HTML, which is mostly readable but sometimes has incorrect encoding.
    - base64 format, which is perfect for a mail reader but hard for a human to read.
  • RTF: base64 text

IMHO, embedded HTML graphics should be ignored, because no attachments are exported.
I also think this is acceptable for users, because Outlook often shows HTML messages without graphics, especially when the sender is untrusted.
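To make the column proposal above concrete, here is a minimal Python sketch. The helper name `body_columns` and the column names `HtmlText`, `HtmlBase64`, and `RtfBase64` are hypothetical; they are not part of XstReader. It only illustrates the trade-off: the decoded column is readable but can be lossy, while the base64 column is lossless but not human readable.

```python
import base64


def body_columns(html_bytes: bytes, rtf_bytes: bytes) -> dict:
    """Build the proposed CSV columns from raw body bytes (hypothetical helper)."""
    return {
        # Readable but possibly lossy: undecodable bytes become U+FFFD.
        "HtmlText": html_bytes.decode("utf-8", errors="replace"),
        # Lossless, but meant for programs rather than people.
        "HtmlBase64": base64.b64encode(html_bytes).decode("ascii"),
        "RtfBase64": base64.b64encode(rtf_bytes).decode("ascii"),
    }


# An HTML body that is actually Latin-1, to show the "incorrect encoding" case.
raw_html = "<p>caf\u00e9</p>".encode("latin-1")
cols = body_columns(raw_html, b"{\\rtf1 hi}")
```

Decoding the base64 column gives back the original bytes exactly, whereas the text column has already lost the accented character in this example.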

@Dijji
Owner

Dijji commented May 27, 2019

I'm not a big fan of adding a mix of columns to the CSV file, where some are meant for humans to read, and some for programs. I think we should stick to making the CSV file human readable, and if we want program readable formats, export them separately.

I also think that if we have a plain text column in the CSV file, then the program should make a decent effort at providing text whatever the email format. For plain text, this is trivial. For HTML, most browsers support a means of extracting text from a web page, although this might make extraction run rather slowly. For RTF, Microsoft Word would give a text conversion, but this would be an annoying dependency to take. Maybe there is an open source alternative for this.
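As a rough illustration of extracting text from an HTML body without a browser, here is a sketch using only Python's standard `html.parser`. The class `TextExtractor` and the function `html_to_text` are hypothetical names, and this is far cruder than what a browser does (no layout, no handling of hidden elements); it is only meant to show that a basic conversion needs no heavy dependency.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


text = html_to_text(
    "<html><style>p{color:red}</style><body>"
    "<p>Hello</p><p>world &amp; all</p></body></html>"
)
```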

Having got this far, it all sounds pretty tricky. But if I consider the alternative that you have coded up, I'm not sure what on earth would consume compressed RTF, or the streamed form of HTML. What do you believe the use cases for these formats are?

Dijji

@halueda
Contributor Author

halueda commented Jun 3, 2019

My concern is importing all the messages from an .ost file into Outlook. To do so, I can follow these steps:

  • Export the properties of the messages in a folder as a CSV file.
  • Convert each message in the CSV file, line by line, to an eml file.
  • Import the eml files into Outlook.
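The second step above could look roughly like the following sketch, which uses Python's standard `csv` and `email` modules. It is not the code from the linked branch. The column names `Subject`, `From`, `To`, and `Body` and the helper `row_to_eml` are assumptions made for illustration; the actual exported columns may differ.

```python
import csv
import io
from email import message_from_bytes, policy
from email.message import EmailMessage


def row_to_eml(row: dict) -> bytes:
    """Turn one CSV row into the bytes of an .eml file (hypothetical helper)."""
    msg = EmailMessage(policy=policy.SMTP)
    msg["Subject"] = row["Subject"]
    msg["From"] = row["From"]
    msg["To"] = row["To"]
    msg.set_content(row["Body"])  # plain-text body
    return msg.as_bytes()


# Made-up sample data; the body field contains an embedded newline.
sample_csv = (
    "Subject,From,To,Body\r\n"
    'Hello,a@example.com,b@example.com,"Line one\nLine two"\r\n'
)
rows = list(csv.DictReader(io.StringIO(sample_csv, newline="")))
eml = row_to_eml(rows[0])

# Parse the result back to check that it is a well-formed message.
parsed = message_from_bytes(eml, policy=policy.default)
```

Writing `eml` to a file with an `.eml` extension would give one file per message for the import step.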

I have almost finished the work for this and published it in my repositories.
For the first two items, exporting for this purpose and converting to eml format in Python, I pushed to a working branch of a fork of your XstReader, https://github.com/halueda/XstReader/tree/working
And for the final item, I pushed to another repo, https://github.com/halueda/EML-Import , for automatic import in Visual Basic Script.

In the end, I also agree that the CSV file is just for humans. Exporting eml files instead would be a more direct way. However, I'm not very familiar with C#, so I need more time to contribute to that task.

I also noticed that exporting properties is not the best fit for my purpose, because the properties do not include To and Cc information. I realized that I need to export 'contents' instead of 'properties', if exporting eml is too specific a requirement.


About RTF contents, I also noticed that issue and worked on it. In the end, I export it as an attached file with the extension rt. I can click the attached file and the MS Word preview plugin automatically shows the contents, which is enough to salvage the message.

@Dijji
Owner

Dijji commented Jun 5, 2019

Very interesting. I'll have a look at your code and see if I can figure it out, although your C# might be better than my Python!

Dijji

@Dijji
Owner

Dijji commented Dec 3, 2019

I've been looking at your code, and I'm afraid I'm rather puzzled. So, the path is XstReader to CSV to eml.

  1. I understand how this will work for text content, but I am puzzling over HTML and Rich text format. eml, if it means anything, means RFC 822, but that only deals with text. How should full fidelity be maintained (and I do think that should include in-line images)?

  2. Why do I want to go to the eml format in the first place? Outlook no longer supports importing it, so why is it a good vehicle for recovering OST files? If you need a script to run in Outlook, why go via eml when you're having to deal with rich content by appending core body material as attachments?

  3. Would not the most direct way of attacking the load into Outlook problem be to export properties to a CSV, the body to a text, HTML or RTF file, and the attachments to their own files, add pointers to those files to the CSV, and then write an Outlook script that consumed as much of the data as possible?
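The layout described in point 3 can be sketched as below, in Python for brevity. The function `export_messages`, the `BodyFile` column, and the message dictionaries are all hypothetical; this does not reflect how XstReader stores or exports anything. The point is only that the CSV stays human readable while the body keeps its native format in a separate file.

```python
import csv
import tempfile
from pathlib import Path

# File extension per body format (assumed mapping for this sketch).
EXT = {"text": ".txt", "html": ".html", "rtf": ".rtf"}


def export_messages(messages, out_dir: Path) -> Path:
    """Write each body to its own file and a CSV index pointing at it."""
    out_dir.mkdir(parents=True, exist_ok=True)
    index_path = out_dir / "messages.csv"
    with index_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Subject", "BodyFile"])
        for i, m in enumerate(messages):
            body_path = out_dir / f"body_{i}{EXT[m['format']]}"
            body_path.write_bytes(m["body"])  # body kept in native format
            writer.writerow([m["subject"], body_path.name])
    return index_path


# Made-up message for demonstration.
demo = [{"subject": "Hi", "format": "html", "body": b"<p>Hi</p>"}]
out = Path(tempfile.mkdtemp())
index = export_messages(demo, out)
```

A script on the Outlook side would then read `messages.csv` and follow the `BodyFile` pointer for each row.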

It may just be that I'm missing your point. Please enlighten me if so.

Dijji

@halueda
Contributor Author

halueda commented Dec 4, 2019

I chose eml simply because I know the format very well; I am a very old man and know UNIX /usr/bin/mail and so on better than the Outlook format. If I knew more about the Outlook format, I would not stick with the eml format.
Regarding fidelity, I implemented my extension by referring to other RFCs about attachments and so on, as much as I could.

As you mentioned in another issue, V1.7 has an extension to save a message with its attached files, and I feel that would work. Anyway, I will check whether it fits my needs. (But right now I'm so busy that it will be a few weeks or months later.)

@esantose

Hi, I need to get some MAPI properties, but I couldn't find a function to do it.
Could someone help me with how to get the 'EntryId' value as a string? (It's a byte[].)

@Dijji
Owner

Dijji commented Dec 13, 2020

Are you by any chance looking at the CSV file using Excel? There appears to be a problem with properties that contain a new line character messing up Excel CSV import in UTF-8 encoded files.
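For anyone trying to reproduce this kind of problem, the sketch below (Python, standard library only) writes a field containing a newline and shows that a conforming CSV reader recovers it intact, because the writer quotes the field. It also prepends a UTF-8 byte order mark via the `utf-8-sig` codec, which is one commonly used way to help Excel recognise UTF-8. This is an illustration only and is not a description of what XstReader does.

```python
import csv
import io

# Made-up rows; the second field of the data row contains a newline.
rows = [["Subject", "Body"], ["Hello", "Line one\nLine two"]]

buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)
text = buf.getvalue()

# Bytes as they would be written to disk, with a UTF-8 BOM in front.
data = text.encode("utf-8-sig")

# A conforming reader treats the quoted newline as part of the field.
parsed = list(csv.reader(io.StringIO(text, newline="")))
```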

To confirm, export the properties for a message and look at the CSV file in a text editor. It is a bit hard to read, but if you count the headings to the first EntryId property, and then count to the corresponding data field, you should see the value in hexadecimal, just as you do if you display message properties in the UI.
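On the general question of turning a byte array into a string, a hexadecimal rendering is the usual choice for binary identifiers. The sketch below shows the idea in Python with a made-up value; it is not XstReader code. In C#, `BitConverter.ToString` on a `byte[]` gives a comparable hyphen-separated hex string.

```python
# Made-up bytes standing in for an EntryId value.
entry_id = bytes([0x00, 0x00, 0x00, 0x00, 0x1A, 0x2B, 0xFF])

# Render as an uppercase hexadecimal string.
as_hex = entry_id.hex().upper()

# The conversion is reversible, so no information is lost.
round_trip = bytes.fromhex(as_hex)
```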

Dijji added a commit that referenced this issue Dec 13, 2020
@Dijji
Owner

Dijji commented Dec 14, 2020

Release version 1.13 contains a fix that will cause Excel to open exported CSV files correctly.


3 participants