Issue with saving XML attachments #228

it-can · 2017-10-05T14:34:14Z

Version: 1.0.1

I ran into an error after issue #226; it seems to be related to the "us-ascii" encoding.

@Slamdunk These are my email headers btw

------=_NextPart_000_01B0_01D33D5D.1AB50480
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: 7bit
Content-Description: body

------=_NextPart_000_01B0_01D33D5D.1AB50480
Content-Type: text/xml; name="test.xml"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="test.xml"

------=_NextPart_000_01B0_01D33D5D.1AB50480
Content-Type: application/octet-stream; name="test.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="test.pdf"

When I save the attachments, the encoding of the XML file seems to be broken, while the PDF looks correct. The XML file starts with:

ï»¿<?xml version="1.0" encoding="utf-8"?>

Slamdunk · 2017-10-05T14:38:55Z

Hi, ï»¿ is the byte order mark (BOM). It is expected that you see it if the XML was created with a BOM and you open the file with software that does not read it as UTF-8 (for example, software that assumes a single-byte charset).

The BOM has always caused errors in a lot of software, and the modern practice is to create UTF-8 documents without it. Still, if the BOM is there, you need to handle it yourself: this library's decoding of the attachment is correct.
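
To illustrate what "handling it yourself" could look like, here is a minimal sketch in Python (not this library's PHP code). The helper name `strip_utf8_bom` is hypothetical, and Windows-1252 is only an assumption about which single-byte charset the viewer used:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A saved attachment that begins with the three BOM bytes EF BB BF.
raw = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="utf-8"?>'

# Reading those three bytes as Windows-1252 yields the stray characters
# reported at the top of the file.
mojibake = raw[:3].decode("cp1252")

# Dropping the BOM leaves the XML declaration as the first bytes.
cleaned = strip_utf8_bom(raw)
```

The bytes themselves are untouched by the mail decoding; only the interpretation of the first three bytes differs between viewers.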

it-can · 2017-10-05T14:43:03Z

Ok it worked in version 0.5.2

Slamdunk · 2017-10-05T14:50:56Z

I don't understand your last message: did it behave differently in 0.5.2?

By the way, the base64 encoding of the BOM is 77u/: if the first four characters of the first line of the attachment in the raw message are 77u/, you are dealing with an XML file that has a BOM.
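
The 77u/ check can be sketched as follows, in Python rather than PHP. The function name `body_starts_with_bom` and the sample body are hypothetical. Because the BOM is exactly three bytes, it fills one complete four-character base64 group, so the prefix does not depend on the bytes that follow:

```python
import base64
import codecs

# The three BOM bytes EF BB BF map to exactly one base64 group.
BOM_B64_PREFIX = base64.b64encode(codecs.BOM_UTF8).decode("ascii")


def body_starts_with_bom(base64_body: str) -> bool:
    """Check whether a base64-encoded part body begins with the UTF-8 BOM."""
    return base64_body.lstrip().startswith(BOM_B64_PREFIX)


# Hypothetical attachment body: an XML declaration preceded by a BOM.
sample_body = base64.b64encode(
    codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="utf-8"?>'
).decode("ascii")

has_bom = body_starts_with_bom(sample_body)
```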

it-can · 2017-10-05T14:51:47Z

Well I switched today to version 1 of this library, and now I have this issue... It worked last night correctly...

Slamdunk · 2017-10-05T14:54:26Z

If you can provide the full raw message with sensitive information obscured, I will be happy to inspect the change and publish a fix if one is needed.

it-can · 2017-10-05T15:04:17Z

Should an XML attachment be passed to Transcoder::decode? I think this is the problem.

https://github.com/ddeboer/imap/blob/master/src/Message/AbstractPart.php#L280

Slamdunk · 2017-10-05T15:07:34Z

An attachment never needs charset decoding, since it is (almost) always sent encoded in Base64.
Even in version 0.5.2, attachments were never charset-decoded.

I'm sorry, but I can't help you without the original mail that produces the output you consider wrong.

it-can · 2017-10-05T15:09:04Z

Yeah, so I think an XML attachment will have a type of TEXT, and that is passed to the transcoder. If the mail had been sent with Content-Type: application/octet-stream; name="test.xml", it would work correctly (because it is not passed to the transcoder).

I can't send you the email because it is very sensitive to our business...

This works for me now:

// Take the part body from getContent() and undo only the transfer encoding,
// without any charset decoding.
$content = $attachment->getContent();

if (AbstractPart::ENCODING_BASE64 === $attachment->getEncoding()) {
    $content = base64_decode($content);
} elseif (AbstractPart::ENCODING_QUOTED_PRINTABLE === $attachment->getEncoding()) {
    $content = quoted_printable_decode($content);
}

Slamdunk · 2017-10-05T15:24:12Z

I was wrong: encoded attachments are charset-decoded if they have a text type, like XML.

The issue is that we assume the default server charset is us-ascii. Previous versions did some guessing and, in your case, found the right charset; this is no longer acceptable behaviour because it is very brittle.

I need to investigate further (over the next few days).

it-can · 2017-10-05T15:25:48Z

Thanks for the help! I now have a quick "fix" for my issue, for now it works for me... I will keep a close eye on this! Thanks again!

Slamdunk · 2017-10-06T08:58:37Z

I have bad news about this issue.

This bug affects only attachments with a text MIME type other than plain text, like HTML, XML, or CSV.
This is not a bug in this library or in the IMAP server: it arises while the email is being created, on the client side, before it is sent to the SMTP server.

An example: two XML files with the same content, made up of charset-specific characters like €, but encoded in different charsets, the first in US-ASCII and the second in UTF-8.

If we compose the email in Thunderbird 52 with both attachments, the receiver gets:

--------------9E93C7BA2D80D3B544BCD1A5
Content-Type: text/xml;
 name="att-utf8-xml.xml"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="att-utf8-xml.xml"

eG1sOiBBX1x8ISLCoyQlJigpPT/DoDw+LUAjJ3t9W11fw59f4oKsX1o=
--------------9E93C7BA2D80D3B544BCD1A5
Content-Type: text/xml;
 name="att-ascii-xml.xml"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="att-ascii-xml.xml"

eG1sOiBBX1x8ISI/JCUmKCk9Pz88Pi1AIyd7fVtdXz9fP19a
--------------9E93C7BA2D80D3B544BCD1A5

You can see that the charset is missing from the Content-Type header. However you try to charset-decode the content, one attachment will always be decoded incorrectly, because the charset was not declared at the starting point, during the email composition.

Gmail is smarter: it tries to detect the charset of the attachment and declares it:
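
As a rough illustration in Python (not the library's PHP code), the two base64 bodies above can be decoded and then interpreted under different charsets. The helper `try_decode` is hypothetical. The sketch only shows that the bytes carry no charset label, and that a wrong assumption either fails outright or silently produces garbage:

```python
import base64

# The two bodies from the Thunderbird message above.
utf8_payload = base64.b64decode(
    "eG1sOiBBX1x8ISLCoyQlJigpPT/DoDw+LUAjJ3t9W11fw59f4oKsX1o="
)
ascii_payload = base64.b64decode(
    "eG1sOiBBX1x8ISI/JCUmKCk9Pz88Pi1AIyd7fVtdXz9fP19a"
)


def try_decode(data: bytes, charset: str):
    """Decode data with the given charset, or return None if it is invalid."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


# Correct guess: the euro sign survives.
as_utf8 = try_decode(utf8_payload, "utf-8")

# Assuming us-ascii: bytes above 0x7F are rejected.
as_ascii = try_decode(utf8_payload, "us-ascii")

# Assuming ISO-8859-1: decoding "succeeds" but the text is garbled.
as_latin1 = try_decode(utf8_payload, "iso-8859-1")

# The US-ASCII file already lost its special characters when it was saved.
ascii_text = ascii_payload.decode("us-ascii")
```

The ISO-8859-1 case is the dangerous one: no error is raised, so the corruption only shows up when someone reads the saved file.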

--001a113d736c26ee77055adcc524
Content-Type: text/xml; charset="UTF-8"; name="att-utf8-xml.xml"
Content-Disposition: attachment; filename="att-utf8-xml.xml"
Content-Transfer-Encoding: base64

eG1sOiBBX1x8ISLCoyQlJigpPT/DoDw+LUAjJ3t9W11fw59f4oKsX1o=
--001a113d736c26ee77055adcc524
Content-Type: text/xml; charset="US-ASCII"; name="att-ascii-xml.xml"
Content-Disposition: attachment; filename="att-ascii-xml.xml"
Content-Transfer-Encoding: base64

eG1sOiBBX1x8ISI/JCUmKCk9Pz88Pi1AIyd7fVtdXz9fP19a
--001a113d736c26ee77055adcc524

After receiving this email, we can safely charset-decode both attachments the right way.

The fix I pushed in #227 introduces the default behaviour of most email clients.
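
A small sketch of that difference, using Python's standard `email` package instead of this library. The message below is assembled by hand for illustration: the first part copies the Gmail headers above, while the second part (the hypothetical `undeclared.xml`) carries the same body with no charset parameter:

```python
from email import message_from_string

BOUNDARY = "001a113d736c26ee77055adcc524"
PAYLOAD = "eG1sOiBBX1x8ISLCoyQlJigpPT/DoDw+LUAjJ3t9W11fw59f4oKsX1o="

raw_message = "\n".join([
    "MIME-Version: 1.0",
    f'Content-Type: multipart/mixed; boundary="{BOUNDARY}"',
    "",
    f"--{BOUNDARY}",
    'Content-Type: text/xml; charset="UTF-8"; name="att-utf8-xml.xml"',
    'Content-Disposition: attachment; filename="att-utf8-xml.xml"',
    "Content-Transfer-Encoding: base64",
    "",
    PAYLOAD,
    f"--{BOUNDARY}",
    'Content-Type: text/xml; name="undeclared.xml"',
    'Content-Disposition: attachment; filename="undeclared.xml"',
    "Content-Transfer-Encoding: base64",
    "",
    PAYLOAD,
    f"--{BOUNDARY}--",
    "",
])

declared, undeclared = message_from_string(raw_message).get_payload()

# With a declared charset, the bytes can be turned into text reliably.
charset = declared.get_content_charset()
text = declared.get_payload(decode=True).decode(charset)

# Without one, the parser has nothing to report and the caller must guess.
missing = undeclared.get_content_charset()
```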

At the time of writing I don't see a robust solution to this issue 🙍

it-can · 2017-10-06T11:27:03Z

Ok thanks for the explanation... I will have to create a workaround for my tool...

Slamdunk · 2017-10-06T13:30:42Z

There is no way to solve this, so as of now we completely avoid charset-decoding attachments.

@it-can I would really appreciate your feedback on the new release 1.0.2.
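
The idea of undoing only the transfer encoding and leaving the bytes alone can be sketched like this in Python. This is an illustration of the approach under stated assumptions, not the code shipped in the library; the function name `decode_attachment` is hypothetical:

```python
import base64
import codecs
import quopri


def decode_attachment(body: bytes, transfer_encoding: str) -> bytes:
    """Undo the Content-Transfer-Encoding only; never touch the charset."""
    encoding = transfer_encoding.strip().lower()
    if encoding == "base64":
        return base64.b64decode(body)
    if encoding == "quoted-printable":
        return quopri.decodestring(body)
    # 7bit, 8bit and binary bodies are already the original bytes.
    return body


# Round trip: what the sender attached is what ends up on disk,
# BOM included, regardless of the part's Content-Type.
original = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="utf-8"?>'
wire_body = base64.b64encode(original)
saved = decode_attachment(wire_body, "base64")
```

Returning bytes instead of text means a missing or wrong charset declaration can no longer corrupt the saved file; interpreting the bytes is left to whatever opens it.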

Slamdunk added the bug label Oct 5, 2017

it-can closed this as completed Oct 6, 2017

Slamdunk mentioned this issue Oct 6, 2017

Do not charset-decode attachments #231

Merged
