some combinations encoding/charset aren't supported #848

Closed
oren-boop opened this issue Oct 21, 2022 · 4 comments

Comments

@oren-boop

oren-boop commented Oct 21, 2022

Describe the bug
There is an issue when I try to replace a text part's text.
Using:

    textPart.TryDetectEncoding(out var encoding, out var confidence);
    string text = textPart.GetText(encoding);
    textPart.SetText(encoding, text);

The text string being read looks fine, but after setting it back into the text part, the part's text becomes garbled in some cases.

I've created a sample project with 2 examples. The first one works, and the 2nd fails.
The sample writes the original message and the processed message to the output folder.

btw - using
textPart.SetText("iso-2022-jp", text);
will work on the 2nd sample, but I have no way of obtaining the charset. this info isn't available anywhere in the TextPart, not even in textPart.ContentType.MimeType or the headers of the ContentType -- it just shows "text/html" without the charset info.

Platform (please complete the following information):

  • OS: windows
  • .NET Runtime: [e.g. CoreCLR, Mono]
  • .NET Framework: 4.7.2
  • MimeKit Version: 3.4.1

To Reproduce
Steps to reproduce the behavior:

  1. Open the attached project
  2. modify the project settings output folder to anywhere (I use D:)
  3. run the FrameworkTest
  4. see the output of file 2

Expected behavior
no <?> in rendered MHT

Additional context
MimeKitTest.zip

Thanks,
Oren

@jstedfast
Owner

Looks okay to me?

> btw - using
> textPart.SetText("iso-2022-jp", text);
> will work on the 2nd sample, but I have no way of obtaining the charset. this info isn't available anywhere in the TextPart,

Which parts of the second message are you talking about?

> not even in textPart.ContentType.MimeType or the headers of the ContentType -- it just shows "text/html" without the charset info.

The textPart.ContentType.MimeType isn't supposed to have the charset parameter. That only contains the mime-type information.
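
For reference, MimeKit exposes the charset parameter separately from the MIME type, through the `Charset` property of `ContentType`. A minimal sketch, assuming a `TextPart` obtained from a parsed message; the `CharsetLookup.GetDeclaredCharset` helper is a hypothetical name used only for illustration:

```csharp
using MimeKit;

static class CharsetLookup
{
    // Hypothetical helper: returns the charset parameter declared in the
    // part's Content-Type header. The result should be null when the header
    // carries no charset parameter, so callers need to handle that case.
    public static string GetDeclaredCharset (TextPart part)
    {
        return part.ContentType.Charset;
    }
}
```

For a part whose header reads `Content-Type: text/html; charset=iso-2022-jp`, `part.ContentType.MimeType` is `text/html` and the helper returns the charset value from the same header.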

I ran your test program and it correctly detected the charset of every text part (other than the text/css parts) as UTF-8 or Shift_JIS. You then set the text/charset value, and the textPart was updated with the new charset information.

I don't understand what the issue is.

The only parts in the FrameworkTest.2-modified.mhtml file that don't have charset= parameters in the Content-Type headers are the ones that would have been set to UTF-8 if your code didn't ignore those parts:

                        if (encoding == Encoding.UTF8)
                        {
                            //Console.WriteLine("Ignoring UTF8 Part");
                            break;
                        }

@oren-boop
Author

oren-boop commented Oct 23, 2022

This is what example 2 looks like (in Edge):
before:
[screenshot]

after:
[screenshot]

@jstedfast
Owner

That's because the browser treats iso-2022-jp differently than Shift_JIS. If you change Content-Type: text/html; charset=iso-2022-jp to Content-Type: text/html; charset=Shift_JIS, it works.

iso-2022-jp can map to codepage 50220 or 50222 which use different techniques for encoding.

I can't seem to paste a screenshot of the System.Text.Encoding properties, but if you set a breakpoint in your program and inspect the Encoding you'll see that it has the following values:

BodyName: iso-2022-jp
HeaderName: iso-2022-jp
WebName: shift_jis

I can make MimeKit override the HeaderName to use shift_jis, but this is the cause of the problem.
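
The three names can also be printed without a debugger. A minimal sketch using only `System.Text`; the `EncodingNames.Describe` helper is a hypothetical name, and the encoding to pass in is whatever `TryDetectEncoding` returned in your program:

```csharp
using System.Text;

static class EncodingNames
{
    // Hypothetical helper: formats the three name properties of an Encoding,
    // one per line, in the same order as the values listed above.
    public static string Describe (Encoding encoding)
    {
        return string.Join ("\n", new [] {
            "BodyName: " + encoding.BodyName,
            "HeaderName: " + encoding.HeaderName,
            "WebName: " + encoding.WebName
        });
    }
}
```

Calling `Console.WriteLine (EncodingNames.Describe (encoding))` right after `TryDetectEncoding` shows whether the names disagree for a given part. On .NET Core and later, legacy code pages such as Shift_JIS are only available after `Encoding.RegisterProvider (CodePagesEncodingProvider.Instance)` has been called, and the reported names may differ from those on .NET Framework.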

jstedfast added a commit that referenced this issue Oct 23, 2022
@oren-boop
Author

thanks!
