From returns blank when it contains special characters #1043

Closed
lxjingGZ opened this issue Jun 1, 2024 · 6 comments

Comments

@lxjingGZ

lxjingGZ commented Jun 1, 2024

I am using the latest version, and recently encountered an issue. When the From field contains special characters, it causes the From value to return as blank. Here is an example that demonstrates the issue:


Received: from xxx by fast.ezcone.com with local (Exim 4.69)
    (envelope-from <[email protected]>)
    id 1PIMUx-0002xT-2t
    for [email protected]; Tue, 16 Nov 2012 08:27:03 -0600
From: XXXX Hunter <webmaster\@[email protected]>

@lxjingGZ lxjingGZ changed the title from "From field returns blank when it contains special characters" to "From returns blank when it contains special characters" Jun 1, 2024
@jstedfast
Owner

jstedfast commented Jun 1, 2024

Which special characters, specifically? Are you talking about the \ before the @? Or are you talking about whatever characters the XXXX replaced?

@mirror222

mirror222 commented Jun 1, 2024

Which special characters, specifically? Are you talking about the \ before the @? Or are you talking about whatever characters the XXXX replaced?

Yes, just as you said, I also encountered this problem. Thank you.

@jstedfast
Owner

jstedfast commented Jun 1, 2024

Okay, the problem is that \ is not a valid atom character. It can only appear in quotes. Authors of these email programs really need to start reading and following the specifications rather than just making up syntax out of thin air ☹️

Syntax from RFC5322:

   addr-spec       =   local-part "@" domain

   local-part      =   dot-atom / quoted-string / obs-local-part

   atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                       "!" / "#" /        ;  characters not including
                       "$" / "%" /        ;  specials.  Used for atoms.
                       "&" / "'" /
                       "*" / "+" /
                       "-" / "/" /
                       "=" / "?" /
                       "^" / "_" /
                       "`" / "{" /
                       "|" / "}" /
                       "~"

   atom            =   [CFWS] 1*atext [CFWS]

   dot-atom-text   =   1*atext *("." 1*atext)

   dot-atom        =   [CFWS] dot-atom-text [CFWS]

   specials        =   "(" / ")" /        ; Special characters that do
                       "<" / ">" /        ;  not appear in atext
                       "[" / "]" /
                       ":" / ";" /
                       "@" / "\" /
                       "," / "." /
                       DQUOTE

As you can see in the syntax definitions above, a local-part token that matches the dot-atom syntax is explicitly disallowed to contain the \ character.

That said, MailKit already supports @ in the local-part as long as it's not the first character.
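
To make the grammar above concrete, here is a minimal Python sketch (not MimeKit's actual parser) that checks a local-part against the `dot-atom-text` rule. The function name `is_dot_atom_text` is hypothetical, CFWS handling is left out, and the sample local-part is the one implied by the address in this issue.

```python
import re

# atext per RFC 5322 section 3.2.3: letters, digits, and the listed symbols.
# Note that neither "\" nor "@" is in this set; both are "specials".
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom_text(local_part: str) -> bool:
    """Return True if local_part matches dot-atom-text (CFWS is ignored here)."""
    return DOT_ATOM_TEXT.fullmatch(local_part) is not None


if __name__ == "__main__":
    for candidate in ("webmaster",
                      "webmaster%40custom-host.com",
                      "webmaster\\@custom-host.com"):
        print(candidate, is_dot_atom_text(candidate))
```

The first two candidates consist only of `atext` characters and dots, so they match; the third contains `\` and `@`, which can only appear inside a quoted-string.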

@jstedfast
Owner

I'm not sure how these addresses are supposed to be encoded. I'm pretty sure I've typically seen them in the form webmaster%[email protected].

Need to do more research on this...

@mirror222

Okay, the problem is that \ is not a valid atom character. It can only appear in quotes. Authors of these email programs really need to start reading and following the specifications rather than just making up syntax out of thin air ☹️

Syntax from RFC5322:

   addr-spec       =   local-part "@" domain

   local-part      =   dot-atom / quoted-string / obs-local-part

   atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                       "!" / "#" /        ;  characters not including
                       "$" / "%" /        ;  specials.  Used for atoms.
                       "&" / "'" /
                       "*" / "+" /
                       "-" / "/" /
                       "=" / "?" /
                       "^" / "_" /
                       "`" / "{" /
                       "|" / "}" /
                       "~"

   atom            =   [CFWS] 1*atext [CFWS]

   dot-atom-text   =   1*atext *("." 1*atext)

   dot-atom        =   [CFWS] dot-atom-text [CFWS]

   specials        =   "(" / ")" /        ; Special characters that do
                       "<" / ">" /        ;  not appear in atext
                       "[" / "]" /
                       ":" / ";" /
                       "@" / "\" /
                       "," / "." /
                       DQUOTE

As you can see in the syntax definitions above, a local-part token that matches the dot-atom syntax is explicitly disallowed to contain the \ character.

That said, MailKit already supports @ in the local-part as long as it's not the first character.

you are so right, they can't make up syntax out of thin air ☹️ lol

@lxjingGZ lxjingGZ closed this as completed Jun 2, 2024
jstedfast added a commit that referenced this issue Jun 2, 2024
…t.com"

If we encounter a \@ sequence, convert that to %40 when the
ParserOptions.AddressParserComplianceMode value is Looser.

This solution isn't ideal, but is probably the simplest option that we
can do for invalid local-parts like this.

The other option would be to quote the local-part, but that would be a
much more involved fix because obs-local-part allows a mix of qstring
and atom tokens separated by '.' in a local-part (the modern form only
allows a single qstring -or- multiple atoms separated by '.'s).

Because of obs-local-part, we can't just wrap the token with DQUOTEs
when we finish consuming the local-part, because it *could* include
1 or more qstrings that we would have to escape. We also can't just
quote individual atom tokens containing the \@ sequence, because then
the example address above would end up being:

    "webmaster@custom-host".com

Even though that would be syntactically valid, it's not likely to be
interpreted the same. Ideally, if we were to implement a solution that
quoted the relevant parts of the local-part token, it would look like
this:

    "[email protected]"

This is *doable*, but not without significant rewriting of the current
TryParseLocalPart method logic.

That said, even *that* might not get interpreted as the same mailbox
by whatever mail software generated the \@ sequence in the first place.

(Obviously, the same goes for this %40 hack.)

Fixes issue #1043
@jstedfast
Owner

jstedfast commented Jun 2, 2024

Okay, so I've added support for addresses like webmaster\@[email protected]

If the address parser encounters a \@ sequence, it will convert that to %40 when the
ParserOptions.AddressParserComplianceMode value is Looser.

This solution isn't ideal, but is probably the simplest option that we can do for invalid local-parts like this.
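
As a rough illustration of the behavior described above, the following Python sketch rewrites each `\@` sequence as `%40` only when a looser mode is enabled. It is not the MimeKit implementation: `normalize_local_part` and its `looser` flag are hypothetical stand-ins for the parser and its compliance mode, and returning `None` stands in for a parse failure.

```python
from typing import Optional


def normalize_local_part(local_part: str, looser: bool = True) -> Optional[str]:
    """Hypothetical helper: rewrite each backslash-escaped '@' as '%40'.

    In strict mode (looser=False) an unquoted backslash is invalid, so None
    is returned to mimic a failed parse.
    """
    if "\\@" not in local_part:
        return local_part  # nothing to fix up
    if not looser:
        return None
    return local_part.replace("\\@", "%40")


if __name__ == "__main__":
    print(normalize_local_part("webmaster\\@custom-host.com"))
    print(normalize_local_part("webmaster\\@custom-host.com", looser=False))
```

Because `%` is a valid `atext` character, the rewritten local-part is a plain dot-atom, which is why this route needs no changes to the quoting logic.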

The other option would be to quote the local-part, but that would be a much more involved fix because obs-local-part allows a mix of qstring and atom tokens separated by . in a local-part (the modern form only allows a single qstring -or- multiple atoms separated by .s which would be much simpler).

Because of obs-local-part, we can't just wrap the token with DQUOTEs when we finish consuming the local-part, because it could include 1 or more qstrings that we would have to escape. We also can't just quote individual atom tokens containing the \@ sequence, because then the local-part from the example address above would end up being:

"webmaster@custom-host".com

Even though that would be syntactically valid, it's not likely to be interpreted the same. Ideally, if we were to implement a solution that quoted the relevant parts of the local-part token, it would look like this:

This is doable, but not without significant rewriting of the current TryParseLocalPart method logic.

That said, even that might not get interpreted as the same mailbox by whatever mail software generated the \@ sequence in the first place.

(Obviously, the same goes for this %40 hack.)

There may not even be a universally correct interpretation of this style of address. In other words, some mail software might accept the %40 encoding and deliver the message to the correct mailbox while others will only accept a quoted local-part or only accept \@ whereas others might accept some combination but not all.
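
The candidate encodings discussed above can be compared side by side. This is only a sketch under simplifying assumptions: all function names are hypothetical, and the naive split on `.` ignores any quoted-strings that an obs-local-part may already contain, which is exactly the case that makes the real fix more involved.

```python
def quote(text: str) -> str:
    """Wrap text in DQUOTEs, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def unescape_at(text: str) -> str:
    """Turn the non-standard backslash-escaped '@' into a plain '@'."""
    return text.replace("\\@", "@")


def quote_per_atom(local_part: str) -> str:
    """Quote only the '.'-separated tokens that contain the escaped '@'."""
    tokens = local_part.split(".")
    return ".".join(quote(unescape_at(t)) if "\\@" in t else t for t in tokens)


def quote_whole(local_part: str) -> str:
    """Quote the entire local-part as a single quoted-string."""
    return quote(unescape_at(local_part))


def percent_encode(local_part: str) -> str:
    """Replace each escaped '@' with '%40', leaving a plain dot-atom."""
    return local_part.replace("\\@", "%40")


if __name__ == "__main__":
    local = "webmaster\\@custom-host.com"
    print(quote_per_atom(local))   # per-atom quoting splits off ".com"
    print(quote_whole(local))      # whole local-part as one quoted-string
    print(percent_encode(local))   # the %40 approach
```

Per-atom quoting reproduces the `"webmaster@custom-host".com` form mentioned above, while whole-token quoting keeps `.com` inside the quotes. Which of the three a given mail server treats as the same mailbox is not something this sketch can answer.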

I wish I had more information about what software generated that address and which servers would accept what.

3 participants