Preview text / snippet #1001
The IMAP protocol does not contain a command that will give you a preview snippet of the message. What MailKit does is implement the suggestion I gave to some developers who kept asking me, "How can I get some preview text to display to the user in a message list like Outlook or GMail does?" What I told them to do was to first ... After getting tired of answering this question, I decided to implement this as a feature of MailKit, and that is what you are seeing. So yes, unfortunately, that means that sometimes you see raw HTML tags. If you've got a better idea for obtaining this, I would love to hear it because I agree that this solution isn't ideal. I just don't know of a better way.
Out of curiosity, I just opened up GMail and looked through a few messages to see how much text MailKit would have to download in order to get the preview text that GMail displays for them. In one sample, MailKit would need to download 12K (the message body is HTML) in order to extract the text that I could see in GMail's message list view. That's just impractical. Looking at my iPhone's Mail.app, it seems to do the same. Holy cow. Well, I guess the solution for you, then, is to download the entire message, scrape the body text, and display that as the preview text.
Thank you for your quick and valuable response.
FWIW, I'm working on fixing MimeKit's HtmlTokenizer to make this possible with truncated data (which is what you get if you don't grab the full stream). I might try to make a public class for this in MimeKit or MailKit; I'm just not sure how to expose the API yet.
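To illustrate why truncated data needs special handling, here is a minimal sketch. It is not MimeKit's HtmlTokenizer; the `TruncatedHtmlText` class and its `Extract` method are hypothetical names used only for this example. It copies the text between tags and silently drops a trailing tag that was cut off before its closing `>`, so raw markup does not leak into the preview. It makes no attempt to handle comments, script or style content, or `>` characters inside attribute values.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of MimeKit or MailKit.
public static class TruncatedHtmlText
{
    // Returns the text found between tags, ignoring a final tag
    // that was cut off because the HTML was truncated.
    public static string Extract (string html)
    {
        if (html == null)
            throw new ArgumentNullException (nameof (html));

        var text = new StringBuilder ();
        int index = 0;

        while (index < html.Length) {
            int open = html.IndexOf ('<', index);

            if (open == -1) {
                // No more tags: the rest of the input is text.
                text.Append (html, index, html.Length - index);
                break;
            }

            // Copy the text that precedes the tag.
            text.Append (html, index, open - index);

            int close = html.IndexOf ('>', open + 1);
            if (close == -1) {
                // The tag was cut off by truncation: drop it.
                break;
            }

            // Skip over the complete tag.
            index = close + 1;
        }

        return text.ToString ();
    }
}
```

For example, `TruncatedHtmlText.Extract ("Hi <a href=\"http://ex")` yields `"Hi "` rather than exposing the partial anchor tag.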
You are cool, man.
Step 1 has been to add the hooks I need into the HtmlTokenizer, which I have now: jstedfast/HtmlKit@ec07b51 Then I noticed that the HTML entity decoder was terribly slow, so I optimized that: jstedfast/HtmlKit@10d010c And based on my research so far, it seems that GMail generates its preview text based on the first 16K of the message body text. Then I have a few classes that I'm working on that look like this:
using System;
using System.IO;
using System.Text;
using MimeKit.Utils;
namespace MimeKit.Text {
/// <summary>
/// An abstract class for generating a text preview of a message.
/// </summary>
/// <remarks>
/// An abstract class for generating a text preview of a message.
/// </remarks>
public abstract class TextPreviewer
{
int maximumPreviewLength;
/// <summary>
/// Initializes a new instance of the <see cref="TextPreviewer"/> class.
/// </summary>
/// <remarks>
/// Initializes a new instance of the <see cref="TextPreviewer"/> class.
/// </remarks>
protected TextPreviewer ()
{
maximumPreviewLength = 100;
}
/// <summary>
/// Get the input format.
/// </summary>
/// <remarks>
/// Gets the input format.
/// </remarks>
/// <value>The input format.</value>
public abstract TextFormat InputFormat {
get;
}
/// <summary>
/// Get or set the maximum text preview length.
/// </summary>
/// <remarks>
/// Gets or sets the maximum text preview length.
/// </remarks>
/// <value>The maximum text preview length.</value>
/// <exception cref="System.ArgumentOutOfRangeException">
/// <paramref name="value"/> is less than <c>1</c> or greater than <c>1024</c>.
/// </exception>
public int MaximumPreviewLength {
get { return maximumPreviewLength; }
set {
if (value < 1 || value > 1024)
throw new ArgumentOutOfRangeException (nameof (value));
maximumPreviewLength = value;
}
}
/// <summary>
/// Get a text preview of the text part.
/// </summary>
/// <remarks>
/// Gets a text preview of the text part.
/// </remarks>
/// <param name="body">The text part.</param>
/// <returns>A string representing a shortened preview of the original text.</returns>
/// <exception cref="System.ArgumentNullException">
/// <paramref name="body"/> is <c>null</c>.
/// </exception>
public static string GetPreviewText (TextPart body)
{
if (body == null)
throw new ArgumentNullException (nameof (body));
if (body.Content == null)
return string.Empty;
var encoding = body.ContentType.CharsetEncoding;
if (encoding == null) {
using (var content = body.Content.Open ()) {
if (!CharsetUtils.TryGetBomEncoding (content, out encoding))
encoding = CharsetUtils.UTF8;
}
}
using (var content = body.Content.Open ()) {
TextPreviewer previewer;
if (body.IsHtml)
previewer = new HtmlTextPreviewer ();
else
previewer = new PlainTextPreviewer ();
try {
return previewer.GetPreviewText (content, encoding);
} catch (DecoderFallbackException) {
return previewer.GetPreviewText (content, CharsetUtils.Latin1);
}
}
}
/// <summary>
/// Get a text preview of a string of text.
/// </summary>
/// <remarks>
/// Gets a text preview of a string of text.
/// </remarks>
/// <param name="text">The original text.</param>
/// <returns>A string representing a shortened preview of the original text.</returns>
/// <exception cref="System.ArgumentNullException">
/// <paramref name="text"/> is <c>null</c>.
/// </exception>
public virtual string GetPreviewText (string text)
{
if (text == null)
throw new ArgumentNullException (nameof (text));
using (var reader = new StringReader (text))
return GetPreviewText (reader);
}
/// <summary>
/// Get a text preview of a stream of text in the specified charset.
/// </summary>
/// <remarks>
/// Get a text preview of a stream of text in the specified charset.
/// </remarks>
/// <param name="stream">The original text stream.</param>
/// <param name="charset">The charset encoding of the stream.</param>
/// <returns>A string representing a shortened preview of the original text.</returns>
/// <exception cref="System.ArgumentNullException">
/// <para><paramref name="stream"/> is <c>null</c>.</para>
/// <para>-or-</para>
/// <para><paramref name="charset"/> is <c>null</c>.</para>
/// </exception>
public virtual string GetPreviewText (Stream stream, string charset)
{
if (stream == null)
throw new ArgumentNullException (nameof (stream));
if (charset == null)
throw new ArgumentNullException (nameof (charset));
Encoding encoding;
try {
encoding = CharsetUtils.GetEncoding (charset);
} catch (NotSupportedException) {
encoding = CharsetUtils.UTF8;
}
return GetPreviewText (stream, encoding);
}
/// <summary>
/// Get a text preview of a stream of text in the specified encoding.
/// </summary>
/// <remarks>
/// Get a text preview of a stream of text in the specified encoding.
/// </remarks>
/// <param name="stream">The original text stream.</param>
/// <param name="encoding">The encoding of the stream.</param>
/// <returns>A string representing a shortened preview of the original text.</returns>
/// <exception cref="System.ArgumentNullException">
/// <para><paramref name="stream"/> is <c>null</c>.</para>
/// <para>-or-</para>
/// <para><paramref name="encoding"/> is <c>null</c>.</para>
/// </exception>
public virtual string GetPreviewText (Stream stream, Encoding encoding)
{
if (stream == null)
throw new ArgumentNullException (nameof (stream));
if (encoding == null)
throw new ArgumentNullException (nameof (encoding));
using (var reader = new StreamReader (stream, encoding, false, 4096, true))
return GetPreviewText (reader);
}
/// <summary>
/// Get a text preview of a stream of text.
/// </summary>
/// <remarks>
/// Gets a text preview of a stream of text.
/// </remarks>
/// <param name="reader">The original text stream.</param>
/// <returns>A string representing a shortened preview of the original text.</returns>
/// <exception cref="System.ArgumentNullException">
/// <paramref name="reader"/> is <c>null</c>.
/// </exception>
public abstract string GetPreviewText (TextReader reader);
}
}
using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;
namespace MimeKit.Text {
/// <summary>
/// A text previewer for HTML content.
/// </summary>
/// <remarks>
/// A text previewer for HTML content.
/// </remarks>
public class HtmlTextPreviewer : TextPreviewer
{
/// <summary>
/// Initializes a new instance of the <see cref="HtmlTextPreviewer"/> class.
/// </summary>
/// <remarks>
/// Creates a new previewer for HTML.
/// </remarks>
public HtmlTextPreviewer ()
{
}
/// <summary>
/// Get the input format.
/// </summary>
/// <remarks>
/// Gets the input format.
/// </remarks>
/// <value>The input format.</value>
public override TextFormat InputFormat {
get { return TextFormat.Html; }
}
static bool IsWhiteSpace (char c)
{
return char.IsWhiteSpace (c) || (c >= 0x200B && c <= 0x200D);
}
static bool Append (char[] preview, ref int previewLength, string value, ref bool lwsp)
{
int i;
for (i = 0; i < value.Length && previewLength < preview.Length; i++) {
if (IsWhiteSpace (value[i])) {
if (!lwsp) {
preview[previewLength++] = ' ';
lwsp = true;
}
} else {
preview[previewLength++] = value[i];
lwsp = false;
}
}
if (i < value.Length) {
if (lwsp)
previewLength--;
preview[previewLength - 1] = '\u2026';
lwsp = false;
return true;
}
return false;
}
sealed class HtmlTagContext
{
public HtmlTagContext (HtmlTagId id)
{
TagId = id;
}
public HtmlTagId TagId {
get;
}
public int ListIndex {
get; set;
}
public bool SuppressInnerContent {
get; set;
}
}
static bool SuppressContent (IList<HtmlTagContext> stack)
{
int lastIndex = stack.Count - 1;
return lastIndex >= 0 && stack[lastIndex].SuppressInnerContent;
}
static HtmlTagContext GetListItemContext (IList<HtmlTagContext> stack)
{
for (int i = stack.Count; i > 0; i--) {
var ctx = stack[i - 1];
if (ctx.TagId == HtmlTagId.OL || ctx.TagId == HtmlTagId.UL)
return ctx;
}
return null;
}
static void Pop (IList<HtmlTagContext> stack, HtmlTagId id)
{
for (int i = stack.Count; i > 0; i--) {
if (stack[i - 1].TagId == id) {
stack.RemoveAt (i - 1);
break;
}
}
}
static bool ShouldSuppressInnerContent (HtmlTagId id)
{
switch (id) {
case HtmlTagId.OL:
case HtmlTagId.Script:
case HtmlTagId.Style:
case HtmlTagId.Table:
case HtmlTagId.TBody:
case HtmlTagId.THead:
case HtmlTagId.TR:
case HtmlTagId.UL:
return true;
default:
return false;
}
}
/// <summary>
/// Get a text preview of a stream of text.
/// </summary>
/// <remarks>
/// Gets a text preview of a stream of text.
/// </remarks>
/// <param name="reader">The original text stream.</param>
/// <returns>A string representing a shortened preview of the original text.</returns>
/// <exception cref="System.ArgumentNullException">
/// <paramref name="reader"/> is <c>null</c>.
/// </exception>
public override string GetPreviewText (TextReader reader)
{
if (reader == null)
throw new ArgumentNullException (nameof (reader));
var tokenizer = new HtmlTokenizer (reader) { IgnoreTruncatedTags = true };
var preview = new char[MaximumPreviewLength];
var stack = new List<HtmlTagContext> ();
var prefix = string.Empty;
int previewLength = 0;
HtmlTagContext ctx;
HtmlAttribute attr;
bool body = false;
bool full = false;
bool lwsp = true;
HtmlToken token;
while (!full && tokenizer.ReadNextToken (out token)) {
switch (token.Kind) {
case HtmlTokenKind.Tag:
var tag = (HtmlTagToken) token;
if (!tag.IsEndTag) {
if (body) {
switch (tag.Id) {
case HtmlTagId.Image:
if ((attr = tag.Attributes.FirstOrDefault (x => x.Id == HtmlAttributeId.Alt)) != null) {
full = Append (preview, ref previewLength, prefix + attr.Value, ref lwsp);
prefix = string.Empty;
}
break;
case HtmlTagId.LI:
if ((ctx = GetListItemContext (stack)) != null) {
if (ctx.TagId == HtmlTagId.OL)
full = Append (preview, ref previewLength, $" {++ctx.ListIndex}. ", ref lwsp);
else
full = Append (preview, ref previewLength, " \u2022 ", ref lwsp);
prefix = string.Empty;
}
break;
case HtmlTagId.Br:
case HtmlTagId.P:
prefix = " ";
break;
}
if (!tag.IsEmptyElement) {
ctx = new HtmlTagContext (tag.Id) {
SuppressInnerContent = ShouldSuppressInnerContent (tag.Id)
};
stack.Add (ctx);
}
} else if (tag.Id == HtmlTagId.Body && !tag.IsEmptyElement) {
body = true;
}
} else if (tag.Id == HtmlTagId.Body) {
stack.Clear ();
body = false;
} else {
Pop (stack, tag.Id);
}
break;
case HtmlTokenKind.Data:
if (body && !SuppressContent (stack)) {
var data = (HtmlDataToken) token;
full = Append (preview, ref previewLength, prefix + data.Data, ref lwsp);
prefix = string.Empty;
}
break;
}
}
if (lwsp && previewLength > 0)
previewLength--;
return new string (preview, 0, previewLength);
}
}
}
Another update: It seems that GMail will generate up to about 110 characters' worth of "preview snippet" text, but I'm not sure whether that's just the maximum that will fit on my screen and whether a wider monitor would get me more text. If I provide this as a class in MimeKit, though, I can't take the widths of glyphs in the font into consideration, because who knows what system developers will be using MimeKit/MailKit on to render their messages. Basing it on rendering bounds is just not practical. But 110 characters seems reasonable, so I think I'll do that.
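A character-count limit like this can be sketched without any rendering information. The following is a simplified illustration, not MimeKit's implementation; `PreviewSnippet` and `Create` are hypothetical names, and the default limit of 230 is the value this thread eventually settles on. It collapses each run of whitespace into a single space and, if the text does not fit, ends the snippet with an ellipsis character.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of MimeKit or MailKit.
public static class PreviewSnippet
{
    // Collapses whitespace runs and limits the result to maxLength characters,
    // ending with U+2026 (ellipsis) when the input had to be cut short.
    public static string Create (string text, int maxLength = 230)
    {
        if (text == null)
            throw new ArgumentNullException (nameof (text));
        if (maxLength < 1)
            throw new ArgumentOutOfRangeException (nameof (maxLength));

        var preview = new StringBuilder ();
        bool pendingSpace = false;

        foreach (char c in text) {
            if (char.IsWhiteSpace (c)) {
                // Defer the space so leading and trailing whitespace is never emitted.
                pendingSpace = preview.Length > 0;
                continue;
            }

            int needed = pendingSpace ? 2 : 1;
            if (preview.Length + needed > maxLength) {
                // Out of room: make space for the ellipsis if the buffer is full.
                if (preview.Length == maxLength)
                    preview.Length--;
                // Avoid leaving a dangling space before the ellipsis.
                if (preview.Length > 0 && preview[preview.Length - 1] == ' ')
                    preview.Length--;
                preview.Append ('\u2026');
                return preview.ToString ();
            }

            if (pendingSpace) {
                preview.Append (' ');
                pendingSpace = false;
            }
            preview.Append (c);
        }

        return preview.ToString ();
    }
}
```

Counting characters rather than measuring glyphs keeps the helper independent of fonts and UI toolkits, which is the trade-off described above; the caller can still pass a smaller `maxLength` for a narrow list view.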
I checked the length of the snippet via the Gmail public API, and the maximum snippet length is 230 characters. Thank you.
Part of the fix for jstedfast/MailKit#1001
Ah, that was a good idea to check the Gmail API. I've made the previewer configurable (up to 1024 characters long, at least), but maybe I'll bump the default up to 230 (it currently defaults to 110). The above commit adds the necessary classes to MimeKit to generate the snippets from a ... The next step is to update ImapFolder.cs to use those classes.
I remembered tonight that there was, at one point, talk of an IMAP extension for this, so I looked it up and found it. It was called SNIPPET for a while, but the latest revision of the draft spec now calls them PREVIEWs: https://tools.ietf.org/html/draft-ietf-extra-imap-fetch-preview-07
They suggest limiting PREVIEW text to 200 characters, with a maximum of 256. Based on that, I'll probably just go with the 230 that you suggested based on GMail's API.
OK, bumped the default to 230 as well, now.
That's cool. But how can I use the new preview text?
} but the PreviewText from MessageSummary still has HTML tags. Thank you.
I haven't published any packages with the PreviewText fixes yet, so you'd have to build from source. Even the MyGet packages won't have this feature yet because the build requires a newer version of MimeKit to be released with the PreviewText feature.
So the PreviewText feature will be present in the next MimeKit version, maybe 2.5.3?
It'll probably be 2.6.0.
I'll also be releasing MailKit 2.6.0 at about the same time, and when I do, the code you are currently using should work fine.
OK. Thanks a lot.
Issue related to preview text.
I need to receive messages from Gmail, and everything works except the preview text. The preview text sometimes contains HTML tags, sometimes links, and sometimes abnormal characters, which is not correct. I compared the PreviewText result with the snippet from the Google API, and the Google API snippet is exactly what I see in my mail list.
Can you please explain whether it is normal behavior for PreviewText to contain HTML tags, links, and abnormal characters?
How can I receive the snippet?
actual result
"snippet": " <html xmlns=\"http://www.w3.org/1999/xhtml\"> <meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"/> <meta http-equ",
"snippet": "Hi Smbat,\r\n\r\nArman Karapetyan invited you to like Aleeque.\r\n\r\nIf you like Aleeque follow the link below:\r\nhttps://www.facebook.com/n/?pag2F&id=1045999611810&ori=page_invite&ext=1587243ash=AeT4-qrU_ustQegh&ref=1584649",
expected result
the same string that appears in the Gmail message list
Thank you