Newline missing between headers #695
ContentType contentType = ContentType.Parse("text/plain");
Header.TryParse($"{contentType}\r\n", out Header contentTypeHeader);
HeaderList headers = new HeaderList();
headers.Add(contentTypeHeader);
headers["Content-Encoding"] = "binary";
using (MemoryStream ms = new MemoryStream())
{
    headers.WriteTo(ms);
    // ToArray() returns only the bytes written; GetBuffer() also exposes unused capacity
    Console.Write(Encoding.ASCII.GetString(ms.ToArray()));
}
When I give a string that ends with a newline to the I use this as a workaround, because I think actually the code during |
What you want to use is: contentType.ToString (Encoding.UTF8, true); That will encode the new-line characters. The normal ToString() method is meant only for display purposes. |
This would also work:
HeaderList headers = new HeaderList();
headers["Content-Type"] = "text/plain";
headers["Content-Encoding"] = "binary";
Anyway, closing this since it works the way it does because it has to serialize headers back exactly how it parsed them. |
Thank you, defining the encoding is of course the better usage; I hadn't figured that out yet. However, even when using the encoding, the newline is still missing if the header wasn't parsed with a trailing newline in the first place. If this is the expected behavior, it's not a bug, but a misunderstanding. |
Ugh, you are correct - sorry about that. MimeKit internally uses an internal
I gave you bad advice :( Is there a reason you are parsing it first and then reserializing it? You could just set the raw value on the HeaderList like this:
HeaderList headers = new HeaderList();
headers["Content-Type"] = "text/plain";
headers["Content-Encoding"] = "binary";
That logic will unfold, parse, and reserialize things so that the header value is appropriately folded to meet MIME recommendations. |
FWIW, I believe that this bug report is a very fair issue to ask about. It is 100% fair to argue that one or more of the following changes should be made:
Given that, I'm going to re-open this until I've had time to think about this more. The ContentType and ContentDisposition ToString() changes scare me a bit because they might break existing behavior and therefore could break real-world apps that rely on it. For example, someone might currently be doing:
using (var writer = new StreamWriter (stream)) {
    // Write the MIME headers...
    writer.WriteLine (contentType.ToString (Encoding.UTF8, true));
    writer.WriteLine (contentDisposition.ToString (Encoding.UTF8, true));
    // Write the header terminator
    writer.WriteLine ();
    // Write the content
    ...
}
If I change the ToString() methods to add a terminating newline, then the above code would break: each WriteLine() would emit an extra blank line, and a blank line terminates the header block. |
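To make that breakage concrete, here is a minimal sketch that uses only the .NET base class library and no MimeKit calls. The header strings are hand-written stand-ins for what a ToString() overload might return with or without a terminating newline, and the class and method names are hypothetical.

```csharp
using System;
using System.IO;

static class HeaderNewlineDemo
{
    // Writes each header with WriteLine, then a blank line as the header terminator,
    // mirroring the StreamWriter pattern shown above.
    public static string WriteHeaders (params string[] headers)
    {
        using (var writer = new StringWriter { NewLine = "\r\n" }) {
            foreach (var header in headers)
                writer.WriteLine (header);
            writer.WriteLine ();
            return writer.ToString ();
        }
    }

    // Counts how many header lines appear before the first blank line,
    // which is where a MIME consumer would stop reading headers.
    public static int CountHeadersBeforeTerminator (string text)
    {
        var lines = text.Split (new[] { "\r\n" }, StringSplitOptions.None);
        int count = 0;
        foreach (var line in lines) {
            if (line.Length == 0)
                break;
            count++;
        }
        return count;
    }
}
```

With newline-free strings, both headers land before the terminator. If the strings already end in "\r\n", the first WriteLine produces a blank line and a reader would treat the second header as body content.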
…nd with a newline Fixes issue #695
Ok, so I looked into option # 2 and it was a fairly simple change so I implemented it. This fix will be in a MimeKit v2.15.0 release which I plan on making this week. |
I also added (a few days ago) new ContentType/ContentDisposition |
Don't worry, these are my first steps with MimeKit; I'm still in gaming-mode ;) Since your advertising sounds promising, I gave MimeKit (Lite) a try for parsing and building RFC-conformant MIME headers and bodies. I process headers and bodies separately because MimeKit wants to manage everything in memory, which could lead to huge memory consumption (not so good on a mobile device, for example) as the MIME entities get bigger. I manage large entities and bodies using a self-made stream that automatically switches to a temporary file as soon as the data exceeds an in-memory limit. The disadvantage is that I can't use parts of MimeKit's high-level API in some cases (such as MIME entity parsing, MIME part handling, etc.). But with a little trial and error until I know MimeKit a bit better, I'm confident that I can manage that using some low-level API methods. The code here is just a small piece of POC code to reproduce the issue. My thoughts when using the What about adding a ...
public Header CreateHeader (FormatOptions options, Encoding charset, bool encode)
{
    if (options == null)
        throw new ArgumentNullException (nameof (options));
    if (charset == null)
        throw new ArgumentNullException (nameof (charset));
    var value = new StringBuilder (MediaType);
    value.Append ('/');
    value.Append (MediaSubtype);
    if (encode) {
        int lineLength = value.Length;
        Parameters.Encode (options, value, ref lineLength, charset);
    } else {
        value.Append (Parameters.ToString ());
    }
    return new Header (HeaderId.ContentType, value.ToString ());
}
... When I add the resulting I think the most effective and non-breaking solution for version 2.x would be option 3, because the |
Have you looked into the MimeParser's http://www.mimekit.net/docs/html/M_MimeKit_MimeParser__ctor_3.htm If you need an overridable method on the MimeParser to make it possible for the MimeParser to use a custom stream implementation to store the contents of MimeParts, I can probably add a hook for that. |
Currently I have three limitations, when handling an incoming request MIME entity:
I parse the MIME entity (multipart, but not nested) from a (not seekable)
Depending on the body encoding I process the MimeKit focuses on managing an entire MIME entity, and it does that pretty well so far. I need to handle the MIME entity like a hot potato in my hands: keep the memory usage as low as possible while the performance stays as good as possible, even if performance drops when a body's content exceeds the in-memory limit and needs to be stored in a temporary file. However, since the data source is a Finally, for sending the outgoing response MIME entity to the The persistent stream solution would be perfect if the source stream weren't a In order to be able to use the
and returns the stream for storing the (decoded?) content, I'd be able to return a Null-stream to discard the content, or a temporary memory/file stream to store the contents:
// Somewhere in the parser code
Stream bodyStream = factoryDelegate?.Invoke (mainEntity, currentPart) ?? new MemoryStream ();
Just as an example (maybe it'd be better to place a factory option in the parser options instead of using a constructor parameter, or to use a hook, as you suggested):
MimeParser mimeParser = new MimeParser (sourceStream, persistent: false, (mainEntity, currentPart) =>
{
    // Returns the conditional stream to store the decoded body content
    if (currentPart.Headers.Contains("..."))
    {
        return new AnyStream (currentPart.Headers);
    }
    else
    {
        return Stream.Null;
    }
});
In this case the parser shouldn't throw an exception for unsupported encodings, since the encoding in use may be supported by the custom stream (if not, the factory delegate should throw instead). This could also improve email apps, which could then decide to use memory streams for inline parts and file streams for attachments, for example. |
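The per-part stream decision proposed above can be tried out independently of any parser. The following is only a sketch with base class library types: the ChooseStream name, the dictionary-based header lookup, and the rule keyed on Content-Disposition are assumptions made for illustration, not part of MimeKit.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class BodyStreamFactoryDemo
{
    // Hypothetical factory: given a part's headers, decide where its content should be stored.
    public static Stream ChooseStream (IDictionary<string, string> headers, string tempPath)
    {
        // No Content-Disposition header: discard the content entirely.
        if (!headers.TryGetValue ("Content-Disposition", out var disposition))
            return Stream.Null;

        // Attachments go to a temporary file that is deleted when the stream is closed.
        if (disposition.StartsWith ("attachment", StringComparison.OrdinalIgnoreCase))
            return new FileStream (tempPath, FileMode.Create, FileAccess.ReadWrite,
                FileShare.None, 4096, FileOptions.DeleteOnClose);

        // Everything else (for example inline parts) stays in memory.
        return new MemoryStream ();
    }
}
```

A real implementation would want case-insensitive header names and would probably switch on size as well as disposition; the sketch only shows the shape of the decision.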
Okay, so based on what I think you're saying, what you really want is a "streaming" MIME parser (or at least a "streaming" multipart parser). It seems that there is a special need for this kind of parser in the HTTP world where you can apparently POST more-or-less endless multiparts and/or at least multiparts containing very large file streams. This apparently goes for both client-side where an application might use an HTTP GET request which results in a large multipart as well as server-side implementations of a POST request that contains large multiparts. MimeKit's MimeParser API is suboptimal for this kind of thing even if I added a way to override which type of Stream the MimeParser creates for storing the content of each MimePart. That said, even though MimeKit is really targeted for email applications, perhaps I could implement such a streaming parser API for this kind of thing...
In the meantime, MimeKit does provide some pretty handy tooling for what you want to do. You should definitely take a look at the
Next up, there are some useful techniques to be found in my MailKit project. In particular, the streaming technique I use in both the POP3 and IMAP implementations that wrap a NetworkStream (or SslStream depending on the connection type) that know when to stop reading data based on the current protocol state. For example, in the Pop3Stream.ReadAsync method, it scans for a line containing ".\r\n" and once it reaches that, declares EOF by returning 0 and will prevent you from reading beyond that point. The ReadAsync method also unmunges (or un-byte-stuffs) the stream, meaning that it takes lines that start with ".." and removes the first dot (this is a POP3ism that is meant to safeguard the termination sequence).
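The POP3 termination and un-byte-stuffing behavior described above can be sketched on already-split lines. This is a simplified illustration using only the .NET base class library, not the actual Pop3Stream code, which works on raw bytes read from the network; the class and method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;

static class Pop3UnstuffDemo
{
    // Returns the payload lines up to (not including) the "." terminator line,
    // removing the extra leading dot from byte-stuffed lines.
    public static List<string> ReadMessageLines (IEnumerable<string> wireLines)
    {
        var result = new List<string> ();

        foreach (var line in wireLines) {
            if (line == ".")
                break; // termination sequence: behave as if EOF was reached

            // A payload line that begins with a dot is sent with one extra dot in front.
            result.Add (line.StartsWith ("..", StringComparison.Ordinal) ? line.Substring (1) : line);
        }

        return result;
    }
}
```

Anything after the terminator line is left unread, which corresponds to the stream refusing to read beyond the end of the current response.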
The ImapStream.ReadAsync method is actually a lot simpler since it doesn't have to unmunge any data, it just knows that if it is in the "IMAP Literal String" state, that it should read until the literal is consumed and then return 0 (signifying EOF). In IMAP, there is a concept called a "literal string" which is how IMAP sends large (or non-ASCII) content. The way it does it is like this:
What I would probably do is implement a "MultipartContentStream" class that did something similar and knew only to read until the next MIME boundary marker and then pretend that it had reached EOF.
// Get the ContentType of the POST request or GET request, I guess?
var contentType = postRequest.ContentType;
// Construct a new StreamingMultipartParser that uses the ContentType to know about the boundary marker.
var parser = new StreamingMultipartParser (contentType, stream);
// consume the multipart prologue (the garbage that leads up to the very first boundary marker)
parser.ConsumeMultipartPrologue ();
// the ReadNext method would construct a MimePart with the parsed headers, same as
// MimeParser.ParseHeaders() does or MimeParser.ParseMessage/Entity() does now
// except that it wouldn't load the content into a MemoryBlockStream. Instead, it would
// set the MimePart.Content.Stream to a MultipartContentStream that knows to stop reading
// once it reaches the multipart boundary marker.
while (!parser.IsEndOfStream) {
var part = parser.ReadNext ();
var fileName = GetFileName (part);
// Note: up until this point, the content of the MimePart has never been read from the NetworkStream...
// so part.Content.Stream has very little overhead. It doesn't even need an input buffer because it should
// just make use of the parser's internal input buffer.
using (var fileStream = File.Create (fileName)) {
// The DecodeTo method will actually be reading the raw MIME content from the NetworkStream and
// decoding it right into the file stream, in blocks of 4K at a time - so memory usage stays really low.
part.Content.DecodeTo (fileStream);
}
}
This could be so blazingly fast... I love it. Question: Do these types of multiparts support nested multiparts? (because that would make this more complicated). I would ask if message/rfc822 is supported, but I think that could easily just be mapped (in this parser) to a MimePart as well instead of a MessagePart like MimeParser uses. The problem with trying to handle a message/rfc822 part by returning a MessagePart is that the MessagePart recursively parses the embedded message instead of returning just uninterpreted content. |
@nitinag I was wondering if you might be interested in a more SAX-like MIME parser API. I'm not sure how familiar you guys are with XML parsers (I'm not very familiar myself), but there are 2 main types of XML parsers:
What @nd1012 essentially wants is more like a SAX parser, but for MIME. The OnMimeEntityBegin/End-type stuff I added to MimeParser a while back for @nitinag gives the MimeParser some of the features of a SAX parser while still keeping it a DOM parser. I will likely need to do some brushing up on XML SAX parser APIs to see if I can figure out how they let the consumer know how the nodes are nested. I would likely need to avoid using current MimeEntity-based objects because they expect to hold onto the full list of headers/etc in order to build up the full MIME message/entity. I sort of see an API like this:
class StreamingMimeParser
{
// these would be called for MimeMessage *and* MimeEntity headers
protected void OnHeaderParsed (Header header);
// MimeMessage stuff
protected void OnMimeMessageBegin (long offset);
protected void OnMimeMessageEnd (long offset);
// MimeEntity stuff
protected void OnMimeEntityBegin (long offset);
protected void OnMimeEntityEnd (long offset);
// MimePart Content
protected void OnMimeContentBegin (long offset);
protected void OnMimeContentRead (byte[] content, int startIndex, int count);
protected void OnMimeContentEnd (long offset);
// Multipart boundary stuff - consumer would likely need to use these to push/pop their own MIME entity stack?
protected void OnBoundaryPushed (string boundary);
protected void OnBoundaryPopped (string boundary);
// And then, of course, we have the public Parse/ParseAsync methods:
public void Parse ();
public Task ParseAsync ();
}
Obviously, the OnXyz() methods could all have corresponding events that get emitted by the default implementation so you could choose to either subclass the parser or just add event hooks, whichever way works best for you.
I spent the night tossing and turning thinking about this, trying to figure out a good way to do it for a full MimeParser-type solution rather than just the StreamingMultipartParser API that I laid out in my previous comment which was extremely simplistic. If I was going to design a solution just for HTTP multipart-type stuff, then I would probably go with the simpler solution... but I think if I was going to implement this as an official part of MimeKit, I'd want a full parser API. |
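For readers unfamiliar with the push-style (SAX-like) shape being discussed, here is a toy sketch of it. It is not MimeKit's API: ToyHeaderReader, CollectingReader, and their hooks are hypothetical names, and the parser only handles a single unfolded header block followed by content lines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Toy event-driven reader: it never builds an object tree, it only calls hooks as it goes.
class ToyHeaderReader
{
    protected virtual void OnHeaderRead (string name, string value) { }
    protected virtual void OnHeadersEnd () { }
    protected virtual void OnContentRead (string line) { }

    public void Parse (TextReader reader)
    {
        bool inHeaders = true;
        string line;

        while ((line = reader.ReadLine ()) != null) {
            if (!inHeaders) {
                OnContentRead (line);
                continue;
            }

            if (line.Length == 0) {
                // The blank line separates the headers from the content.
                inHeaders = false;
                OnHeadersEnd ();
                continue;
            }

            int colon = line.IndexOf (':');
            if (colon > 0)
                OnHeaderRead (line.Substring (0, colon), line.Substring (colon + 1).Trim ());
        }
    }
}

// A consumer subclasses the reader and keeps only what it cares about.
class CollectingReader : ToyHeaderReader
{
    public readonly List<string> Events = new List<string> ();

    protected override void OnHeaderRead (string name, string value) => Events.Add ($"header {name}={value}");
    protected override void OnHeadersEnd () => Events.Add ("headers-end");
    protected override void OnContentRead (string line) => Events.Add ($"content {line}");
}
```

The point of the design is that memory use is decided by the subclass: a consumer that ignores OnContentRead never holds any content at all.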
Thought about this a lot today and took another look at the events I already have. I also got to thinking that if I did this right, it might make it easier to re-structure the existing parser to be more linear and not recursive. Either way, I could likely make MimeParser subclass this new parser and just have a MimeEntity stack internally to construct the MimeMessage/MimeEntity tree. Anyway... here are the hooks I know I will need after putting even more thought into this:
protected virtual void OnMboxMarkerRead (byte[] marker, long offset, int lineNumber);
protected virtual void OnHeaderRead (Header header, int beginLineNumber);
#region MimeMessage Events
protected virtual void OnMimeMessageBegin (long beginOffset, int beginLineNumber);
protected virtual void OnMimeMessageEnd (long beginOffset, int beginLineNumber, long headersEndOffset, long endOffset, int lines);
#endregion MimeMessage Events
#region MimePart Events
protected virtual void OnMimePartBegin (long beginOffset, int beginLineNumber);
protected virtual void OnMimePartContentBegin (long beginOffset, int beginLineNumber);
protected virtual void OnMimePartContentRead (byte[] content, int startIndex, int count);
protected virtual void OnMimePartContentEnd (long beginOffset, int beginLineNumber, long endOffset, int lines);
protected virtual void OnMimePartEnd (long beginOffset, int beginLineNumber, long headersEndOffset, long endOffset, int lines);
#endregion MimePart Events
#region MessagePart Events
protected virtual void OnMessagePartBegin (long beginOffset, int beginLineNumber);
protected virtual void OnMessagePartEnd (long beginOffset, int beginLineNumber, long headersEndOffset, long endOffset, int lines);
#endregion MessagePart Events
#region Multipart Events
protected virtual void OnMultipartBegin (long beginOffset, int beginLineNumber);
// called when a multipart boundary is encountered (e.g. "--my-boundary-marker\r\n")
protected virtual void OnMultipartBoundary (string boundary, long beginOffset, long endOffset, int lineNumber);
// called when an end multipart boundary is encountered (e.g. "--my-boundary-marker--\r\n")
protected virtual void OnMultipartEndBoundary (string boundary, long beginOffset, long endOffset, int lineNumber);
protected virtual void OnMultipartPreambleBegin (long beginOffset, int beginLineNumber);
protected virtual void OnMultipartPreambleRead (byte[] content, int startIndex, int count);
protected virtual void OnMultipartPreambleEnd (long beginOffset, int beginLineNumber, long endOffset, int lines);
protected virtual void OnMultipartEpilogueBegin (long beginOffset, int lineNumber);
protected virtual void OnMultipartEpilogueRead (byte[] content, int startIndex, int count);
protected virtual void OnMultipartEpilogueEnd (long beginOffset, int beginLineNumber, long endOffset, int lines);
protected virtual void OnMultipartEnd (long beginOffset, int beginLineNumber, long headersEndOffset, long endOffset, int lines);
I may change the OnMimePartContentRead/OnMultipartPreambleRead/OnMultipartEpilogueRead method names? "Read" implies that implementers should read into those buffers, but the truth is that they should write them. Also not sure if I'll need Async versions of the "Read" (or "Write"?) methods for Async parser APIs. The rest won't need to change. Maybe they should also be passed a CancellationToken? |
[written prior to your most recent post - looks like you may have addressed some of it] Had to refresh myself on the code that we have for this: The benefit of something like that for us would be significantly lower memory usage and better performance. Right now we only pass messages to our subclassed custom MimeParser after the message has been fully read (into a file when large enough) so that any memory usage (due to messages that can be large, e.g. 100 MB) is short lived. Everything is functional as is right now as well, so not a huge priority for us. Since it sounds like you're looking to design an API that covers multiple cases including ours, here are some notes: We have our own nested tracking since we need to track all the offsets of the nested messages, parts, etc. and their headers. In our tracking, after our own internal tracking/stacks, we end up with a linear list and use the IMAP part specifier nested format to identify nesting, which does have some specific rules on how to order the nested parts and in special cases like nested and top part encapsulated messages. We would still need the HeadersEndOffset since we need to store the start/end of the headers for messages/entities (to query just what we need later from the file), so maybe an OnHeadersEnd event or ensuring that OnMimeContent* runs on all types even if 0 length? We'd also still need the raw body line count since IMAP requires us to store that as well, at least for messages and text parts. The only parts of the parsed entity we use are:
|
Read makes more sense to me since that's what the MimeParser has read and byte[] content holds what the parser has read, which is when the event is triggered. I'm hoping that if we don't use the *Read methods, no memory usage occurs in allocating the byte[] content, etc. for those unused events?
If we were to ever use the *Read methods, one may also write the content out somewhere, so async may be useful to allow that use case. Async would also be useful on all the events so one could run database or network calls to check any conditions based on the entity read. Thinking about it some more, we'd probably prefer to use ArrayPool and stream-based processing when needed on the parts so we don't have to allocate memory chunks of sometimes 100 MB, and might just use this API to get all the offsets and headers for pre-processing like we do now. So, we would really avoid those *Read methods. If those *Read events are always called with full byte content, aren't we pretty much back to the entity type memory usage in some ways, maybe even more since MimePart objects point to a stream rather than byte[]? In that case, it doesn't seem that much different or useful than what is already there except that you only have one part in memory at a time, but that MimePart could still be huge?
Would be nice to have a way to stop parsing at any event/point. |
Yep. Gotcha covered. I'll be keeping those in the OnXyzEnd() hooks, same as they are currently in the MimeParser event API.
Yep, I added that to my plan as well - after remembering I had that there in the current MimeParser event API.
I was thinking that in this new "MimeReader"(?) (I think I like MimeReader better than StreamingMimeParser) API, I was not planning to instantiate MimeEntity objects at all, but it should be possible to use the headers to figure this out. That said... I think I'll need to track the ContentType anyway as I parse the headers so that I know which entity type I'm parsing, so maybe I can pass that into the OnMimePartBegin/MultipartBegin/MessagePartBegin APIs and maybe to the equivalent End APIs as well? Technically, I think this is only really needed for OnMimePartBegin/End to cover the Basic vs Text differentiation. Maybe I can even add OnTextPartBegin/End instead? Not sure... will have to think on this one. It's probably possible to just figure it out based on previously-parsed headers (or lack thereof, implying text/plain).
The MimeMessage convenience APIs like To/From/Cc/Subject/Date/etc are all doable w/o needing the overhead of MimeMessage... or, if you really want, you can just instantiate a new MimeMessage (or MimePart or whatever) and add the headers you collect from OnHeaderParsed(). None of that is currently handled by the MimeParser anyway, that's all handled by MimeMessage and MimeEntity. Anyway, I appreciate your feedback! I'll definitely want more feedback if/when I decide to go ahead with this. Now I just gotta convince the girlfriend to let me spend the necessary time on this ;-) I've already been spending a lot of late nights hacking on an NTLM rewrite for MailKit and trying to figure out how to add ChannelBinding support as well (starting with the SCRAM-SHA-X-PLUS mechanisms). Then I need to figure out ChannelBinding for NTLM. Lots of big new features in the works ;-) |
My plan was to pass along the MimeReader's internal
Yea, this was why I was considering having Async variants, but you raise a good point about how maybe they should all have Async variants? And have CancellationToken parameters as well. |
The MimeReader is an alternative to the MimeParser. Unlike the MimeParser, the MimeReader does not construct a MimeMessage or MimeEntity "DOM". Instead, it works more like a SAX parser in that it emits events as logical components of a MIME structure are parsed, allowing the developer to process MIME data as it is parsed rather than waiting for the entire message or entity to be completely constructed. This also allows developers a way to reduce their memory overhead. Implements the API discussed in issue #695
Ok, whipped up a new MimeReader class tonight based on this discussion and added it to a It's not documented yet, but I ported the MimeParser unit tests that recorded MIME offsets and it passes. Since the MimeReader can be used to get even more detailed offsets, I'll probably want to expand on the current unit tests to test every corner of it. A few notes:
|
@nd1012 I would also love to hear any input you have. I'm hoping something like this would work better for you than having to awkwardly work around MimeParser limitations for what you are trying to accomplish. |
One thought that had occurred to me a few days ago is whether an OnHeaderEnd method/event may be helpful to mark the header section end? As it stands right now, if you were interested in the headers, you could implement OnHeaderRead and build your Header list, but then to know that you've gotten/finished all the headers for that section, you would have to listen to all On*Begin methods to mark the end of the prior header section. I would imagine that OnHeaderEnd would trigger even in the case of a zero-length header section. The headersEndOffset parameter could be moved to OnHeaderEnd, but there were edge cases where you update the headersEndOffset after reading the body, which are mentioned in the original issue and are the reason headersEndOffset wasn't just included as part of the On*Begin events originally, I believe. However, wouldn't that same issue also affect the values being returned by OnHeaderRead then? I don't know whether that is off convention or not either, but the OnXyz naming pattern is pretty clear and useful from my (an implementer's) perspective while subclassing. |
Yea, I think I did have to update the HeadersEndOffset - this was, I think, only for edge cases where the content of the body was empty because that new-line sequence would have to be re-counted as part of the boundary marker. I think it was also needed when the stream was truncated. That said, if I can figure out a better way of doing it, an OnHeadersEnd() "event" would make sense to have. |
@nd1012 It looks like Microsoft.AspNetCore.WebUtilities might be a good solution for you. Have you looked at it? https://github.com/aspnet/HttpAbstractions/tree/master/src/Microsoft.AspNetCore.WebUtilities The MultipartReaderStream is pretty much exactly what I was envisioning in #695 (comment). The MultipartReader class is also similar to what I was envisioning. May be worth looking into (not that I don't want you to use MimeKit, but WebUtilities may be lighter weight than using MimeKitLite and more tailored to your needs). |
Rewrote the header parser tonight to allow OnHeaderReadAsync()/OnHeadersEnd/Async() and it seems a lot cleaner now, but may be slower 😥 Working on some benchmarks so I can compare. I also want to build a new MimeParser API on top of the MimeReader and compare performance there as well. Would be great to only have 1 actual parser rather than 2 😂 Current benchmarks show MimeParser parsing the JWZ mbox in less than 260ms?? And MimeReader in 178ms? Seems crazy fast! |
Looks like I misremembered exactly what test I was remembering rough stats for. My excuse is I was commenting on my phone while watching a movie with my g/f 😂 In any event, here are the raw benchmarks as run this morning. The following 2 benchmarks were run using the old header parser in the MimeReader tests:
These next 2 benchmark runs were done after applying my patch that rewrote the header parser for MimeReader:
As we can see, the new MimeReader header parser is actually a smidgen slower but I wonder if I can speed things up a bit. The old header parser uses this type of approach:
byte[] headerBuffer = new byte[512];
fixed (byte* inbuf = input) {
byte* inptr = inbuf + startIndex;
byte* inend = inbuf + inputLength;
byte* start = inptr;
*inend = '\n';
while (*inptr != '\n')
inptr++;
if (inptr < inend) {
// we know we have the complete line
int length = (int) (inptr - start);
if (length >= headerBuffer.Length)
Array.Resize (ref headerBuffer, NextAllocSize (length));
Buffer.BlockCopy (input, startIndex, headerBuffer, 0, length);
} else {
// ...
}
}
whereas the new parser uses this approach:
byte[] headerBuffer = new byte[512];
int bufferIndex = 0;
fixed (byte* inbuf = input) {
byte* inptr = inbuf + startIndex;
byte* inend = inbuf + inputLength;
byte* start = inptr;
*inend = '\n';
while (*inptr != '\n') {
if (bufferIndex >= headerBuffer.Length)
Array.Resize (ref headerBuffer, headerBuffer.Length + 64);
headerBuffer[bufferIndex++] = *inptr++;
}
}
The downside to the old approach is that if The new approach, even though slower per loop iteration, may actually compensate for that by not having to scan over the same input data more than once if the line of data crosses buffer boundaries. In fact, the old code needs to loop over the same input data at a minimum of 2x. The first time finds the end of the line, and the second is the Buffer.BlockCopy() call. In the hopefully worst-case scenario, it would have to iterate over the input ~3x (maybe the first pass is only missing the '\n' because it hasn't been buffered yet).
One thing I want to try is a hybrid approach to see if the overhead of the Buffer.BlockCopy() is actually less than the length comparison in every loop iteration. I'm pretty sure I've seen the implementation of BlockCopy() and it blits 8/4/2/1 bytes at a time, making it very fast, so it's possible that it is faster. I could also make the input buffer grow by 80 instead of 64 as that may be a more efficient grow size (MIME recommends wrapping lines to fit within 80 columns). Anyway... I've got a few tricks I can try to see if I can eke out a bit more performance. Happy that the new approach isn't drastically slower. |
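The two strategies compared above can be written side by side in safe C#. This is a sketch for comparison only: it uses array indexing instead of the pointer loop and the '\n' sentinel trick from the snippets above, and the LineCopyDemo class and its method names are hypothetical.

```csharp
using System;

static class LineCopyDemo
{
    // "Old" style: first scan for '\n', then copy the whole line in one block.
    // Returns the line length, or -1 if the line is not complete within the input.
    public static int ScanThenBlockCopy (byte[] input, int startIndex, int inputLength, ref byte[] lineBuffer)
    {
        int end = Array.IndexOf (input, (byte) '\n', startIndex, inputLength - startIndex);
        if (end < 0)
            return -1; // the caller would have to buffer more input and scan again

        int length = end - startIndex;
        if (length > lineBuffer.Length)
            Array.Resize (ref lineBuffer, length);

        Buffer.BlockCopy (input, startIndex, lineBuffer, 0, length);
        return length;
    }

    // "New" style: copy byte by byte while scanning, growing the buffer as needed.
    // Returns the number of bytes copied so far, even if no '\n' was found yet.
    public static int CopyWhileScanning (byte[] input, int startIndex, int inputLength, ref byte[] lineBuffer)
    {
        int bufferIndex = 0;
        int index = startIndex;

        while (index < inputLength && input[index] != (byte) '\n') {
            if (bufferIndex >= lineBuffer.Length)
                Array.Resize (ref lineBuffer, lineBuffer.Length + 64);

            lineBuffer[bufferIndex++] = input[index++];
        }

        return bufferIndex;
    }
}
```

The first method touches each byte of the line twice (scan, then copy) but has a cheap loop body, while the second touches each byte once at the cost of a bounds comparison per byte, which is the trade-off the benchmarks are trying to measure.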
Okay, so .NET 5 runtime results:
if I enable my FUNROLL_LOOPS code section in MimeReader:
Meh. Was hoping for a noticeable improvement, but it looks more-or-less like a nothing burger. Just switching to .NET 5 is more of an improvement than anything I've been able to eke out with tweaks to the new header parser 😭 Note: I made the mbox parser benchmarks loop over the mbox 10x which is why the numbers are bigger than they were in the original benchmark posting. |
I think I need to design a better benchmark that stresses mostly the header parser. |
Not bad... here are the results of parsing an mbox file consisting of a single looped message with lots of headers and a really short text/plain body:
For the MimeParser test, I set Here are the results for the same test if I enable my FUNROLL_LOOPS code segment in StepHeaderValue() instead of using the simple
Not a huge difference, but it's something... |
Okay, here's our comparison data: BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1165 (21H1/May2021Update)
Notes:
|
@nitinag FWIW, I've created a MimeKit 3.0 board to track progress of what I want to accomplish for v3.0. |
@nitinag I just finished implementing a new MimeParser on top of MimeReader... Are you ready for the results? 😂
|
Gotta figure out the following 2 - they are way slower for some reason.
And this seems too fast?
|
Ok, there we go... ExperimentalMimeParser was always using persistent = false
Darn, slower than the original MimeParser, but at least respectable. It's probably about on par with the 2.x MimeParser (I have some perf improvements for MimeParser in my mimereader branch). |
Okay, here we are... final benchmark now that all the kinks are worked out:
Looks like we more-or-less have a tie between the MimeParser and ExperimentalMimeParser. |
The JWZ mbox file is 10.2 MB and the benchmark parses the mbox file 100 times. If I am mathing correctly, that's about 1 GB/s throughput from my parser. |
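A quick check of that estimate, as a sketch only: it assumes decimal units (1 GB = 1000 MB), and T is a placeholder for the measured wall-clock time of the full 100-iteration run, which is not stated in the comment itself.

$$
\text{throughput} = \frac{10.2\ \text{MB} \times 100}{T} = \frac{1020\ \text{MB}}{T} \approx \frac{1.02\ \text{GB}}{T}
$$

Under those assumptions, a throughput of about 1 GB/s corresponds to the whole run finishing in roughly one second; a longer T would scale the figure down proportionally.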
Oops, I did it again!
I remembered some optimizations that I had made to MimeParser and applied those same things to MimeReader and now ExperimentalMimeParser is always faster than MimeParser. Oh, and I should mention that MimeParser in the |
That's awesome work, looking forward to using MimeReader! |
Describe the bug
HeaderList.WriteTo does not write a \r\n between two headers.
Platform
To Reproduce
Code to reproduce the behavior:
Output:
Expected behavior
There should be a \r\n before the Content-Encoding header starts, but both headers are written on one line.
When I set both headers without the HeaderList.Add method, the line break is written as expected.
Am I missing something?