High memory traffic from MemoryBlockStream #725

ekalchev · 2018-06-18T15:45:55Z

I am using MailKit for about an year and I notice memory spikes from time to time. In this particular case I am downloading full messages (header + message parts) from a local mail server (high bandwidth).

The process memory jumps from 300mb to 1gb for about 2-3 sec. I can see with the memory profiler the allocations are byte arrays of size 2048 bytes. They are allocated in MemoryBlockStream. Those allocations are not memory leaks as you can notice at 51s I forced GC and everything is gone. Still having so much memory traffic is not very good.

I can see that each MemoryBlockStream holds a private list of byte arrays that are thrown away after the stream is disposed.

What if a shared pool of byte arrays is used instead of allocating them for each stream?

https://github.com/dotnet/corefx/blob/master/src/System.Buffers/src/System/Buffers/ArrayPool.cs

or

replace entirely MemoryBlockStream with https://github.com/Microsoft/Microsoft.IO.RecyclableMemoryStream

jstedfast · 2018-06-18T17:55:05Z

Can you try these methods yourself and report back to me on how they perform? I don't have access to tooling to compare (one of the downsides of developing on a Mac).

Thanks!

jstedfast · 2018-06-18T18:40:33Z

I've got a patch to use ArrayPools that I'm about to land. Perhaps you can test it. If it doesn't solve the problem, could you try replacing uses of MemoryBlockStream inside of MimeParser.cs (and in ImapFolder.cs?) and see if that solves the problem?

ekalchev · 2018-06-19T07:09:12Z

The account I am testing with have a series of email with big attachment. I still see memory peaks but this time it is not caused by MemoryBlockStream.

You can see these peaks comes from MemoryStream.set_Capacity. It seems I have very big attachments and the memory stream keep resizing itself while writing

I think using RecycleableMemoryStream instead of MemoryStream (or MemoryBlockStream?) will solve the set_Capacity issue. However I am not sure how easy is to do that. I remember you have some checks in MailKit/MimeKit code like:

if (stream is MemoryStream) do not dispose.

And RecycleableMemoryStream inherit Memory Stream but it must be disposed to return the memory back to the pool.

As for ArrayPool - it is doing its jobs. ArrayPool support EventSource and I can see the buffers are pool as expected.

However when the series of mail messages with big attachment come I see this

What happens is that the array pool cannot rent anymore byte arrays and it start allocating new array without using pooling. ArrayPool.Create can create custom pools with user requested size. It would be very nice if you allow somehow the user to be able to supply it own array pool other than ArrayPool.Shared.

OR

Since MemoryBlockStream uses 2048 byte chunks a custom memory pool can be create like ArrayPool.Create(2048, 200); that will allocate 400k of memory which should be enough for the most scenarios. For bigger mime parts I guess allocating new arrays is unavoidable.

Fixes jstedfast/MailKit#725

jstedfast · 2018-06-19T11:33:08Z

Regarding MemoryStream.set_Capacity, that's being invoked when you write a message to a stream. The target stream is a MemoryStream. That's not related to parsing the message.

Can you find out where messages are being written to a MemoryStream and replace that logic with writing to a MemoryBlockStream? That will likely solve a huge chunk of your problem. If you are using message.ToString(), don't use that, use message.WriteTo (customStream);

As far as using ArrayPool.Shared, I agree, using a custom ArrayPool was something I was already considering doing but was waiting to hear back how this trial worked out. I was going to go with 100 chunks instead of 200, but meh, 200 is fine too. Any value we use is going to just be guessing anyway.

jstedfast · 2018-06-19T11:39:18Z

Question on ArrayPool: How many buckets does an ArrayPool allocate? Just the 1? Or will it create a number of buckets as it needs them but just not go above a hard-coded threshold?

ekalchev · 2018-06-19T11:58:17Z

I ran the DefaultArrayPool contructor because it is hard to read the bit operations they have

For these values

var maxArrayLength = 2048;
var maxArraysPerBucket = 200;

8 buckets are generated.

Bucket 0 = 16
Bucket 1 = 32
Bucket 2 = 64
Bucket 3 = 128
Bucket 4 = 256
Bucket 5 = 512
Bucket 6 = 1024
Bucket 7 = 2048

Buckets are only containers for which arrays will be allocated, so only Bucket 7 will be used by MemoryBlockStream

As for the maxArraysPerBucket - this will be a good idea to be configurable. For embedded systems - 2048 * 200 ~ 400kb may be is too much. The problem with that size is it will never be released by the array pool.

jstedfast · 2018-06-19T15:17:11Z

Ah, so that's what they mean by buckets. Ok. It sounds to me, then, like ArrayPool() is not the ideal solution to this problem because all other array sizes will go unused.

I think a better solution will be to use something like an ArrayPool, but with a single buffer size.

ekalchev · 2018-06-19T15:39:03Z

Keep in mind those arrays in the buckets are not pre-allocated. When array is requested with Rent, only then the array is created and added in the bucket. MemoryBlockStream will request only byte arrays of 2048 bytes and other buckets will stay empty. No memory will be wasted in other buckets.

Here the bucket code

https://github.com/dotnet/corefx/blob/bffef76f6af208e2042a2f27bc081ee908bb390b/src/System.Buffers/src/System/Buffers/DefaultArrayPoolBucket.cs

ekalchev · 2018-06-19T16:05:52Z

Regarding MemoryStream.set_Capacity

I am using ImapFolder.GetStreams method and then I construct MimeMessage with MimeMessage.Load for each stream. Then I am using custom MimeVisitor to get all the message parts as MimeEntity.

Then for every MimeEntity i am calling

mimeEntity.Headers.WriteTo
mimeEntity.WriteTo

to save header and body in my database. MemoryStream.set_Capacity is called from WriteTo methods. Is there more efficient way to do that?

jstedfast · 2018-06-19T16:36:27Z

Well, both of those WriteTo() methods take a Stream argument. It doesn't have to be a MemoryStream. You can use whatever stream type that you want. Seems like you should use something else...

Another option might be to use a MimeKit.IO.MeasuringStream to write it to that first to measure how much buffer space you need, then allocate a MemoryStream with that capacity, then call WriteTo() again on the MemoryStream?

It might be slower since you're writing the headers/mime parts twice, but the penalty for that might be less than the penalty to realloc the MemoryStream's buffer and copying all the data to the new internal buffer.

ekalchev · 2018-06-19T17:06:33Z

Hmm you are right when I saw the call stack I thought that MailKit is doing some internal allocations but the problem is the memory stream I am passing as argument.

MeasuringStream will be really useful for me. I am using SQLite and you can open a stream to SQLite blob column and directly write to the disk. The problem I had is that I didn't know what will be the output size which is required when creating SQLite blob column. This way i'll be able to avoid MemoryStream completely and directly write to the database. Thanks a lot!

jstedfast · 2018-06-19T17:15:36Z

No problem :)

jstedfast closed this as completed in jstedfast/MimeKit@fb4823c Jun 18, 2018

jstedfast added a commit to jstedfast/MimeKit that referenced this issue Jun 19, 2018

Use an ArrayPool tailored toward MemoryBlockStream instead

33aa2c9

Fixes jstedfast/MailKit#725

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

High memory traffic from MemoryBlockStream #725

High memory traffic from MemoryBlockStream #725

ekalchev commented Jun 18, 2018 •

edited

Loading

jstedfast commented Jun 18, 2018

jstedfast commented Jun 18, 2018

ekalchev commented Jun 19, 2018 •

edited

Loading

jstedfast commented Jun 19, 2018

jstedfast commented Jun 19, 2018

ekalchev commented Jun 19, 2018 •

edited

Loading

jstedfast commented Jun 19, 2018

ekalchev commented Jun 19, 2018 •

edited

Loading

ekalchev commented Jun 19, 2018 •

edited

Loading

jstedfast commented Jun 19, 2018

ekalchev commented Jun 19, 2018

jstedfast commented Jun 19, 2018

High memory traffic from MemoryBlockStream #725

High memory traffic from MemoryBlockStream #725

Comments

ekalchev commented Jun 18, 2018 • edited Loading

jstedfast commented Jun 18, 2018

jstedfast commented Jun 18, 2018

ekalchev commented Jun 19, 2018 • edited Loading

jstedfast commented Jun 19, 2018

jstedfast commented Jun 19, 2018

ekalchev commented Jun 19, 2018 • edited Loading

jstedfast commented Jun 19, 2018

ekalchev commented Jun 19, 2018 • edited Loading

ekalchev commented Jun 19, 2018 • edited Loading

jstedfast commented Jun 19, 2018

ekalchev commented Jun 19, 2018

jstedfast commented Jun 19, 2018

ekalchev commented Jun 18, 2018 •

edited

Loading

ekalchev commented Jun 19, 2018 •

edited

Loading

ekalchev commented Jun 19, 2018 •

edited

Loading

ekalchev commented Jun 19, 2018 •

edited

Loading

ekalchev commented Jun 19, 2018 •

edited

Loading