Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High memory traffic from MemoryBlockStream #725

Closed
ekalchev opened this issue Jun 18, 2018 · 12 comments
Closed

High memory traffic from MemoryBlockStream #725

ekalchev opened this issue Jun 18, 2018 · 12 comments

Comments

@ekalchev
Copy link
Contributor

ekalchev commented Jun 18, 2018

I am using MailKit for about an year and I notice memory spikes from time to time. In this particular case I am downloading full messages (header + message parts) from a local mail server (high bandwidth).
image

The process memory jumps from 300mb to 1gb for about 2-3 sec. I can see with the memory profiler the allocations are byte arrays of size 2048 bytes. They are allocated in MemoryBlockStream. Those allocations are not memory leaks as you can notice at 51s I forced GC and everything is gone. Still having so much memory traffic is not very good.

I can see that each MemoryBlockStream holds a private list of byte arrays that are thrown away after the stream is disposed.

What if a shared pool of byte arrays is used instead of allocating them for each stream?

https://github.com/dotnet/corefx/blob/master/src/System.Buffers/src/System/Buffers/ArrayPool.cs

or

replace entirely MemoryBlockStream with https://github.com/Microsoft/Microsoft.IO.RecyclableMemoryStream

@jstedfast
Copy link
Owner

Can you try these methods yourself and report back to me on how they perform? I don't have access to tooling to compare (one of the downsides of developing on a Mac).

Thanks!

@jstedfast
Copy link
Owner

I've got a patch to use ArrayPools that I'm about to land. Perhaps you can test it. If it doesn't solve the problem, could you try replacing uses of MemoryBlockStream inside of MimeParser.cs (and in ImapFolder.cs?) and see if that solves the problem?

@ekalchev
Copy link
Contributor Author

ekalchev commented Jun 19, 2018

The account I am testing with have a series of email with big attachment. I still see memory peaks but this time it is not caused by MemoryBlockStream.

image

You can see these peaks comes from MemoryStream.set_Capacity. It seems I have very big attachments and the memory stream keep resizing itself while writing
image

I think using RecycleableMemoryStream instead of MemoryStream (or MemoryBlockStream?) will solve the set_Capacity issue. However I am not sure how easy is to do that. I remember you have some checks in MailKit/MimeKit code like:

if (stream is MemoryStream) do not dispose.

And RecycleableMemoryStream inherit Memory Stream but it must be disposed to return the memory back to the pool.

As for ArrayPool - it is doing its jobs. ArrayPool support EventSource and I can see the buffers are pool as expected.

image

However when the series of mail messages with big attachment come I see this

image

What happens is that the array pool cannot rent anymore byte arrays and it start allocating new array without using pooling. ArrayPool.Create can create custom pools with user requested size. It would be very nice if you allow somehow the user to be able to supply it own array pool other than ArrayPool.Shared.

OR

Since MemoryBlockStream uses 2048 byte chunks a custom memory pool can be create like ArrayPool.Create(2048, 200); that will allocate 400k of memory which should be enough for the most scenarios. For bigger mime parts I guess allocating new arrays is unavoidable.

jstedfast added a commit to jstedfast/MimeKit that referenced this issue Jun 19, 2018
@jstedfast
Copy link
Owner

Regarding MemoryStream.set_Capacity, that's being invoked when you write a message to a stream. The target stream is a MemoryStream. That's not related to parsing the message.

Can you find out where messages are being written to a MemoryStream and replace that logic with writing to a MemoryBlockStream? That will likely solve a huge chunk of your problem. If you are using message.ToString(), don't use that, use message.WriteTo (customStream);

As far as using ArrayPool.Shared, I agree, using a custom ArrayPool was something I was already considering doing but was waiting to hear back how this trial worked out. I was going to go with 100 chunks instead of 200, but meh, 200 is fine too. Any value we use is going to just be guessing anyway.

@jstedfast
Copy link
Owner

Question on ArrayPool: How many buckets does an ArrayPool allocate? Just the 1? Or will it create a number of buckets as it needs them but just not go above a hard-coded threshold?

@ekalchev
Copy link
Contributor Author

ekalchev commented Jun 19, 2018

I ran the DefaultArrayPool contructor because it is hard to read the bit operations they have

For these values

var maxArrayLength = 2048;
var maxArraysPerBucket = 200;

8 buckets are generated.

Bucket 0 = 16
Bucket 1 = 32
Bucket 2 = 64
Bucket 3 = 128
Bucket 4 = 256
Bucket 5 = 512
Bucket 6 = 1024
Bucket 7 = 2048

Buckets are only containers for which arrays will be allocated, so only Bucket 7 will be used by MemoryBlockStream

As for the maxArraysPerBucket - this will be a good idea to be configurable. For embedded systems - 2048 * 200 ~ 400kb may be is too much. The problem with that size is it will never be released by the array pool.

@jstedfast
Copy link
Owner

Ah, so that's what they mean by buckets. Ok. It sounds to me, then, like ArrayPool() is not the ideal solution to this problem because all other array sizes will go unused.

I think a better solution will be to use something like an ArrayPool, but with a single buffer size.

@ekalchev
Copy link
Contributor Author

ekalchev commented Jun 19, 2018

Keep in mind those arrays in the buckets are not pre-allocated. When array is requested with Rent, only then the array is created and added in the bucket. MemoryBlockStream will request only byte arrays of 2048 bytes and other buckets will stay empty. No memory will be wasted in other buckets.

Here the bucket code

https://github.com/dotnet/corefx/blob/bffef76f6af208e2042a2f27bc081ee908bb390b/src/System.Buffers/src/System/Buffers/DefaultArrayPoolBucket.cs

@ekalchev
Copy link
Contributor Author

ekalchev commented Jun 19, 2018

Regarding MemoryStream.set_Capacity

I am using ImapFolder.GetStreams method and then I construct MimeMessage with MimeMessage.Load for each stream. Then I am using custom MimeVisitor to get all the message parts as MimeEntity.

Then for every MimeEntity i am calling

mimeEntity.Headers.WriteTo
mimeEntity.WriteTo

to save header and body in my database. MemoryStream.set_Capacity is called from WriteTo methods. Is there more efficient way to do that?

@jstedfast
Copy link
Owner

Well, both of those WriteTo() methods take a Stream argument. It doesn't have to be a MemoryStream. You can use whatever stream type that you want. Seems like you should use something else...

Another option might be to use a MimeKit.IO.MeasuringStream to write it to that first to measure how much buffer space you need, then allocate a MemoryStream with that capacity, then call WriteTo() again on the MemoryStream?

It might be slower since you're writing the headers/mime parts twice, but the penalty for that might be less than the penalty to realloc the MemoryStream's buffer and copying all the data to the new internal buffer.

@ekalchev
Copy link
Contributor Author

Hmm you are right when I saw the call stack I thought that MailKit is doing some internal allocations but the problem is the memory stream I am passing as argument.

MeasuringStream will be really useful for me. I am using SQLite and you can open a stream to SQLite blob column and directly write to the disk. The problem I had is that I didn't know what will be the output size which is required when creating SQLite blob column. This way i'll be able to avoid MemoryStream completely and directly write to the database. Thanks a lot!

@jstedfast
Copy link
Owner

No problem :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants