
Limit number of messages in mempool, instead of total byte sizes #2073

Merged · 8 commits · Jul 19, 2017

Conversation

@objmagic (Contributor) commented Jul 17, 2017

This PR limits the size of the memory pool by limiting the number of messages in each type of memory pool, instead of the number of bytes used.

Motivation

In the past two weeks, the team at Twitter has been investigating huge memory usage by the stream manager of one topology. We printed out the number of messages in the memory pool and the sum of the messages' sizes using SpaceUsed instead of sizeof. Thanks to @ttim for pointing this out; I have confirmed it with @congwang as well. We saw the memory pool usage of HeronTupleSet2 reach more than 70MB. This confirms that using sizeof is wrong: if we want to limit memory pool usage, we should use SpaceUsed instead. To be more specific, a google::protobuf::Message allocates memory on the heap and has a fairly complex structure, which is why we should use the SpaceUsed method provided by the library to measure its size, instead of sizeof.
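As a quick illustration of the difference (a hedged sketch only: example::TupleBatch and its repeated field values are placeholders standing in for a generated message such as HeronTupleSet2), sizeof reports only the fixed size of the C++ object, while SpaceUsed walks the heap-allocated fields:

```cpp
#include <iostream>
#include <string>
#include "example.pb.h"  // assumed generated header (placeholder, not a Heron header)

int main() {
  example::TupleBatch msg;                    // placeholder message type
  for (int i = 0; i < 10000; ++i) {
    msg.add_values(std::string(100, 'x'));    // grow a heap-allocated repeated field
  }
  std::cout << "sizeof:      " << sizeof(msg) << " bytes\n";      // small and constant
  std::cout << "SpaceUsed(): " << msg.SpaceUsed() << " bytes\n";  // grows with the content
  return 0;
}
```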

Why not use SpaceUsed

As the documentation of SpaceUsed points out, each call can be quite expensive. Of course, one could argue for using a thread to update the mempool usage continuously, but that introduces thread-safety issues and seems unnecessarily complicated.

Why limit the number of messages directly

This PR simply limits the number of messages in each memory pool. The advantage is that the implementation is simple and the release call finishes in O(1).
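A minimal sketch of the idea (this is not the actual Heron MemPool code, just an illustration of how a count-capped pool keeps release at O(1); M is assumed to be a protobuf-style message type with a Clear() method):

```cpp
#include <cstddef>
#include <vector>

template <typename M>
class CountLimitedPool {
 public:
  explicit CountLimitedPool(size_t limit) : limit_(limit) {}

  // Hand out a cleared message, reusing a pooled one when available.
  M* acquire() {
    if (pool_.empty()) return new M();
    M* msg = pool_.back();
    pool_.pop_back();
    msg->Clear();
    return msg;
  }

  // O(1): either push the message back, or delete it once the pool is full.
  void release(M* msg) {
    if (pool_.size() >= limit_) {
      delete msg;
    } else {
      pool_.push_back(msg);
    }
  }

 private:
  size_t limit_;
  std::vector<M*> pool_;
};
```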

Why set the maximum to 512

Based on what we observed from the topology at Twitter, the maximum number of messages in the memory pool can be around ~630, so it is reasonable to limit the pool size to 512. I am thinking of limiting the size even more. Maybe 256? Please comment.

Side notes

  1. Message->Clear() does not shrink the Message. See here (a small illustration is sketched after this list).
  2. Why can a Message become bloated gradually? See here
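A hedged illustration of side note 1 (the message type and field name are placeholders reusing the example::TupleBatch sketch above, not Heron types): after Clear(), the message keeps its allocated buffers, so SpaceUsed() stays high even though the contents are gone.

```cpp
#include <iostream>
#include <string>
#include "example.pb.h"  // assumed generated header (placeholder)

int main() {
  example::TupleBatch msg;
  for (int i = 0; i < 100000; ++i) {
    msg.add_values(std::string(100, 'x'));
  }
  std::cout << "SpaceUsed before Clear: " << msg.SpaceUsed() << "\n";
  msg.Clear();  // field contents are reset ...
  std::cout << "SpaceUsed after Clear:  " << msg.SpaceUsed() << "\n";  // ... but most memory is retained
  return 0;
}
```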

This should resolve #1567 (cc @ajorgensen).

@objmagic added this to the 0.15.0 milestone on Jul 17, 2017
@srkukarni (Contributor) commented:

In principle I like this change; however, I think you should add a heron internals config variable to control this parameter.

@objmagic (Author) commented:

@srkukarni Thanks. Adding a configuration variable takes time because there is a circular dependency problem.

@@ -21,5 +21,5 @@

 // TODO(nlu): get the pool size limit from config
 MemPool<google::protobuf::Message>* __global_protobuf_pool__ =
-    new MemPool<google::protobuf::Message>(50 * 1024 * 1024);
+    new MemPool<google::protobuf::Message>(512);
Contributor:

This review looks like you're just calculating the size of the pool differently, but what causes the pool size logic to change from being based on the number of bytes to the number of messages in the pool implementation?

@objmagic (Author):

This is explained above: #2073 (comment)

Contributor:

Got it. Looks like you're changing the 512 to be configurable. Shouldn't the config-driven value be used here now?

@objmagic (Author):

It is config-driven. The initial value is 512, and it will be overridden here: https://github.com/twitter/heron/pull/2073/files#diff-5ff5ed00c50afd2302f522a9181e50a1R90

As for why the initial value cannot be read from the config, see the discussion between Sanjeev and me about the circular dependency.

Contributor:

Got it. Could you update the TODO(nlu) comment above to explain why this is config-driven but hard-coded here?

@objmagic (Author):

@billonahill Updated here: 01acca8

@srkukarni (Contributor) commented:

@objmagic could you please elaborate on what these circular dependencies are?

@objmagic (Author) commented Jul 17, 2017

@srkukarni config-cxx -> network-cxx -> basics-cxx. Inside basics-cxx, mempool.h/cpp needs to use heron-internals-config-reader, which is in config-cxx. Thus the circular dependency.

@objmagic (Author) commented Jul 17, 2017

@srkukarni I have added a method to set the limit of the memory pool externally. We then set the limit of the memory pool in the stream manager's Init, so we can avoid the circular dependency problem here.
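A self-contained sketch of that pattern (all names below are illustrative, not the exact Heron API): the pool in the low-level module keeps a hard-coded default and only exposes a setter, and the stream manager, which can see the config, overrides the limit during its Init.

```cpp
#include <cstddef>

// Low-level module (analogous to basics-cxx): no dependency on the config reader.
template <typename M>
class MemPool {
 public:
  explicit MemPool(size_t limit) : limit_(limit) {}  // hard-coded default, e.g. 512
  void set_pool_limit(size_t limit) { limit_ = limit; }
  size_t limit() const { return limit_; }
 private:
  size_t limit_;
};

struct DummyMessage {};  // stands in for google::protobuf::Message
MemPool<DummyMessage>* global_pool = new MemPool<DummyMessage>(512);

// Higher-level module (analogous to stmgr): depends on config, so it can read the real value.
size_t ReadConfiguredPoolLimit() { return 256; }  // stands in for heron-internals-config-reader

void StMgrInit() {
  // Push the configured limit into the pool; the pool itself never touches the config.
  global_pool->set_pool_limit(ReadConfiguredPoolLimit());
}

int main() {
  StMgrInit();
  return 0;
}
```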

@srkukarni (Contributor) commented:

There might be ways to avoid this. One way is to actually send this parameter as part of the init method, or to have an explicit InitBufferPool method inside the main of stmgr/tmaster.

@srkukarni (Contributor) commented:

@objmagic Sorry, looks like our comments crossed each other. I see that you have done it. Let me take a look at this. Thanks!

@@ -65,6 +65,9 @@ heron.streammgr.cache.drain.size.mb: 100
 # For efficient acknowledgements
 heron.streammgr.xormgr.rotatingmap.nbuckets: 3
+
+# The max size of the memory pool for any type of message
+heron.streammgr.mempool.size.mb: 512
@nlu90 (Member) commented Jul 17, 2017:

@objmagic

Is this config for limiting the number of messages in the mempool or the total byte size of this mempool? This config is different from the other 9 configs above.

Could you make them consistent and update the comment and even the config name to avoid confusion?

@objmagic (Author):

Oh, you are right. This is a typo.

@nlu90 (Member) left a comment:

Please check my comments above.

@congwang (Contributor) commented:

This is still ugly. The memory leaks are leaks; adding a limit just puts a cap on them and doesn't resolve them at all. How would you expect users to figure out how many messages they actually need?

The real solution here is probably to explore the new protobuf arena API and see how we can integrate it with the mempool. But that requires at least protobuf 3.0; fortunately, I already have that here: https://github.com/twitter/heron/tree/cwang/protobuf30
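For reference, a minimal sketch of the protobuf 3 Arena API being referred to (the message type is a placeholder; this is not an integration with Heron's mempool, just the basic usage pattern):

```cpp
#include <google/protobuf/arena.h>
#include "example.pb.h"  // assumed generated header (placeholder)

void ProcessBatch() {
  google::protobuf::Arena arena;
  for (int i = 0; i < 1000; ++i) {
    // Messages are allocated on the arena instead of the global heap.
    auto* msg = google::protobuf::Arena::CreateMessage<example::TupleBatch>(&arena);
    msg->add_values("tuple");
    // ... hand msg off to processing code ...
  }
  // All arena-allocated messages are freed in bulk when `arena` goes out of scope.
}
```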

@srkukarni (Contributor) commented:

@congwang Unless I'm mistaken, we are not talking about a memory leak here. It's just about restricting the mempool from growing and remaining large for a long time.

@objmagic (Author) commented Jul 18, 2017

@congwang I agree. This solution does not look nice to me either.

@srkukarni @congwang I have a theory about the cause of the "memory leak". If one does not call Clear on a Message, the used space in the Message will not be zeroed out. So the next time we add a tuple to it, the Message thinks it does not have enough space and will enlarge the strings inside itself. The C++ STL reserves extra space in a string when you append (just like the heuristics it uses to reserve extra space for a vector). Thus we are always allocating extra memory here.

If my theory is correct, #2057 should solve the issues, along with this PR.

See our heap leak check graph below and the backtrace we got from the core dump for details.
[heap leak check graph image]
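A hedged illustration of the STL behavior this theory relies on: appending to a std::string typically reserves extra capacity beyond what is used, and clear() does not give it back.

```cpp
#include <iostream>
#include <string>

int main() {
  std::string s;
  for (int i = 0; i < 1000; ++i) {
    s.append("0123456789");  // growth usually over-allocates
  }
  std::cout << "size: " << s.size() << ", capacity: " << s.capacity() << "\n";
  s.clear();                 // size drops to 0 ...
  std::cout << "after clear(), capacity: " << s.capacity() << "\n";  // ... capacity typically stays
  return 0;
}
```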

@srkukarni (Contributor) commented:

@objmagic That's not true. If you notice, acquire does a Clear before giving the proto structure to the client, so Clear is called.
You can do one experiment: move the Clear from acquire to release.

@ttim commented Jul 18, 2017

@srkukarni @objmagic The protobuf documentation says the following about this (https://developers.google.com/protocol-buffers/docs/cpptutorial, Optimization Tips):

Reuse message objects when possible. Messages try to keep around any memory they allocate for reuse, even when they are cleared. Thus, if you are handling many messages with the same type and similar structure in succession, it is a good idea to reuse the same message object each time to take load off the memory allocator. However, objects can become bloated over time, especially if your messages vary in "shape" or if you occasionally construct a message that is much larger than usual. You should monitor the sizes of your message objects by calling the SpaceUsed method and delete them once they get too big.

So even if you do Clear, you still have some kind of memory leak (and therefore the assumption @objmagic made still makes sense).

Having a limited number of objects in the pool can fix this issue though: they will be rotated over time, so most likely each message will eventually be deleted. At the same time, having a big enough pool almost guarantees that all messages will only become bigger and memory will keep increasing.

@objmagic (Author) commented:

@srkukarni You are right. Even before your PR, Clear was already called properly for all messages.
