Limit number of messages in mempool, instead of total byte sizes #2073
Conversation
In principle I like this change; however, I think you should add a heron internals config variable to control this parameter?
@srkukarni Thanks. Adding a configuration variable takes time because there are circular dependency problems.
@@ -21,5 +21,5 @@

 // TODO(nlu): get the pool size limit from config
 MemPool<google::protobuf::Message>* __global_protobuf_pool__ =
-    new MemPool<google::protobuf::Message>(50 * 1024 * 1024);
+    new MemPool<google::protobuf::Message>(512);
This review looks like you're just calculating the size of the pool differently, but what causes the pool size logic to change from being based on the number of bytes to the number of messages in the pool impl?
This is explained above: #2073 (comment)
Got it. Looks like you're changing the 512 to be configurable. Shouldn't the config-driven value be used here now?
It is config-driven. The initial value is 512, and it will be overridden here: https://github.com/twitter/heron/pull/2073/files#diff-5ff5ed00c50afd2302f522a9181e50a1R90
As for why the initial value cannot be read from the config, see the discussion between Sanjeev and me about the circular dependency.
Got it. Could you update the TODO(nlu) comment above to explain why this is config-driven, but hard-coded here.
@billonahill Updated here: 01acca8
@objmagic could you please elaborate what these circular dependencies are?
@srkukarni
@srkukarni I have added a method to set the limit of the memory pool externally. We then set the limit of the memory pool in the stream manager's
There might be ways to avoid this. One way is to actually send this parameter as part of the init method, or have an explicit InitBufferPool method inside the main of stmgr/tmaster.
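Roughly, the two wiring options being discussed look like the sketch below (names such as `set_pool_limit` are hypothetical, not Heron's actual API):

```cpp
#include <cstddef>

// Hypothetical sketch of the two ways to plumb the configured limit in.
template <typename M>
class PoolSketch {
 public:
  // Option A: start from a hard-coded default, then let stmgr/tmaster main()
  // override it once the config file has been read.
  PoolSketch() : limit_(512) {}
  void set_pool_limit(std::size_t limit) { limit_ = limit; }

  // Option B: pass the configured limit in at construction/init time,
  // avoiding the temporary default entirely.
  explicit PoolSketch(std::size_t limit) : limit_(limit) {}

 private:
  std::size_t limit_;
};
```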
@objmagic Sorry, looks like our comments crossed each other. I see that you have done it. Let me take a look at this. Thanks!
@@ -65,6 +65,9 @@ heron.streammgr.cache.drain.size.mb: 100

 # For efficient acknowledgements
 heron.streammgr.xormgr.rotatingmap.nbuckets: 3

+# The max size of the memory pool for any type of message
+heron.streammgr.mempool.size.mb: 512
Is this config for limiting the # of messages in the mempool, or the total byte size in the mempool? This config is different from the other configs above.
Could you make them consistent, and update the comment and even the config name to avoid confusion?
Oh, you are right. This is a typo.
Please check my comments above.
This is still ugly. The memory leaks are leaks; adding a limit just puts a cap on them, it doesn't resolve them at all. How would you expect users to figure out how many messages they actually need? The real solution here is probably to explore the new protobuf arena API and see how we can integrate it with the mempool. But that requires at least protobuf 3.0; fortunately, I already have that here: https://github.com/twitter/heron/tree/cwang/protobuf30
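For reference, a minimal sketch of the arena allocation pattern mentioned above (protobuf >= 3.0; `T` stands in for any generated message type such as `HeronTupleSet2`, and how this would actually be integrated with Heron's mempool is left open):

```cpp
#include <google/protobuf/arena.h>

// Sketch only: allocate a batch of messages on an arena instead of the heap.
template <typename T>
void ProcessBatch(int num_messages) {
  google::protobuf::Arena arena;
  for (int i = 0; i < num_messages; ++i) {
    // Allocated on the arena rather than the global heap.
    T* msg = google::protobuf::Arena::CreateMessage<T>(&arena);
    (void)msg;  // ... fill in and hand off msg ...
  }
  // Everything allocated on the arena is released together when it goes out
  // of scope, so per-message bloat cannot accumulate across batches.
}
```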
@congwang Unless I'm mistaken, AFAIK we are not talking about a memory leak here. It's just about restricting the size of the mempool so it does not grow and remain high for a long time.
@congwang I agree. This solution does not look nice to me either. @srkukarni @congwang I have a theory about the cause of the "memory leak": it shows up if one does not call `Clear()`. If my theory is correct, #2057 should solve the issues, along with this PR. See our heap leak check graph below and the backtrace we got from a core dump for details.
@objmagic That's not true. If you notice, `acquire` does a `Clear` before giving the proto structure to the client, so `Clear` is called.
@srkukarni @objmagic The protobuf documentation says about this (https://developers.google.com/protocol-buffers/docs/cpptutorial, Optimization Tips) that messages try to keep around any memory they allocate for reuse, even when they are cleared, so long-lived message objects can become bloated over time.
So even if you do call `Clear()`, the memory is not given back. Having a limited number of objects in the pool can fix this issue though: they will be rotated over time, so most likely each message will eventually be deleted. Having a big enough pool, on the other hand, almost guarantees that all messages will only become bigger and memory will keep increasing.
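To make that concrete, here is a minimal sketch (assuming a hypothetical generated type `Payload` with `repeated bytes chunks = 1;`, not one of Heron's protos):

```cpp
#include <iostream>
#include <string>

#include "payload.pb.h"  // hypothetical: message Payload { repeated bytes chunks = 1; }

int main() {
  Payload msg;
  for (int i = 0; i < 1000; ++i) {
    msg.add_chunks(std::string(1024, 'x'));  // bloat the message with ~1MB of payload
  }
  std::cout << "SpaceUsed before Clear: " << msg.SpaceUsed() << "\n";
  msg.Clear();  // logical reset only
  std::cout << "SpaceUsed after Clear:  " << msg.SpaceUsed() << "\n";
  // The second number typically stays large: Clear() keeps the allocated
  // buffers around for reuse, which is why pooled messages tend to only grow.
  return 0;
}
```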
@srkukarni You are right. Before your PR,
This PR limits the size of the memory pool by limiting the number of messages in each type of memory pool, instead of the number of bytes used.
Motivation
In the past two weeks, the team at Twitter was investigating huge memory usage in the stream manager of one topology. We printed out the number of messages in the memory pool, and the sum of the messages' sizes using `SpaceUsed` instead of `sizeof`. Thanks @ttim for pointing this out, and I've confirmed with @congwang as well. We saw that the memory pool usage of `HeronTupleSet2` can reach more than 70MB. This confirms that using `sizeof` is wrong; if we want to limit memory pool usage by bytes, we should use `SpaceUsed` instead. To be more specific, `google::protobuf::Message` allocates memory on the heap and has a fairly complex structure, which is why we should use the `SpaceUsed` method provided by the library to know its size, instead of `sizeof`.
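To illustrate the gap between the two measures, a minimal sketch (again assuming a hypothetical `Payload` message with `repeated bytes chunks = 1;`; the numbers are illustrative, not from the Twitter topology):

```cpp
#include <iostream>
#include <string>

#include "payload.pb.h"  // hypothetical: message Payload { repeated bytes chunks = 1; }

int main() {
  Payload msg;
  for (int i = 0; i < 1000; ++i) {
    msg.add_chunks(std::string(1024, 'x'));
  }
  // sizeof() only measures the fixed class layout; the ~1MB of repeated-field
  // data lives on the heap and is invisible to it. SpaceUsed() walks the
  // message and reports the memory actually attributed to it.
  std::cout << "sizeof:      " << sizeof(msg) << "\n";
  std::cout << "SpaceUsed(): " << msg.SpaceUsed() << "\n";
  return 0;
}
```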
Why not use `SpaceUsed`

As the documentation of `SpaceUsed` points out, each call can be quite expensive. Of course, one could argue for using a thread to update the mempool usage continuously, but that would introduce thread-safety issues and seems unnecessarily complicated.
Why use `size` directly

This PR simply limits the number of messages in each memory pool. The advantage is that the implementation is simple and the release call can finish in `O(1)`.
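As an illustration, a minimal sketch of a count-limited pool with an `O(1)` release (simplified; not the actual Heron `MemPool` implementation):

```cpp
#include <cstddef>
#include <vector>

// Simplified sketch of a pool capped by message count rather than by bytes.
// M is assumed to be a protobuf message type (anything with a default
// constructor and a Clear() method works).
template <typename M>
class CountLimitedPool {
 public:
  explicit CountLimitedPool(std::size_t max_messages) : limit_(max_messages) {}

  ~CountLimitedPool() {
    for (M* m : pool_) delete m;
  }

  M* acquire() {
    if (pool_.empty()) return new M();
    M* m = pool_.back();
    pool_.pop_back();
    m->Clear();  // hand out a logically empty message
    return m;
  }

  // O(1): either keep the message or delete it once the cap is reached,
  // regardless of how large the message has grown.
  void release(M* m) {
    if (pool_.size() >= limit_) {
      delete m;
    } else {
      pool_.push_back(m);
    }
  }

 private:
  std::size_t limit_;
  std::vector<M*> pool_;
};
```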
Why set the maximum to 512

Based on what we observed from the topology at Twitter, the max number of messages in the memory pool can be around ~630, so it is reasonable to limit the pool size to 512. I am thinking of limiting the size even more. Maybe 256? Please comment.
Side notes
- `Message->Clear()` does not shrink the `Message`. See here
- `Message` can become bloated gradually? See here

This should resolve #1567 (cc @ajorgensen)