Introduce a slab-based memory allocator and optimize SHAMapItem #4218
Conversation
In performance testing with NFTs, this commit reduces memory size by about 15%, sustained over a long duration.
LGTM
👍
src/ripple/shamap/SHAMapItem.h
Outdated
assert(data.size() <= megabytes<std::size_t>(64));

std::uint8_t* raw = [&data]() -> std::uint8_t* {
    if (data.size() < 128)
Should these be <= instead of < ? If not, what's the extra byte in the slab for?
Just a couple of notes:
- The clang macOS library does not support `std::aligned_alloc` yet. As you noted, that can be patched by `#include <stdlib.h>` and using `aligned_alloc`. That patch is required in two files: SlabAllocator.h and SHAMapItem.h.
- Once that patch is made, you can remove the `#include <cstdlib>` in SlabAllocator.h, at least with clang on macOS.
- Clang warns me that the `std::move` on line 101 of SHAMapSync_test.cpp is a pessimization. I removed that `std::move`, since I build with warnings as errors.
- Once those changes are made, immediately upon invoking the unit tests I get an assert:
  `Assertion failed: (p_ != nullptr), function SlabAllocator, file /Users/scott/Development/rippled/src/ripple/basics/SlabAllocator.h, line 126.`
  `Abort trap: 6`
  That assert suggests using non-std `aligned_alloc` may not work as well as hoped.

That's all I have for now. I'll look more once those comments are addressed.
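As an editorial sketch of the workaround being discussed, and not the patch ultimately applied in this PR, a portability shim along these lines is one way to paper over the missing `std::aligned_alloc`. The wrapper names and the `__APPLE__` check are assumptions for illustration:

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdlib.h>  // ::aligned_alloc on macOS SDKs that lack std::aligned_alloc

// Illustrative shim only. Note that C11 aligned_alloc requires size to be a
// multiple of alignment, and the result must be released with free().
inline void*
alignedAlloc(std::size_t alignment, std::size_t size)
{
#if defined(__APPLE__)
    return ::aligned_alloc(alignment, size);
#else
    return std::aligned_alloc(alignment, size);
#endif
}

inline void
alignedFree(void* p) noexcept
{
    std::free(p);
}
```

The failing assert reported above suggests that relying on the non-std `aligned_alloc` alone may not be sufficient on all macOS toolchains.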
Not yet working on macOS, see comments.
I thought I saw a fairly recent push to this pull request. So I tried building it. The previously observed build failures on macOS are still present.
@scottschurr the following is Nik's branch with my additional changes, which should build: https://github.com/greg7mdp/rippled/tree/slab_alloc
@greg7mdp I think you should open a new PR with all of the changes, and @nbougalis should close this one. What do you think, @HowardHinnant & @scottschurr ?
I have a slight preference for keeping this one open so I can remember what I said 19 days ago. But I won't object if everyone wants to move to a new PR.
@mtrippled I also think it would make sense to create a new PR. I suggested it a few days back, but @nbougalis said he would take care of it; probably he'd like to review the branch first.
@greg7mdp, I'm fine either way, either updating this pull request or creating a new one.
And, @greg7mdp, I tried building your branch locally. Yup, your version works way better on macOS than the previous version did. Thanks! I think I will wait until your version is in an official pull request, however, before I start a full-fledged review of the revised code.
Thanks @scottschurr for trying the mac version. I did build my branch on all 3 platforms (Linux, Windows, macOS).
I've addressed the concerns and issues highlighted by reviewers. I have also incorporated some of the ideas suggested by @greg7mdp. I believe that this is ready for another review and, provided it's signed off, a good candidate for merging. I expect a reduction in overall memory usage of at least 15% on mainnet servers, but it depends on a number of factors. On my test bed, the reduction is closer to 22%. Performance should also improve. Thanks.
When instantiating a large number of fixed-size objects on the heap, the overhead that dynamic memory allocation APIs impose quickly becomes significant. In some cases, allocating a large block of memory at once and using a slabbing allocator to carve it into fixed-size units used to service requests for memory helps to reduce memory fragmentation significantly and, potentially, improve overall performance. This commit introduces a new `SlabAllocator<>` class that exposes an API that is _similar_ to the C++ concept of an `Allocator` but is not meant to be a general-purpose allocator. A helper class, `SlabAllocatorSet<>`, simplifies handling of variably sized objects that benefit from slab allocations. This commit contains significant improvements over the original, all of which were suggested by Greg Popovitch (@greg7mdp on GitHub).
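For readers unfamiliar with the technique, the following is a minimal editorial sketch of the idea described above, not the `SlabAllocator<>` interface added by this PR: one large block is reserved up front, carved into fixed-size units, and the units are recycled through a free list. The class name `FixedSlab` and its interface are invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>

// Illustrative only: one big allocation carved into fixed-size units that
// are recycled through an intrusive free list. A production allocator also
// has to handle thread safety, alignment of the stored type, and growing
// or releasing slabs.
class FixedSlab
{
    static std::size_t
    roundUp(std::size_t n, std::size_t a)
    {
        return ((n + a - 1) / a) * a;
    }

    std::size_t const itemSize_;
    std::unique_ptr<std::uint8_t[]> block_;
    void* freeList_ = nullptr;

public:
    FixedSlab(std::size_t itemSize, std::size_t itemCount)
        : itemSize_(roundUp(std::max(itemSize, sizeof(void*)), alignof(void*)))
        , block_(std::make_unique<std::uint8_t[]>(itemSize_ * itemCount))
    {
        // Thread every unit onto the free list.
        for (std::size_t i = 0; i != itemCount; ++i)
        {
            void* p = block_.get() + i * itemSize_;
            *static_cast<void**>(p) = freeList_;
            freeList_ = p;
        }
    }

    // Returns nullptr when the slab is exhausted; a caller would then fall
    // back to the general-purpose heap.
    void*
    allocate() noexcept
    {
        if (!freeList_)
            return nullptr;
        void* p = freeList_;
        freeList_ = *static_cast<void**>(p);
        return p;
    }

    void
    deallocate(void* p) noexcept
    {
        *static_cast<void**>(p) = freeList_;
        freeList_ = p;
    }
};
```

A set-style helper, in the spirit of the `SlabAllocatorSet<>` mentioned above, would then pick among several such slabs, one per supported object size.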
The `SHAMapItem` class contains a variable-sized buffer that holds the serialized data associated with a particular item inside a `SHAMap`. Prior to this commit, the buffer for the serialized data was allocated separately. Coupled with the fact that most instances of `SHAMapItem` were wrapped in a `std::shared_ptr`, this meant that an instantiation might result in up to three separate memory allocations. This commit switches away from `std::shared_ptr` for `SHAMapItem` and uses `boost::intrusive_ptr` instead, allowing the reference count for an instance to live inside the instance itself. Coupled with using a slab-based allocator to optimize memory allocation for the most commonly sized buffers, the net result is significant memory savings. In testing, the reduction in memory usage hovers between 400MB and 650MB. Other scenarios might result in larger savings.
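For illustration only, here is a minimal sketch of the `boost::intrusive_ptr` pattern described above, not the actual `SHAMapItem` code: the `SerializedBlob` class, its fields, and the plain `new` allocation are stand-ins, whereas in the PR the most commonly sized buffers are served by the slab allocator.

```cpp
#include <atomic>
#include <cstdint>
#include <utility>
#include <vector>

#include <boost/intrusive_ptr.hpp>

// Illustrative only: the reference count lives inside the object, so
// boost::intrusive_ptr needs no separately allocated control block the way
// std::shared_ptr does.
class SerializedBlob
{
    std::vector<std::uint8_t> data_;
    mutable std::atomic<int> refcount_{0};

    // boost::intrusive_ptr finds these two hooks via argument-dependent
    // lookup and calls them as pointers are copied and destroyed.
    friend void
    intrusive_ptr_add_ref(SerializedBlob const* b) noexcept
    {
        b->refcount_.fetch_add(1, std::memory_order_relaxed);
    }

    friend void
    intrusive_ptr_release(SerializedBlob const* b) noexcept
    {
        if (b->refcount_.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete b;
    }

public:
    explicit SerializedBlob(std::vector<std::uint8_t> data)
        : data_(std::move(data))
    {
    }
};

int
main()
{
    // A single allocation holds both the payload bookkeeping and the count.
    boost::intrusive_ptr<SerializedBlob> p(new SerializedBlob({1, 2, 3}));
    auto q = p;  // bumps the counter stored inside the object itself
}
```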
The `Ledger` class contains two `SHAMap` instances: the state and transaction maps. Prior to this commit, the maps were dynamically allocated using `std::make_shared` despite the fact that they did not require lifetime management separate from the lifetime of the `Ledger` instance to which they belong. The two `SHAMap` instances are now regular member variables, and a number of other places where smart pointers and dynamic memory allocation could be avoided in favor of stack-based alternatives have also been changed.
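A before/after sketch of the pattern just described, using hypothetical stand-in types rather than the real `Ledger` and `SHAMap` classes:

```cpp
#include <memory>

struct StandInMap  // stand-in for SHAMap; illustrative only
{
};

// Before: each map was a separate heap allocation with shared ownership
// semantics the owning object never actually needed.
class LedgerBefore
{
    std::shared_ptr<StandInMap> stateMap_ = std::make_shared<StandInMap>();
    std::shared_ptr<StandInMap> txMap_ = std::make_shared<StandInMap>();
};

// After: the maps are plain members, so they share the owner's lifetime and
// its single allocation, and accesses avoid a pointer indirection.
class LedgerAfter
{
    StandInMap stateMap_;
    StandInMap txMap_;
};
```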
@scottschurr indicated he has no particular need to review this PR, and that it would be best to assign a different reviewer
All of my comments are resolved.
This builds and passes unit tests for me using the latest macOS tools.
…ougalis-shamapitemslab
I merged
I think that the three commits are functionally separate, and each is self-contained. IMO keeping them as-is is cleaner, more professional and makes it easier to review and, if needed, run
We had a comparable situation with #4192: the PR author recommended keeping the commits as-is, and so we did. Later, we learned there was some discontent with that, and that we should have squashed all the commits together. I don't personally have any opinion either way, but if we're going to recommend merging this as three separate commits, then we should at least confirm that each commit stands on its own and passes unit tests & clang-format. It may make sense to open 2-3 new PRs for this purpose.
What we decided in #4192 is that if reviewers feel that a commit should be merged, and the commit message does not begin with
blocked: let's get some agreement on whether this PR should end up as 3 commits, or 1. If it's 3 commits, I think it would be best to open a couple of separate PRs for the individual parts, so that we can confirm they are mergeable on their own (they make sense and pass unit tests & clang-format).
When instantiating a large number of fixed-size objects on the heap, the overhead that dynamic memory allocation APIs impose quickly becomes significant. In some cases, allocating a large block of memory at once and using a slabbing allocator to carve it into fixed-size units used to service requests for memory helps to reduce memory fragmentation significantly and, potentially, improve overall performance. This commit introduces a new `SlabAllocator<>` class that exposes an API that is _similar_ to the C++ concept of an `Allocator` but is not meant to be a general-purpose allocator. It should not be used unless profiling and analysis of specific memory allocation patterns indicates that the additional complexity introduced will improve the performance of the system overall, and subsequent profiling proves it. A helper class, `SlabAllocatorSet<>`, simplifies handling of variably sized objects that benefit from slab allocations. This commit incorporates improvements suggested by Greg Popovitch (@greg7mdp). Commit 1 of 3 in #4218.
The `SHAMapItem` class contains a variable-sized buffer that holds the serialized data associated with a particular item inside a `SHAMap`. Prior to this commit, the buffer for the serialized data was allocated separately. Coupled with the fact that most instances of `SHAMapItem` were wrapped in a `std::shared_ptr`, this meant that an instantiation might result in up to three separate memory allocations. This commit switches away from `std::shared_ptr` for `SHAMapItem` and uses `boost::intrusive_ptr` instead, allowing the reference count for an instance to live inside the instance itself. Coupled with using a slab-based allocator to optimize memory allocation for the most commonly sized buffers, the net result is significant memory savings. In testing, the reduction in memory usage hovers between 400MB and 650MB. Other scenarios might result in larger savings. In performance testing with NFTs, this commit reduces memory size by about 15%, sustained over a long duration. Commit 2 of 3 in #4218.
@seelabs confirmed that 3 commits for this is fine. I am rebasing the commits individually to ensure they pass CI on GitHub. If they pass, I will merge them no earlier than 2023-04-11 at 8:00 AM PT. If you have any objections, please comment here on this PR.
Suggested commit message:
Will merge after GitHub checks pass, if I don't hear any objections.
Misaligned load and store operations are supported by both Intel and ARM CPUs. However, in C++, these operations are undefined behavior (UB). Substituting these operations with a `memcpy` fixes this UB. The compiled assembly code is equivalent to the original, so there is no performance penalty to using memcpy. For context: The unaligned load and store operations fixed here were originally introduced in the slab allocator (#4218).
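As a hedged illustration of the kind of fix described (the function name below is invented, not the actual code that was changed): a type-punning load through a potentially misaligned pointer is replaced with `std::memcpy`, which optimizing compilers lower to an ordinary load on x86-64 and AArch64.

```cpp
#include <cstdint>
#include <cstring>

// Undefined behavior if p is not suitably aligned for std::uint64_t (and a
// strict-aliasing hazard in general):
//
//     return *reinterpret_cast<std::uint64_t const*>(p);

// Well-defined replacement; mainstream compilers emit a single (possibly
// unaligned) load for this on x86-64 and AArch64, so there is no penalty.
inline std::uint64_t
loadUnaligned64(void const* p) noexcept
{
    std::uint64_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}
```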
@manojsdoshi to try to corroborate the 15-22% memory use reduction.
Released in 1.11.0.
Introduce support for a slabbed allocator:
When instantiating a large number of fixed-size objects on the heap, the overhead that dynamic memory allocation APIs impose quickly becomes significant.
In some cases, allocating a large block of memory at once and using a slabbing allocator to carve it into fixed-size units used to service requests for memory helps to reduce memory fragmentation significantly and, potentially, improve overall performance.
This commit introduces a new `SlabAllocator<>` class that wraps around the complexity and exposes an API that is similar to the C++ concept of an `Allocator`.
This class is not a general-purpose allocator and should not be used unless profiling and analysis of specific memory allocation patterns indicates that the additional complexity introduced will improve the performance of the system overall and subsequent profiling proves that.
Optimize SHAMapItem and leverage new slab allocator:
The `SHAMapItem` class contains a variable-sized buffer that holds the serialized data associated with a particular item inside a `SHAMap`.
Prior to this commit, the buffer for the serialized data was allocated separately. Coupled with the fact that most instances of `SHAMapItem` were wrapped in a `std::shared_ptr`, this meant that an instantiation might result in up to three separate memory allocations.
This commit switches away from `std::shared_ptr` for `SHAMapItem` and uses `boost::intrusive_ptr` instead, allowing the reference count for an instance to live inside the instance itself. Coupled with using a slab-based allocator to optimize memory allocation for the most commonly sized buffers, the net result is significant memory savings. In testing, the reduction in memory usage hovers between 400MB and 650MB. Other scenarios might result in larger savings.
I expect a reduction in overall memory usage of at least 15% on mainnet servers, but it depends on a number of factors. On my test bed, the reduction is closer to 22%. Performance should also improve.