Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory and CPU costs due to SegmentedList usage #75661

Merged

Conversation

ToddGrun
Copy link
Contributor

@ToddGrun ToddGrun commented Oct 29, 2024

Currently, the SegmentedList class suffers from two potential inefficiencies:

  1. Upon growth, it doubles the SegmentedArray size. This is necessary for normal List like collections to get constant time amortized growth, but isn't necessary for segmented data structures.
  2. Upon growth, it reallocates and copies over the existing pages.

This PR addresses 2).
Instead, if we reuse existing segments when growing the list, we can save significant CPU and allocation costs.

*** Benchmark.NET data ***
OLD:
image

NEW:
image

Currently, the SegmentedList class suffers from two inefficiencies:

1) Upon growth, it doubles the SegmentedArray size. This is necessary for normal List like collections to get constant time amortized growth, but isn't necessary (or desirable) for segmented data structures.
2) Upon growth, it reallocates and copies over the existing pages.

Instead, if we only allocate the modified/new pages and the array holding the pages, we can save significant CPU and allocation costs.
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead labels Oct 29, 2024
@CyrusNajmabadi
Copy link
Member

Doubling makes sense when you only have a single backing store, and exceeding its limits causes another allocation and full copy.

Given that we're already segmented, and growth shouldn't cause copies of prior segments, this seems like a sensible change to me.

Would like perf numbers if possible though.

@CyrusNajmabadi
Copy link
Member

Oh. I didn't scroll down far enough.

Copy link
Member

@CyrusNajmabadi CyrusNajmabadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving the concept. Haven't looked deeply at the code change yet.

Copy link
Member

@CyrusNajmabadi CyrusNajmabadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks ok. I think dedicated tests would be good.

src/Dependencies/Collections/SegmentedArray`1.cs Outdated Show resolved Hide resolved
src/Dependencies/Collections/SegmentedArray`1.cs Outdated Show resolved Hide resolved
src/Dependencies/Collections/SegmentedList`1.cs Outdated Show resolved Hide resolved
@@ -502,7 +529,17 @@ internal void Grow(int capacity)
// If the computed capacity is still less than specified, set to the original argument.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requested changes:

  1. Document the algorithm for initialization of newCapacity
  2. For initial length greater than or equal to half the segment size but less than one full segment size, set newCapacity to one segment size.
  3. For any initial length where the final segment is not a full segment, set newCapacity to the length it would be with a full-size final segment. This guarantees that the outer array will not be reallocated, and also guarantees that the single inner array allocation performed during the resize will not need to be performed a second time during the next resize. (Can be modified and treated as a generalization of the preceding point)
  4. If the calculated newCapacity ends up being less than capacity and capacity is greater than the segment size, apply a final ceiling operation so the final segment is full size.

These rules are all relevant regardless of whether we increase by doubling or by a page at a time. I am still reviewing the choice of expansion size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

➡️ Changing the capacity selection algorithm for resize has been deferred to a later pull request.

@ToddGrun ToddGrun marked this pull request as ready for review October 31, 2024 16:55
@ToddGrun ToddGrun requested review from a team as code owners October 31, 2024 16:55
Copy link
Member

@jaredpar jaredpar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed compiler test code

@ToddGrun ToddGrun merged commit 6217b5c into dotnet:main Nov 1, 2024
28 checks passed
@dotnet-policy-service dotnet-policy-service bot added this to the Next milestone Nov 1, 2024
ToddGrun added a commit to ToddGrun/roslyn that referenced this pull request Nov 16, 2024
Mimics changes done in dotnet#75661 to enable segment reuse in SegmentedDictionary.
ToddGrun added a commit that referenced this pull request Nov 21, 2024
* Reuse segments during SegmentedDictionary growth

Mimics changes done in #75661 to enable segment reuse in SegmentedDictionary.
@jjonescz jjonescz modified the milestones: Next, 17.13 P2 Nov 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants