
Update the generic collection types to internally use GC.AllocateUninitializedArray<T> #47186

Closed · 8 commits

Conversation

tannergooding (Member) commented Jan 19, 2021

As per the title, this updates the generic collection types to internally use GC.AllocateUninitializedArray.

When T is a reference type, this should not differ from the standard new T[], as the API ensures the array is zeroed anyway for correctness.
However, when T is a value type and particularly when the size of T or the collection is large, this can help avoid needless zeroing in the underlying APIs.

Most of the collection types are already optimized to only zero removed elements when T is a reference type, and they internally track and check their bounds, so this should be completely safe.
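To make the pattern concrete, here is a minimal sketch of a List<T>-like type using this approach. The type and member names are illustrative, not the real List<T> source; the clearing-on-removal pattern matches what the collections already do.

```csharp
using System;
using System.Runtime.CompilerServices;

public class SketchList<T>
{
    private T[] _items;
    private int _size;

    public SketchList(int capacity)
    {
        // For reference types this is equivalent to new T[capacity];
        // for value types the zeroing can be skipped.
        _items = GC.AllocateUninitializedArray<T>(capacity);
    }

    public int Count => _size;

    public T this[int index]
    {
        get
        {
            // Bounds are checked against _size, not _items.Length, so a
            // correct caller never observes an uninitialized slot.
            if ((uint)index >= (uint)_size) throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index];
        }
    }

    public void Add(T item)
    {
        if (_size == _items.Length) Grow();
        _items[_size++] = item;
    }

    public void RemoveLast()
    {
        _size--;
        // Only clear when the GC could otherwise keep a dead object alive.
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
        {
            _items[_size] = default!;
        }
    }

    private void Grow()
    {
        int newCapacity = Math.Max(4, _items.Length * 2);
        T[] newItems = GC.AllocateUninitializedArray<T>(newCapacity);
        Array.Copy(_items, newItems, _size);
        _items = newItems;
    }
}
```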

For a simple test, such as System.Collections.CtorGivenSize.List, which creates a List with a capacity of 512, this shows some small gains:

BenchmarkDotNet=v0.12.1.1466-nightly, OS=Windows 10.0.19042
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=6.0.100-alpha.1.21056.1
  [Host]     : .NET 6.0.0 (6.0.21.5406), X64 RyuJIT
  Job-RSPOMU : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT
  Job-ECGLUA : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable  IterationTime=250.0000 ms
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1
| Method | Job | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| List | Job-RSPOMU | \runtime\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe | 512 | 65.10 ns | 0.783 ns | 0.694 ns | 65.12 ns | 64.24 ns | 66.48 ns | 1.37 | 0.04 | 0.1256 | 0.0008 | - | 2 KB |
| List | Job-ECGLUA | \runtime_base\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe | 512 | 47.48 ns | 1.022 ns | 1.136 ns | 47.29 ns | 46.09 ns | 50.31 ns | 1.00 | 0.00 | 0.1256 | 0.0019 | - | 2 KB |

For string, it likewise shows essentially no difference:

| Method | Job | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| List | Job-RSPOMU | \runtime\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe | 512 | 98.22 ns | 2.805 ns | 2.881 ns | 98.63 ns | 93.72 ns | 103.37 ns | 1.04 | 0.04 | 0.2480 | 0.0072 | - | 4 KB |
| List | Job-ECGLUA | \runtime_base\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe | 512 | 94.07 ns | 1.643 ns | 1.537 ns | 94.72 ns | 91.21 ns | 96.11 ns | 1.00 | 0.00 | 0.2481 | 0.0074 | - | 4 KB |

I haven't run the full suite of performance tests, but similar gains should be seen for effectively any value type for the given collections.

ghost commented Jan 19, 2021

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

Author: tannergooding
Labels: area-System.Collections

tannergooding (Member, Author) commented:

CC @stephentoub, @jkotas

GrabYourPitchforks (Member) commented Jan 19, 2021

This may turn race conditions into memory safety violations. Consider: multiple threads are improperly attempting to access a List<T>. With today's implementation, it's possible that they can corrupt the internal state of the List<T> and stomp on each other's data. This could result in some strange behaviors, such as a thread improperly reading default(T) or a stale T from the list. But it cannot possibly result in a thread reading an uninitialized T from the list.

With the new implementation, since the underlying array is now backed by potentially uninitialized memory, a racing thread may improperly read an uninitialized T from the underlying array. This is a possible memory safety violation because it may result in the disclosure of an arbitrary memory address's contents.

tannergooding (Member, Author) commented Jan 19, 2021

> But it cannot possibly result in a thread reading an uninitialized T from the list.

It can because all of the types, modulo SortedSet, have code such as the following:

```csharp
if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
{
    _items[_size] = default!;
}
```

```csharp
if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
{
    Array.Clear(_items, freeIndex, _size - freeIndex); // Clear the elements so that the gc can reclaim the references.
}
```

etc.

So we already have situations today where parts of the collection are uninitialized. This just makes it so that all elements start that way.

tannergooding (Member, Author) commented:

Ah, never mind. I misinterpreted "uninitialized" here.

Is that actually a concern? Any user could define their own collection type using GC.AllocateUninitializedArray, etc. and could have the "same" side effect and the same view of memory.
Likewise with reading "stale" data from an existing collection type.

jkotas (Member) commented Jan 19, 2021

> System.Collections.CtorGivenSize.List

This micro-benchmark is not representative of an actual user scenario. Creating an empty list and doing nothing with it is not interesting.

What is the improvement for the case where you actually try to do something with the list?

Is this going to regress cases where you are creating small lists, e.g. lists with just a few elements, which are the 99% scenario for collections according to the telemetry?

tannergooding (Member, Author) commented:

> Is this going to regress cases where you are creating small lists, e.g. lists with just a few elements, which are the 99% scenario for collections according to the telemetry?

I'll check....

At least when T is a reference type, it is optimized down to just new T[], and the JIT emits the following, as expected (the same codegen as new T[]):

```asm
mov rcx, 0xD1FFAB1E
mov edx, <size>
call CORINFO_HELP_NEWARR_1_OBJ
```

When T is a value type, however, it generates less efficient code:

```asm
mov      rcx, 0xD1FFAB1E
call     CORINFO_HELP_TYPEHANDLE_TO_RUNTIMETYPE
mov      gword ptr [rsp+20H], rax
lea      rcx, bword ptr [rsp+20H]
call     System.RuntimeTypeHandle:get_Value():long:this
mov      rcx, rax
mov      edx, <size>
mov      r8d, 16
call     System.GC:AllocateNewArray(long,int,int):System.Array
```

Likewise, if optimizations aren't enabled, it is ultimately a call to AllocateUninitializedArray. However, I think both of these would be addressable by making the method an intrinsic.
Given that gains on big collections of value types, particularly int, aren't uncommon, this seems like something worth pursuing.
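The overall shape being discussed can be sketched as follows. This is not the actual runtime source: the method name is hypothetical, and the 2048-byte cutoff is an assumption used for illustration (the real threshold is an internal implementation detail), but it shows why the reference-type path can collapse to plain new T[] while large value-type allocations skip the zeroing.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class AllocSketch
{
    // Hypothetical reconstruction of the decision logic.
    public static T[] AllocateUninitialized<T>(int length)
    {
        // Reference types must be zeroed anyway for GC correctness.
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
            return new T[length];

        // Small allocations: zeroing is cheap, so the ordinary path wins.
        // 2048 bytes is an assumed threshold, not a documented contract.
        if ((long)length * Unsafe.SizeOf<T>() < 2048)
            return new T[length];

        // Large value-type allocations: skip the zeroing.
        return GC.AllocateUninitializedArray<T>(length);
    }
}
```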

jkotas (Member) commented Jan 19, 2021

> we already have situations today where parts of the collection are uninitialized.

Today, all parts of the collections are either zero-initialized or initialized to a valid value. This change makes it possible to see values that are complete garbage.

I agree with @GrabYourPitchforks that this weakens security defense-in-depth in the presence of race conditions.

GrabYourPitchforks (Member) commented:

GC.AllocateUninitializedArray is an unsafe-equivalent API. When it's used as an internal implementation detail it's normally fine. If a user chooses to call this in their own code, it's on them to ensure safe usage. But if there's any chance that this implementation detail might leak to an external caller (such as the caller triggering a race condition), we need to proceed with caution.

There are some cases where we're ok with these implementation details possibly being exposed to external callers. ArrayPool<T>.Shared.Rent is the most prominent example. And it might be ok here as well if we're willing to say "you know what, it's only a bad thing if the caller already has a race condition, and we're willing to sacrifice some memory safety there in order to eke out better performance." But those statements really should be explicit. This also came up offhandedly in the StringBuilder PR, but I think the PR review there didn't identify any places where the uninitialized data could ever leak to the caller, even in the face of race conditions.
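The ArrayPool<T>.Shared.Rent example can be demonstrated directly: a rented buffer is not cleared by default, so data written by a previous user of the pool may still be visible to the next. The names below are illustrative; whether the same array comes back from the pool is an implementation detail, not a contract.

```csharp
using System;
using System.Buffers;

public static class PoolDemo
{
    public static int ReturnThenRent()
    {
        int[] first = ArrayPool<int>.Shared.Rent(16);
        first[0] = 123; // stand-in for "sensitive" data
        ArrayPool<int>.Shared.Return(first); // clearArray defaults to false

        int[] second = ArrayPool<int>.Shared.Rent(16);
        int leaked = second[0]; // may still hold 123 if the same array came back

        ArrayPool<int>.Shared.Return(second, clearArray: true); // opt-in clearing
        return leaked;
    }
}
```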

stephentoub (Member) commented Jan 19, 2021

It's trivial today with a type like List<T> for erroneous code to see values that were never there, e.g.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var list = new List<(long, long)>();
        list.Add((1, 1));

        Task.Run(() =>
        {
            while (true)
            {
                list[0] = (1, 1);
            }
        });

        Task.Run(() =>
        {
            while (true)
            {
                list[0] = (2, 2);
            }
        });

        while (true)
        {
            var item = list[0];
            if (item.Item1 != item.Item2)
            {
                Console.WriteLine(item);
            }
        }
    }
}
```

Every single item stored into the list has identical halves, but because of tearing the reader can see unequal values. From a "this change makes it possible to see values that are complete garbage" perspective, how is this substantially different? Yes, you could see arbitrary bytes that were never written, but from the perspective of "could the state here be a valid instance of the value type?", it seems like the same impact.

jkotas (Member) commented Jan 19, 2021

> From a "This change makes it possible to see values that are complete garbage" perspective, how is this substantially different?

For example, after this change you can potentially use a race condition like this for an information disclosure attack.

En3Tho (Contributor) commented Jan 19, 2021

Why would the JIT produce all that code for structs instead of a GC allocation call, like it does for reference types? I might be wrong, but it looks like it tries to get the size from the runtime type handle first? Isn't that known at JIT time?

stephentoub (Member) commented Jan 19, 2021

> information disclosure

So the concern is: some code puts sensitive data into an array from the pool (whether directly or indirectly via some component it was using); the array isn't cleared when returned (which is feasible, since not clearing is the default and most code doesn't clear); that array is consumed into a collection with a bug that manifests as a race condition enabling a slot that was never written on that instance to be read; and an attacker manipulates the app into hitting that race condition and surfacing the data.

Ok.

EDIT: My mind went to ArrayPool as it was cited earlier and I forgot we were talking about AllocateUninitializedArray, but obviously it's even easier as GC.AllocateUninitializedArray doesn't require explicit pooling.

tannergooding (Member, Author) commented:

Benchmarking new T[] vs GC.AllocateUninitializedArray<T> shows the results below. Essentially, provided the number of bytes being allocated is under 2048, it's the "same" speed (within about 0.5 ns), because AllocateUninitializedArray simply calls new T[] in that scenario.
Above 2048 bytes (so 512 elements for int, 128 elements for Vector128<T>, etc.) you start seeing minimal gains. Once it hits 4096 bytes, the allocations become almost twice as fast.

Part of this overhead is due to the additional logic around AllocateUninitializedArray not being inlined or removed. Part of the overhead is also related to #5973.
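A minimal BenchmarkDotNet sketch of the comparison behind the tables below might look like this. The exact harness used for these numbers isn't shown in the PR, so treat this as an assumed reconstruction.

```csharp
using System;
using BenchmarkDotNet.Attributes;

public class AllocationBenchmarks
{
    // Element counts mirroring a subset of the sizes in the tables below.
    [Params(0, 1, 16, 512, 1024)]
    public int Size;

    [Benchmark(Baseline = true)]
    public int[] InitializedArray() => new int[Size];

    [Benchmark]
    public int[] UninitializedArray() => GC.AllocateUninitializedArray<int>(Size);
}
```

Run via BenchmarkDotNet's BenchmarkRunner.Run&lt;AllocationBenchmarks&gt;() from a Release build.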

Int32

| Method | Size | Mean | Error | StdDev |
|---|---|---|---|---|
| InitializedArray | 0 | 2.268 ns | 0.0949 ns | 0.1165 ns |
| UninitializedArray | 0 | 2.394 ns | 0.0281 ns | 0.0235 ns |
| InitializedArray | 1 | 2.434 ns | 0.0856 ns | 0.0715 ns |
| UninitializedArray | 1 | 2.661 ns | 0.0512 ns | 0.0454 ns |
| InitializedArray | 2 | 2.290 ns | 0.0362 ns | 0.0321 ns |
| UninitializedArray | 2 | 2.673 ns | 0.0236 ns | 0.0209 ns |
| InitializedArray | 4 | 2.501 ns | 0.0372 ns | 0.0330 ns |
| UninitializedArray | 4 | 2.919 ns | 0.0447 ns | 0.0373 ns |
| InitializedArray | 8 | 3.020 ns | 0.0300 ns | 0.0266 ns |
| UninitializedArray | 8 | 3.468 ns | 0.1154 ns | 0.1185 ns |
| InitializedArray | 16 | 3.815 ns | 0.0381 ns | 0.0357 ns |
| UninitializedArray | 16 | 4.099 ns | 0.0489 ns | 0.0382 ns |
| InitializedArray | 32 | 5.555 ns | 0.0499 ns | 0.0467 ns |
| UninitializedArray | 32 | 5.835 ns | 0.0886 ns | 0.0786 ns |
| InitializedArray | 64 | 8.493 ns | 0.0625 ns | 0.0488 ns |
| UninitializedArray | 64 | 8.633 ns | 0.0693 ns | 0.0614 ns |
| InitializedArray | 128 | 14.906 ns | 0.3316 ns | 0.3257 ns |
| UninitializedArray | 128 | 15.115 ns | 0.3600 ns | 0.4920 ns |
| InitializedArray | 256 | 27.904 ns | 0.5737 ns | 0.4791 ns |
| UninitializedArray | 256 | 28.460 ns | 0.5196 ns | 0.4339 ns |
| InitializedArray | 384 | 39.940 ns | 0.4010 ns | 0.3760 ns |
| UninitializedArray | 384 | 40.470 ns | 0.4060 ns | 0.3800 ns |
| InitializedArray | 512 | 54.298 ns | 1.1052 ns | 1.2728 ns |
| UninitializedArray | 512 | 48.011 ns | 0.3103 ns | 0.2903 ns |
| InitializedArray | 1024 | 106.870 ns | 1.9866 ns | 1.8583 ns |
| UninitializedArray | 1024 | 61.105 ns | 0.9496 ns | 0.8418 ns |

String

| Method | Size | Mean | Error | StdDev |
|---|---|---|---|---|
| InitializedArray | 0 | 2.112 ns | 0.0221 ns | 0.0207 ns |
| UninitializedArray | 0 | 2.872 ns | 0.0837 ns | 0.0783 ns |
| InitializedArray | 1 | 2.313 ns | 0.0238 ns | 0.0211 ns |
| UninitializedArray | 1 | 3.040 ns | 0.0464 ns | 0.0434 ns |
| InitializedArray | 2 | 3.111 ns | 0.0349 ns | 0.0310 ns |
| UninitializedArray | 2 | 3.198 ns | 0.0262 ns | 0.0205 ns |
| InitializedArray | 4 | 3.244 ns | 0.0572 ns | 0.0507 ns |
| UninitializedArray | 4 | 5.894 ns | 0.1662 ns | 0.4320 ns |
| InitializedArray | 8 | 4.159 ns | 0.1304 ns | 0.1281 ns |
| UninitializedArray | 8 | 5.083 ns | 0.1007 ns | 0.0942 ns |
| InitializedArray | 16 | 5.527 ns | 0.0780 ns | 0.0730 ns |
| UninitializedArray | 16 | 7.092 ns | 0.1682 ns | 0.1573 ns |
| InitializedArray | 32 | 9.007 ns | 0.2214 ns | 0.2175 ns |
| UninitializedArray | 32 | 9.496 ns | 0.2325 ns | 0.2585 ns |
| InitializedArray | 64 | 15.900 ns | 0.2476 ns | 0.2316 ns |
| UninitializedArray | 64 | 16.595 ns | 0.1774 ns | 0.1660 ns |
| InitializedArray | 128 | 28.697 ns | 0.4198 ns | 0.3926 ns |
| UninitializedArray | 128 | 28.969 ns | 0.3815 ns | 0.3569 ns |
| InitializedArray | 256 | 56.482 ns | 1.0334 ns | 1.4821 ns |
| UninitializedArray | 256 | 58.792 ns | 0.5612 ns | 0.5249 ns |
| InitializedArray | 384 | 81.260 ns | 1.2850 ns | 1.1400 ns |
| UninitializedArray | 384 | 83.200 ns | 0.5460 ns | 0.4260 ns |
| InitializedArray | 512 | 113.618 ns | 1.7741 ns | 2.3068 ns |
| UninitializedArray | 512 | 109.537 ns | 0.8621 ns | 0.6730 ns |

Vector128

| Method | Size | Mean | Error | StdDev |
|---|---|---|---|---|
| InitializedArray | 0 | 2.916 ns | 0.0667 ns | 0.0557 ns |
| UninitializedArray | 0 | 2.647 ns | 0.0556 ns | 0.0493 ns |
| InitializedArray | 1 | 2.923 ns | 0.0392 ns | 0.0367 ns |
| UninitializedArray | 1 | 3.734 ns | 0.0626 ns | 0.0555 ns |
| InitializedArray | 2 | 3.514 ns | 0.0365 ns | 0.0342 ns |
| UninitializedArray | 2 | 3.728 ns | 0.0767 ns | 0.0717 ns |
| InitializedArray | 4 | 4.584 ns | 0.0983 ns | 0.0872 ns |
| UninitializedArray | 4 | 4.669 ns | 0.0963 ns | 0.0900 ns |
| InitializedArray | 8 | 6.826 ns | 0.1845 ns | 0.5411 ns |
| UninitializedArray | 8 | 7.205 ns | 0.1767 ns | 0.1566 ns |
| InitializedArray | 16 | 10.198 ns | 0.2494 ns | 0.5579 ns |
| UninitializedArray | 16 | 10.371 ns | 0.1555 ns | 0.1379 ns |
| InitializedArray | 32 | 17.732 ns | 0.2089 ns | 0.1852 ns |
| UninitializedArray | 32 | 17.850 ns | 0.2649 ns | 0.2477 ns |
| InitializedArray | 64 | 33.419 ns | 0.4831 ns | 0.4519 ns |
| UninitializedArray | 64 | 34.592 ns | 0.6800 ns | 0.6361 ns |
| InitializedArray | 128 | 64.925 ns | 1.3368 ns | 1.3129 ns |
| UninitializedArray | 128 | 56.595 ns | 0.5739 ns | 0.5368 ns |
| InitializedArray | 256 | 131.832 ns | 1.9061 ns | 1.6897 ns |
| UninitializedArray | 256 | 77.677 ns | 1.1879 ns | 1.0530 ns |
| InitializedArray | 384 | 198.300 ns | 3.8900 ns | 4.6200 ns |
| UninitializedArray | 384 | 95.340 ns | 0.5700 ns | 0.4450 ns |
| InitializedArray | 512 | 246.528 ns | 4.1630 ns | 3.6904 ns |
| UninitializedArray | 512 | 115.572 ns | 1.0041 ns | 0.8901 ns |
| InitializedArray | 1024 | 428.161 ns | 6.6520 ns | 6.2223 ns |
| UninitializedArray | 1024 | 186.133 ns | 2.3016 ns | 2.1529 ns |

tannergooding (Member, Author) commented:

Will close this for now and open a discussion thread instead until we can come to a decision, one way or the other.

tannergooding (Member, Author) commented:

Logged #47198

This conversation was locked as resolved and limited to collaborators on Feb 18, 2021.