
Update the generic collection types to internally use GC.AllocateUninitializedArray<T> #47186

Closed · 8 commits

Conversation

tannergooding (Member) commented Jan 19, 2021

As per the title, this updates the generic collection types to internally use GC.AllocateUninitializedArray.

When T is a reference type, this should not differ from the standard new T[], as the API ensures the array is zeroed anyway for correctness.
However, when T is a value type and particularly when the size of T or the collection is large, this can help avoid needless zeroing in the underlying APIs.

Most of the collection types are already optimized to only zero removed elements when T is a reference type, and they internally track and check their bounds, so this should be completely safe.
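To make the pattern concrete, here is a minimal sketch of a List<T>-like type using this approach. The type and member names are illustrative, not the real List<T> source; the clearing-on-removal pattern matches what the collections already do.

```csharp
using System;
using System.Runtime.CompilerServices;

public class SketchList<T>
{
    private T[] _items;
    private int _size;

    public SketchList(int capacity)
    {
        // For reference types this is equivalent to new T[capacity];
        // for value types the zeroing can be skipped.
        _items = GC.AllocateUninitializedArray<T>(capacity);
    }

    public int Count => _size;

    public T this[int index]
    {
        get
        {
            // Bounds are checked against _size, not _items.Length, so a
            // correct caller never observes an uninitialized slot.
            if ((uint)index >= (uint)_size) throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index];
        }
    }

    public void Add(T item)
    {
        if (_size == _items.Length) Grow();
        _items[_size++] = item;
    }

    public void RemoveLast()
    {
        _size--;
        // Only clear when the GC could otherwise keep a dead object alive.
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
        {
            _items[_size] = default!;
        }
    }

    private void Grow()
    {
        int newCapacity = Math.Max(4, _items.Length * 2);
        T[] newItems = GC.AllocateUninitializedArray<T>(newCapacity);
        Array.Copy(_items, newItems, _size);
        _items = newItems;
    }
}
```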

For a simple test, such as System.Collections.CtorGivenSize.List, which creates a List with a capacity of 512, this shows some small gains:

BenchmarkDotNet=v0.12.1.1466-nightly, OS=Windows 10.0.19042
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=6.0.100-alpha.1.21056.1
  [Host]     : .NET 6.0.0 (6.0.21.5406), X64 RyuJIT
  Job-RSPOMU : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT
  Job-ECGLUA : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable  IterationTime=250.0000 ms
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1
| Method | Job | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| List | Job-RSPOMU | \runtime\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe | 512 | 65.10 ns | 0.783 ns | 0.694 ns | 65.12 ns | 64.24 ns | 66.48 ns | 1.37 | 0.04 | 0.1256 | 0.0008 | - | 2 KB |
| List | Job-ECGLUA | \runtime_base\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe | 512 | 47.48 ns | 1.022 ns | 1.136 ns | 47.29 ns | 46.09 ns | 50.31 ns | 1.00 | 0.00 | 0.1256 | 0.0019 | - | 2 KB |

For string, it likewise shows essentially no difference:

| Method | Job | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| List | Job-RSPOMU | \runtime\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe | 512 | 98.22 ns | 2.805 ns | 2.881 ns | 98.63 ns | 93.72 ns | 103.37 ns | 1.04 | 0.04 | 0.2480 | 0.0072 | - | 4 KB |
| List | Job-ECGLUA | \runtime_base\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe | 512 | 94.07 ns | 1.643 ns | 1.537 ns | 94.72 ns | 91.21 ns | 96.11 ns | 1.00 | 0.00 | 0.2481 | 0.0074 | - | 4 KB |

I haven't run the full suite of performance tests, but similar gains should be seen for effectively any value type for the given collections.

ghost commented Jan 19, 2021

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

Author: tannergooding
Labels: area-System.Collections

tannergooding (Member, Author) commented:

CC @stephentoub, @jkotas

GrabYourPitchforks (Member) commented Jan 19, 2021

This may turn race conditions into memory safety violations. Consider: multiple threads are improperly attempting to access a List<T>. With today's implementation, it's possible that they can corrupt the internal state of the List<T> and stomp on each other's data. This could result in some strange behaviors, such as a thread improperly reading default(T) or a stale T from the list. But it cannot possibly result in a thread reading an uninitialized T from the list.

With the new implementation, since the underlying array is now backed by potentially uninitialized memory, a racing thread may improperly read an uninitialized T from the underlying array. This is a possible memory safety violation because it may result in the disclosure of an arbitrary memory address's contents.

tannergooding (Member, Author) commented Jan 19, 2021

> But it cannot possibly result in a thread reading an uninitialized T from the list.

It can because all of the types, modulo SortedSet, have code such as the following:

```csharp
if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
{
    _items[_size] = default!;
}
```

```csharp
if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
{
    Array.Clear(_items, freeIndex, _size - freeIndex); // Clear the elements so that the gc can reclaim the references.
}
```

etc.

So we already have situations today where parts of the collection are uninitialized. This just makes it so that all elements start that way.

tannergooding (Member, Author) commented:

Ah, never mind. I misinterpreted "uninitialized" here.

Is that actually a concern? Any user could define their own collection type using GC.AllocateUninitializedArray, etc. and could have the "same" side effect and the same view of memory.
Likewise with reading "stale" data from an existing collection type.

jkotas (Member) commented Jan 19, 2021

> System.Collections.CtorGivenSize.List

This micro-benchmark is not representative of an actual user scenario. Creating an empty list and doing nothing with it is not interesting.

What is the improvement for the case where you actually try to do something with the list?

Is this going to regress cases where you are creating small lists, e.g. lists with just a few elements, which are the 99% scenario for collections according to the telemetry?

tannergooding (Member, Author) commented:

> Is this going to regress cases where you are creating small lists, e.g. lists with just a few elements, which are the 99% scenario for collections according to the telemetry?

I'll check....

At least when T is a reference type, it is optimized down to just new T[], and the JIT emits the following, as expected (the same codegen as new T[]):

```asm
mov rcx, 0xD1FFAB1E
mov edx, <size>
call CORINFO_HELP_NEWARR_1_OBJ
```

When T is a value type, however, it generates less efficient code:

```asm
mov      rcx, 0xD1FFAB1E
call     CORINFO_HELP_TYPEHANDLE_TO_RUNTIMETYPE
mov      gword ptr [rsp+20H], rax
lea      rcx, bword ptr [rsp+20H]
call     System.RuntimeTypeHandle:get_Value():long:this
mov      rcx, rax
mov      edx, <size>
mov      r8d, 16
call     System.GC:AllocateNewArray(long,int,int):System.Array
```

Likewise, if optimizations aren't enabled, it is ultimately a call to AllocateUninitializedArray. However, I think both of these would be addressable by making the method an intrinsic.
Given that gains on big collections of value types, particularly int, aren't uncommon, this seems like something worth pursuing.
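The overall shape being discussed can be sketched as follows. This is not the actual runtime source: the method name is hypothetical, and the 2048-byte cutoff is an assumption used for illustration (the real threshold is an internal implementation detail), but it shows why the reference-type path can collapse to plain new T[] while large value-type allocations skip the zeroing.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class AllocSketch
{
    // Hypothetical reconstruction of the decision logic.
    public static T[] AllocateUninitialized<T>(int length)
    {
        // Reference types must be zeroed anyway for GC correctness.
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
            return new T[length];

        // Small allocations: zeroing is cheap, so the ordinary path wins.
        // 2048 bytes is an assumed threshold, not a documented contract.
        if ((long)length * Unsafe.SizeOf<T>() < 2048)
            return new T[length];

        // Large value-type allocations: skip the zeroing.
        return GC.AllocateUninitializedArray<T>(length);
    }
}
```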

jkotas (Member) commented Jan 19, 2021

> we already have situations today where parts of the collection are uninitialized.

Today, all parts of the collections are either zero-initialized or initialized to a valid value. This change makes it possible to see values that are complete garbage.

I agree with @GrabYourPitchforks that this weakens security defense-in-depth in the presence of race conditions.

GrabYourPitchforks (Member) commented:

GC.AllocateUninitializedArray is an unsafe-equivalent API. When it's used as an internal implementation detail it's normally fine. If a user chooses to call this in their own code, it's on them to ensure safe usage. But if there's any chance that this implementation detail might leak to an external caller (such as the caller triggering a race condition), we need to proceed with caution.

There are some cases where we're ok with these implementation details possibly being exposed to external callers. ArrayPool<T>.Shared.Rent is the most prominent example. And it might be ok here as well if we're willing to say "you know what, it's only a bad thing if the caller already has a race condition, and we're willing to sacrifice some memory safety there in order to eke out better performance." But those statements really should be explicit. This also came up offhandedly in the StringBuilder PR, but I think the PR review there didn't identify any places where the uninitialized data could ever leak to the caller, even in the face of race conditions.
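The ArrayPool<T>.Shared.Rent example can be demonstrated directly: a rented buffer is not cleared by default, so data written by a previous user of the pool may still be visible to the next. The names below are illustrative; whether the same array comes back from the pool is an implementation detail, not a contract.

```csharp
using System;
using System.Buffers;

public static class PoolDemo
{
    public static int ReturnThenRent()
    {
        int[] first = ArrayPool<int>.Shared.Rent(16);
        first[0] = 123; // stand-in for "sensitive" data
        ArrayPool<int>.Shared.Return(first); // clearArray defaults to false

        int[] second = ArrayPool<int>.Shared.Rent(16);
        int leaked = second[0]; // may still hold 123 if the same array came back

        ArrayPool<int>.Shared.Return(second, clearArray: true); // opt-in clearing
        return leaked;
    }
}
```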

stephentoub (Member) commented Jan 19, 2021

It's trivial today with a type like List<T> for erroneous code to see values that were never there, e.g.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var list = new List<(long, long)>();
        list.Add((1, 1));

        Task.Run(() =>
        {
            while (true)
            {
                list[0] = (1, 1);
            }
        });

        Task.Run(() =>
        {
            while (true)
            {
                list[0] = (2, 2);
            }
        });

        while (true)
        {
            var item = list[0];
            if (item.Item1 != item.Item2)
            {
                Console.WriteLine(item);
            }
        }
    }
}
```

Every single item stored into the list has identical halves, but because of tearing the reader can see unequal values. From a "this change makes it possible to see values that are complete garbage" perspective, how is this substantially different? Yes, you could see arbitrary bytes that were never written, but from the perspective of "could the state here be a valid instance of the value type?", it seems like the same impact.

jkotas (Member) commented Jan 19, 2021

> From a "This change makes it possible to see values that are complete garbage" perspective, how is this substantially different?

For example, after this change you can potentially use a race condition like this for an information disclosure attack.

En3Tho (Contributor) commented Jan 19, 2021

Why would the JIT produce all that code for structs instead of a GC allocation call, like it does for reference types? I might be wrong, but it looks like it tries to get the size from the runtime type handle first? Isn't that known at JIT time?

stephentoub (Member) commented Jan 19, 2021

> information disclosure

So the concern is: some code puts sensitive data into an array from the pool (whether directly or indirectly via some component it was using); the array isn't cleared when returned (which is feasible, since not clearing is the default and most code doesn't clear); that array is consumed into a collection with a bug that manifests as a race condition enabling a slot that was never written on that instance to be read; and an attacker manipulates the app into hitting that race condition and surfacing the data.

Ok.

EDIT: My mind went to ArrayPool as it was cited earlier and I forgot we were talking about AllocateUninitializedArray, but obviously it's even easier as GC.AllocateUninitializedArray doesn't require explicit pooling.

tannergooding (Member, Author) commented:

Benchmarking new T[] vs GC.AllocateUninitializedArray<T> shows the results below. Essentially, provided the number of bytes being allocated is under 2048, it's the "same" speed (within about 0.5 ns), because AllocateUninitializedArray simply calls new T[] in that scenario.
Above 2048 bytes (so 512 elements for int, 128 elements for Vector128<T>, etc.) you start seeing minimal gains. Once it hits 4096 bytes, the allocations become almost twice as fast.

Part of this overhead is due to the additional logic around AllocateUninitializedArray not being inlined or removed. Part of the overhead is also related to #5973.
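A minimal BenchmarkDotNet sketch of the comparison behind the tables below might look like this. The exact harness used for these numbers isn't shown in the PR, so treat this as an assumed reconstruction.

```csharp
using System;
using BenchmarkDotNet.Attributes;

public class AllocationBenchmarks
{
    // Element counts mirroring a subset of the sizes in the tables below.
    [Params(0, 1, 16, 512, 1024)]
    public int Size;

    [Benchmark(Baseline = true)]
    public int[] InitializedArray() => new int[Size];

    [Benchmark]
    public int[] UninitializedArray() => GC.AllocateUninitializedArray<int>(Size);
}
```

Run via BenchmarkDotNet's BenchmarkRunner.Run&lt;AllocationBenchmarks&gt;() from a Release build.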

Int32

| Method | Size | Mean | Error | StdDev |
|---|---|---|---|---|
| InitializedArray | 0 | 2.268 ns | 0.0949 ns | 0.1165 ns |
| UninitializedArray | 0 | 2.394 ns | 0.0281 ns | 0.0235 ns |
| InitializedArray | 1 | 2.434 ns | 0.0856 ns | 0.0715 ns |
| UninitializedArray | 1 | 2.661 ns | 0.0512 ns | 0.0454 ns |
| InitializedArray | 2 | 2.290 ns | 0.0362 ns | 0.0321 ns |
| UninitializedArray | 2 | 2.673 ns | 0.0236 ns | 0.0209 ns |
| InitializedArray | 4 | 2.501 ns | 0.0372 ns | 0.0330 ns |
| UninitializedArray | 4 | 2.919 ns | 0.0447 ns | 0.0373 ns |
| InitializedArray | 8 | 3.020 ns | 0.0300 ns | 0.0266 ns |
| UninitializedArray | 8 | 3.468 ns | 0.1154 ns | 0.1185 ns |
| InitializedArray | 16 | 3.815 ns | 0.0381 ns | 0.0357 ns |
| UninitializedArray | 16 | 4.099 ns | 0.0489 ns | 0.0382 ns |
| InitializedArray | 32 | 5.555 ns | 0.0499 ns | 0.0467 ns |
| UninitializedArray | 32 | 5.835 ns | 0.0886 ns | 0.0786 ns |
| InitializedArray | 64 | 8.493 ns | 0.0625 ns | 0.0488 ns |
| UninitializedArray | 64 | 8.633 ns | 0.0693 ns | 0.0614 ns |
| InitializedArray | 128 | 14.906 ns | 0.3316 ns | 0.3257 ns |
| UninitializedArray | 128 | 15.115 ns | 0.3600 ns | 0.4920 ns |
| InitializedArray | 256 | 27.904 ns | 0.5737 ns | 0.4791 ns |
| UninitializedArray | 256 | 28.460 ns | 0.5196 ns | 0.4339 ns |
| InitializedArray | 384 | 39.940 ns | 0.4010 ns | 0.3760 ns |
| UninitializedArray | 384 | 40.470 ns | 0.4060 ns | 0.3800 ns |
| InitializedArray | 512 | 54.298 ns | 1.1052 ns | 1.2728 ns |
| UninitializedArray | 512 | 48.011 ns | 0.3103 ns | 0.2903 ns |
| InitializedArray | 1024 | 106.870 ns | 1.9866 ns | 1.8583 ns |
| UninitializedArray | 1024 | 61.105 ns | 0.9496 ns | 0.8418 ns |

String

| Method | Size | Mean | Error | StdDev |
|---|---|---|---|---|
| InitializedArray | 0 | 2.112 ns | 0.0221 ns | 0.0207 ns |
| UninitializedArray | 0 | 2.872 ns | 0.0837 ns | 0.0783 ns |
| InitializedArray | 1 | 2.313 ns | 0.0238 ns | 0.0211 ns |
| UninitializedArray | 1 | 3.040 ns | 0.0464 ns | 0.0434 ns |
| InitializedArray | 2 | 3.111 ns | 0.0349 ns | 0.0310 ns |
| UninitializedArray | 2 | 3.198 ns | 0.0262 ns | 0.0205 ns |
| InitializedArray | 4 | 3.244 ns | 0.0572 ns | 0.0507 ns |
| UninitializedArray | 4 | 5.894 ns | 0.1662 ns | 0.4320 ns |
| InitializedArray | 8 | 4.159 ns | 0.1304 ns | 0.1281 ns |
| UninitializedArray | 8 | 5.083 ns | 0.1007 ns | 0.0942 ns |
| InitializedArray | 16 | 5.527 ns | 0.0780 ns | 0.0730 ns |
| UninitializedArray | 16 | 7.092 ns | 0.1682 ns | 0.1573 ns |
| InitializedArray | 32 | 9.007 ns | 0.2214 ns | 0.2175 ns |
| UninitializedArray | 32 | 9.496 ns | 0.2325 ns | 0.2585 ns |
| InitializedArray | 64 | 15.900 ns | 0.2476 ns | 0.2316 ns |
| UninitializedArray | 64 | 16.595 ns | 0.1774 ns | 0.1660 ns |
| InitializedArray | 128 | 28.697 ns | 0.4198 ns | 0.3926 ns |
| UninitializedArray | 128 | 28.969 ns | 0.3815 ns | 0.3569 ns |
| InitializedArray | 256 | 56.482 ns | 1.0334 ns | 1.4821 ns |
| UninitializedArray | 256 | 58.792 ns | 0.5612 ns | 0.5249 ns |
| InitializedArray | 384 | 81.260 ns | 1.2850 ns | 1.1400 ns |
| UninitializedArray | 384 | 83.200 ns | 0.5460 ns | 0.4260 ns |
| InitializedArray | 512 | 113.618 ns | 1.7741 ns | 2.3068 ns |
| UninitializedArray | 512 | 109.537 ns | 0.8621 ns | 0.6730 ns |

Vector128

| Method | Size | Mean | Error | StdDev |
|---|---|---|---|---|
| InitializedArray | 0 | 2.916 ns | 0.0667 ns | 0.0557 ns |
| UninitializedArray | 0 | 2.647 ns | 0.0556 ns | 0.0493 ns |
| InitializedArray | 1 | 2.923 ns | 0.0392 ns | 0.0367 ns |
| UninitializedArray | 1 | 3.734 ns | 0.0626 ns | 0.0555 ns |
| InitializedArray | 2 | 3.514 ns | 0.0365 ns | 0.0342 ns |
| UninitializedArray | 2 | 3.728 ns | 0.0767 ns | 0.0717 ns |
| InitializedArray | 4 | 4.584 ns | 0.0983 ns | 0.0872 ns |
| UninitializedArray | 4 | 4.669 ns | 0.0963 ns | 0.0900 ns |
| InitializedArray | 8 | 6.826 ns | 0.1845 ns | 0.5411 ns |
| UninitializedArray | 8 | 7.205 ns | 0.1767 ns | 0.1566 ns |
| InitializedArray | 16 | 10.198 ns | 0.2494 ns | 0.5579 ns |
| UninitializedArray | 16 | 10.371 ns | 0.1555 ns | 0.1379 ns |
| InitializedArray | 32 | 17.732 ns | 0.2089 ns | 0.1852 ns |
| UninitializedArray | 32 | 17.850 ns | 0.2649 ns | 0.2477 ns |
| InitializedArray | 64 | 33.419 ns | 0.4831 ns | 0.4519 ns |
| UninitializedArray | 64 | 34.592 ns | 0.6800 ns | 0.6361 ns |
| InitializedArray | 128 | 64.925 ns | 1.3368 ns | 1.3129 ns |
| UninitializedArray | 128 | 56.595 ns | 0.5739 ns | 0.5368 ns |
| InitializedArray | 256 | 131.832 ns | 1.9061 ns | 1.6897 ns |
| UninitializedArray | 256 | 77.677 ns | 1.1879 ns | 1.0530 ns |
| InitializedArray | 384 | 198.300 ns | 3.8900 ns | 4.6200 ns |
| UninitializedArray | 384 | 95.340 ns | 0.5700 ns | 0.4450 ns |
| InitializedArray | 512 | 246.528 ns | 4.1630 ns | 3.6904 ns |
| UninitializedArray | 512 | 115.572 ns | 1.0041 ns | 0.8901 ns |
| InitializedArray | 1024 | 428.161 ns | 6.6520 ns | 6.2223 ns |
| UninitializedArray | 1024 | 186.133 ns | 2.3016 ns | 2.1529 ns |

tannergooding (Member, Author) commented:

Will close this for now and open a discussion thread instead until we can come to a decision, one way or the other.

tannergooding (Member, Author) commented:

Logged #47198

This conversation was locked as resolved and limited to collaborators on Feb 18, 2021.