Update the generic collection types to internally use GC.AllocateUninitializedArray<T> #47186
Tagging subscribers to this area: @eiriktsarpalis
|
CC. @stephentoub, @jkotas |
This may potentially turn race conditions into memory safety violations. Consider: multiple threads are improperly attempting to access a With the new implementation, since the underlying array is now backed by potentially uninitialized memory, a racing thread may improperly read an uninitialized T from the underlying array. This is a possible memory safety violation because it may result in the disclosure of an arbitrary memory address's contents. |
It can because all of the types, modulo
    if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
    {
        _items[_size] = default!;
    }
    if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
    {
        Array.Clear(_items, freeIndex, _size - freeIndex); // Clear the elements so that the gc can reclaim the references.
    }
etc.
So we already have situations today where parts of the collection are uninitialized. This just makes it so that all elements start that way. |
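The pattern discussed above can be sketched as follows: a collection that allocates its backing array with GC.AllocateUninitializedArray<T> and clears a vacated slot only when T is or contains a reference. This is a minimal illustration, not the actual BCL code; the SketchStack<T> type and its members are hypothetical names introduced here.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical minimal stack for illustration; not the BCL implementation.
public sealed class SketchStack<T>
{
    private T[] _items;
    private int _size;

    public SketchStack(int capacity)
    {
        // For T without references the slots may hold arbitrary bytes.
        // For reference-containing T the runtime still returns a zeroed array.
        _items = GC.AllocateUninitializedArray<T>(capacity);
    }

    public int Count => _size;

    public void Push(T item)
    {
        if (_size == _items.Length)
        {
            // Grow without zeroing; only the first _size slots are meaningful.
            T[] bigger = GC.AllocateUninitializedArray<T>(Math.Max(4, _items.Length * 2));
            Array.Copy(_items, bigger, _size);
            _items = bigger;
        }
        _items[_size++] = item;
    }

    public T Pop()
    {
        if (_size == 0) throw new InvalidOperationException("Stack is empty.");
        T item = _items[--_size];
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
        {
            _items[_size] = default!; // Release the reference so the GC can reclaim it.
        }
        return item;
    }
}

public static class SketchStackDemo
{
    public static void Main()
    {
        var stack = new SketchStack<int>(2);
        stack.Push(1);
        stack.Push(2);
        stack.Push(3); // Triggers a grow.
        Console.WriteLine(stack.Pop()); // prints 3
        Console.WriteLine(stack.Count); // prints 2
    }
}
```

Safety here rests entirely on _size being checked before any read of _items, which is the invariant that a race condition could break.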
Ah, nevermind. I misinterpreted "uninitialized" here. Is that actually a concern? Any user could define their own collection type using |
This micro-benchmark is not representative of actual user scenario. Creating an empty list and doing nothing with it is not interesting. What is the improvement for the case you actually try to do something with the list? Is this going to regress cases where you are creating small lists, e.g. list with just a few elements that are the 99% scenario for collections according to the telemetry? |
I'll check....
At least for when
    mov rcx, 0xD1FFAB1E
    mov edx, <size>
    call CORINFO_HELP_NEWARR_1_OBJ
For when
    mov rcx, 0xD1FFAB1E
    call CORINFO_HELP_TYPEHANDLE_TO_RUNTIMETYPE
    mov gword ptr [rsp+20H], rax
    lea rcx, bword ptr [rsp+20H]
    call System.RuntimeTypeHandle:get_Value():long:this
    mov rcx, rax
    mov edx, <size>
    mov r8d, 16
    call System.GC:AllocateNewArray(long,int,int):System.Array
Likewise, if optimizations aren't enabled it is ultimately a call to |
Today, all parts of the collections are either zero-initialized or initialized to a valid value. This change makes it possible to see values that are complete garbage. I agree with @GrabYourPitchforks that this weakens security defense-in-depth in the presence of race conditions. |
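To make "complete garbage" concrete, here is a rough sketch that allocates an uninitialized byte array and counts how many bytes are non-zero. The UninitializedPeek class and CountNonZero method are hypothetical names. Whether any non-zero bytes show up depends on the runtime, the array size, and what the heap held before, so no particular output should be expected.

```csharp
using System;

public static class UninitializedPeek
{
    // Counts non-zero bytes in a freshly allocated, possibly uninitialized array.
    public static int CountNonZero(int length)
    {
        byte[] buffer = GC.AllocateUninitializedArray<byte>(length);
        int nonZero = 0;
        foreach (byte b in buffer)
        {
            if (b != 0) nonZero++;
        }
        return nonZero;
    }

    public static void Main()
    {
        // Fill and discard some arrays so that reclaimed heap memory may hold old data.
        for (int i = 0; i < 1000; i++)
        {
            var junk = new byte[64 * 1024];
            Array.Fill(junk, (byte)0xAB);
        }
        GC.Collect();

        Console.WriteLine($"non-zero bytes observed: {CountNonZero(64 * 1024)}");
    }
}
```

A correctly bounds-checked collection never exposes these bytes; the concern raised above is what happens when a race lets a reader past that check.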
There are some cases where we're ok with these implementation details possibly being exposed to external callers. |
It's trivial today with a type like
    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    class Program
    {
        static void Main()
        {
            var list = new List<(long, long)>();
            list.Add((1, 1));
            Task.Run(() =>
            {
                while (true)
                {
                    list[0] = (1, 1);
                }
            });
            Task.Run(() =>
            {
                while (true)
                {
                    list[0] = (2, 2);
                }
            });
            while (true)
            {
                var item = list[0];
                if (item.Item1 != item.Item2)
                {
                    Console.WriteLine(item);
                }
            }
        }
    }
Every single item stored into the list has the exact same value in each item, but because of tearing the reader can see unequal values. From a "This change makes it possible to see values that are complete garbage" perspective, how is this substantially different? Yes, you could see arbitrary bytes that were never written, but from the perspective "could the state here be valid instance of the value type", it seems like it's the same impact. |
For example, you can potentially use a race condition like this for information disclosure attack after this change. |
Why would the JIT produce all that code for structs instead of a GC allocate call like with reference types? I might be wrong, but does it try to get the size from the runtime handle first? Isn't it known at JIT time? |
So the concern is that some code put sensitive data into an array from the pool (whether directly or indirectly via some component it was using), didn't clear it when returning (which is feasible since not clearing is the default and most code doesn't clear), that array is consumed into a collection with a bug that manifests as a race condition that could enable a slot that wasn't written on that instance to be read, and an attacker could manipulate the app into hitting that race condition and surfacing that data. Ok. EDIT: My mind went to ArrayPool as it was cited earlier and I forgot we were talking about AllocateUninitializedArray, but obviously it's even easier as GC.AllocateUninitializedArray doesn't require explicit pooling. |
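The pooling scenario described above can be sketched roughly as follows: a buffer is rented, sensitive bytes are written, the buffer is returned without clearing, and a later rent may hand back the same memory. The PoolLeakSketch class, the RentWriteReturnRent method, and the "hunter2" placeholder are all hypothetical. Whether the second rent actually returns the same array is an implementation detail of ArrayPool<T>.Shared and is not guaranteed.

```csharp
using System;
using System.Buffers;
using System.Text;

public static class PoolLeakSketch
{
    // Returns what a second renter sees in the first bytes of its buffer.
    public static string RentWriteReturnRent(bool clearOnReturn)
    {
        byte[] first = ArrayPool<byte>.Shared.Rent(64);
        byte[] secret = Encoding.ASCII.GetBytes("hunter2"); // Placeholder sensitive data.
        secret.CopyTo(first, 0);

        // clearArray defaults to false, so stale contents survive unless the caller opts in.
        ArrayPool<byte>.Shared.Return(first, clearArray: clearOnReturn);

        byte[] second = ArrayPool<byte>.Shared.Rent(64);
        string seen = Encoding.ASCII.GetString(second, 0, secret.Length);
        ArrayPool<byte>.Shared.Return(second, clearArray: true);
        return seen;
    }

    public static void Main()
    {
        // May print the stale secret if the pool hands back the same array.
        Console.WriteLine($"without clearing: \"{RentWriteReturnRent(false)}\"");
    }
}
```

With GC.AllocateUninitializedArray the same kind of stale data could come from any previously freed heap memory, with no pool involved.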
Benchmarking Part of this overhead is due to additional logic introduced by inlining/remove the logic around: Int32
String
Vector128
|
Will close this for now and open a discussion thread instead until we can come to a decision, one way or the other. |
Logged #47198 |
As per the title, this updates the generic collection types to internally use GC.AllocateUninitializedArray.
When T is a reference type, this should not differ from the standard new T[] as the API ensures the array is zeroed anyways for correctness.
However, when T is a value type and particularly when the size of T or the collection is large, this can help avoid needless zeroing in the underlying APIs.
Most of the collection types are already optimized to only zero removed elements if T is a reference type and internally track and check their bounds so this should be completely safe.
For a simple test, such as System.Collections.CtorGivenSize.List which creates a List with a capacity of 512, this shows some small gains:
For string, it is likewise showing essentially no difference:
I haven't run the full suite of performance tests, but similar gains should be seen for effectively any value type for the given collections.
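The reference-type claim in the description can be checked with a small sketch: for a reference type such as string, GC.AllocateUninitializedArray still returns an array whose elements are all null. The ReferenceTypeCheck class and AllNull method are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;

public static class ReferenceTypeCheck
{
    // True when every element of a freshly allocated string array is null.
    public static bool AllNull(int length)
    {
        string[] items = GC.AllocateUninitializedArray<string>(length);
        return items.All(s => s is null);
    }

    public static void Main()
    {
        Console.WriteLine(AllNull(512)); // prints True
    }
}
```

This is why no difference is expected for string: the uninitialized path only applies to element types that contain no references.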