[API Proposal]: MemoryExtensions.Count / number of occurrences of a value in a span #59466
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Ok wow, you linked my blog post there and you just made my day 😄 If it helps, we have a vectorized cc. @tannergooding as mentioned before, I'd be happy to coordinate with the .NET libraries team if we wanted to port (with changes if needed) the |
If we want to add such methods, I'd prefer that:
For your use case, why was Replace insufficient? |
Completely fine by me. (I've started with String because I thought it might be simpler to do than having it generic in MemoryExtensions)
Also fine by me, will definitely look more complete & useful if these are added as well.
I had to replace multiple characters with replacements of different sizes (some are 1:1, others 1:2), so instead of creating multiple intermediate strings, I opted to calculate the target size first and then copy / replace into the buffer provided by string.Create. |
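The approach described in this comment (count first, then fill the buffer handed out by string.Create) might look roughly like the following. This is a minimal sketch and not the commenter's actual code: the names `BracketReplacer` and `ReplaceBrackets` are hypothetical, the replacement of '[' with "< " is taken from the example in the issue description, and a plain loop stands in for the proposed Count API.

```csharp
using System;

public static class BracketReplacer
{
    // Hypothetical helper: replaces every '[' with the two characters "< "
    // without allocating intermediate strings.
    public static string ReplaceBrackets(string input)
    {
        // Step 1: count the characters that expand from 1 to 2 chars.
        // A plain loop stands in for the proposed vectorized Count here.
        int occurrences = 0;
        foreach (char c in input)
        {
            if (c == '[')
            {
                occurrences++;
            }
        }

        // Step 2: each occurrence adds exactly one extra char to the result.
        int finalLength = input.Length + occurrences;

        // Step 3: string.Create allocates the final string once; the callback fills it in place.
        return string.Create(finalLength, input, static (destination, source) =>
        {
            int writeIndex = 0;
            foreach (char c in source)
            {
                if (c == '[')
                {
                    destination[writeIndex++] = '<';
                    destination[writeIndex++] = ' ';
                }
                else
                {
                    destination[writeIndex++] = c;
                }
            }
        });
    }
}
```

The counting pass is the only part the proposed API would replace; the fill pass stays the same.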
My preference is just for span on MemoryExtensions. I don't think this is important enough to also add to String, and having it on String also means more overloads (with spans, you don't need overloads that take offset/count).
Makes sense. This is relatively common. It might be worth thinking through what an API for that on String would look like (which could use MemoryExtensions.Count as an implementation detail if desired). Even in dotnet/runtime that's done in a variety of places, e.g. runtime/src/coreclr/System.Private.CoreLib/src/System/Reflection/Emit/ModuleBuilder.cs Line 664 in 9dc499c
runtime/src/libraries/Microsoft.VisualBasic.Core/src/Microsoft/VisualBasic/Information.vb Line 406 in 9dc499c
Line 35 in 9dc499c
runtime/src/libraries/System.Data.OleDb/src/System/Data/ProviderBase/DbConnectionPoolCounters.cs Line 263 in 9dc499c
runtime/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/EventSource.cs Line 5307 in 9dc499c
and more generally: https://grep.app/search?q=%5C.Replace%5C%28.%2A%5C.Replace%5C%28&regexp=true&filter[lang][0]=C%23 |
Tagging subscribers to this area: @dotnet/area-system-runtime
Issue Details
Background and motivation
For performance-sensitive code I had to compute a string by replacing certain characters in an input string. In order to use string.Create I had to know the final size upfront, so I had to count the occurrences of the specific char I wanted to replace. To my surprise, there is currently no vectorized string.Count method, or I've missed it...
API Proposal
namespace System
{
public sealed partial class String
{
public int Count(char value);
public int Count(char value, int startIndex);
public int Count(char value, int startIndex, int count);
}
}
Additional Note:
API Usage
// Instead of this
int count = 0;
int tmp = 0;
while ((tmp = stringValue.IndexOf('[', tmp)) >= 0)
{
count++;
tmp++;
}
// Or this
int count = 0;
for (int i = 0; i < stringValue.Length; i++)
{
if (stringValue[i] == '[')
{
count++;
}
}
// We could do this in a more optimized and simplified version
int count = stringValue.Count('[');
Risks
None that I can think of.
|
I've updated the initial post to a proposal for extending MemoryExtensions.
That certainly would be a handy API to have, but maybe a separate issue for it is better suited to discuss its detail. Said API could then (as you said) use this API once they're actually implemented. |
This is another interesting example of the issue described in #75175, where in the past we may have waved something away as being trivial to implement as a simple loop, but with the performance of vectorization on the table and the lack of an auto-vectorization story, developers would end up needing to write a lot more code to approximate the performance of what we'd presumably include in the box. Here are three implementations of a
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Linq;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
[MemoryDiagnoser]
public unsafe partial class Program
{
static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
[Params(8, 1024)]
public int Length { get; set; }
[Params(false, true)]
public bool Fill { get; set; }
private char[] _data;
private const char Target = 'c';
[GlobalSetup]
public void Setup() => _data = Enumerable.Repeat(Fill ? Target : '\0', Length).ToArray();
[Benchmark(Baseline = true)]
public int OneAtATimeLoop()
{
int count = 0;
foreach (char c in _data)
{
if (c == Target)
{
count++;
}
}
return count;
}
[Benchmark]
public int IndexOfLoop()
{
int count = 0;
ReadOnlySpan<char> span = _data;
int pos;
while ((pos = span.IndexOf(Target)) >= 0)
{
span = span.Slice(pos + 1);
count++;
}
return count;
}
[Benchmark]
public int Vectorized() => CountVectorized(_data, Target);
private static int CountVectorized(ReadOnlySpan<char> span, char value)
{
ref ushort current = ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(span));
ref ushort end = ref Unsafe.Add(ref current, span.Length);
int count = 0;
if (Vector128.IsHardwareAccelerated && span.Length >= Vector128<ushort>.Count)
{
if (Vector256.IsHardwareAccelerated && span.Length >= Vector256<ushort>.Count)
{
Vector256<ushort> targetVector = Vector256.Create((ushort)value);
ref ushort oneVectorAwayFromEndMinus1 = ref Unsafe.Subtract(ref end, Vector256<ushort>.Count - 1);
do
{
count += BitOperations.PopCount(Vector256.Equals(Vector256.LoadUnsafe(ref current), targetVector).ExtractMostSignificantBits());
current = ref Unsafe.Add(ref current, Vector256<ushort>.Count);
}
while (Unsafe.IsAddressLessThan(ref current, ref oneVectorAwayFromEndMinus1));
if (Unsafe.IsAddressLessThan(ref current, ref Unsafe.Subtract(ref end, Vector128<ushort>.Count - 1)))
{
count += BitOperations.PopCount(Vector128.Equals(Vector128.LoadUnsafe(ref current), Vector128.Create((ushort)value)).ExtractMostSignificantBits());
current = ref Unsafe.Add(ref current, Vector128<ushort>.Count);
}
}
else
{
Vector128<ushort> targetVector = Vector128.Create((ushort)value);
ref ushort oneVectorAwayFromEndMinus1 = ref Unsafe.Subtract(ref end, Vector128<ushort>.Count - 1);
do
{
count += BitOperations.PopCount(Vector128.Equals(Vector128.LoadUnsafe(ref current), targetVector).ExtractMostSignificantBits());
current = ref Unsafe.Add(ref current, Vector128<ushort>.Count);
}
while (Unsafe.IsAddressLessThan(ref current, ref oneVectorAwayFromEndMinus1));
}
}
while (Unsafe.IsAddressLessThan(ref current, ref end))
{
if (current == value)
{
count++;
}
current = ref Unsafe.Add(ref current, 1);
}
return count;
}
}
It then runs those three approaches over two kinds of input: one without any of the target character, and one entirely filled with the target character.
The IndexOfLoop approach that we normally push folks to does a great job when there aren't many of the target. But as the density increases, it gets worse and worse, to the point where if the haystack is entirely full of the target, it's 12x worse than the simple foreach. But the Vector128/256 approach does well in all the cases. |
I would recommend against having Other overloads look 👍 though! Edit: A Twitter micro-thread, for those curious on the vagaries of string algorithmic complexity. |
That's fine for me, I never needed them anyway. One question regarding overloads, do you think it might be worth to special case the count with 2 values? (Either with one return count each or combined) Or is this already too "edge casey" to start with? |
Hmm now that I think of it, that would be rather a CountAny() than Count(). Should probably be its own api review if needed. |
I don't understand the question; can you elaborate with an example? Do you mean |
I went ahead and updated my initial post and removed the StringComparison overloads. @stephentoub Fully agree with your statement. Let's not make this more complex than it needs to be for now. If this API gets approved, and it is not urgently needed by anyone, I wouldn't mind taking this one. (It will take me some time to get started, as it would be my first issue here) |
Thanks. You're welcome to. I'd already sketched it out: |
namespace System
{
public static partial class MemoryExtensions
{
public static int Count<T>(this ReadOnlySpan<T> span, T value) where T : IEquatable<T>?;
public static int Count<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value) where T : IEquatable<T>?;
public static int Count<T>(this Span<T> span, T value) where T : IEquatable<T>?;
public static int Count<T>(this Span<T> span, ReadOnlySpan<T> value) where T : IEquatable<T>?;
}
} |
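To make the intended behavior of the sketched overloads concrete, here is a scalar reference version. It is only an illustration under stated assumptions, not the implementation in dotnet/runtime: the class name `CountSketch` is hypothetical, nothing is vectorized, the sequence overload is assumed to count non-overlapping occurrences, and an empty `value` is assumed to count as zero matches.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reference class; not the MemoryExtensions implementation.
public static class CountSketch
{
    // Counts the elements equal to value, one element at a time.
    public static int Count<T>(ReadOnlySpan<T> span, T value) where T : IEquatable<T>
    {
        int count = 0;
        foreach (T item in span)
        {
            // EqualityComparer handles null elements and uses IEquatable<T> when available.
            if (EqualityComparer<T>.Default.Equals(item, value))
            {
                count++;
            }
        }
        return count;
    }

    // Counts occurrences of a sequence by repeatedly calling IndexOf.
    // Assumption: matches do not overlap, so the search resumes after each match.
    public static int Count<T>(ReadOnlySpan<T> span, ReadOnlySpan<T> value) where T : IEquatable<T>
    {
        if (value.IsEmpty)
        {
            return 0; // Assumption: an empty needle yields zero matches (also avoids an endless loop).
        }

        int count = 0;
        int pos;
        while ((pos = span.IndexOf(value)) >= 0)
        {
            span = span.Slice(pos + value.Length);
            count++;
        }
        return count;
    }
}
```

For example, `CountSketch.Count("a[b[c".AsSpan(), '[')` returns 2, and `CountSketch.Count("aaaa".AsSpan(), "aa".AsSpan())` returns 2 under the non-overlapping assumption. The `Span<T>` overloads in the sketch above would presumably forward to the `ReadOnlySpan<T>` ones.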
@bollhals, this has been approved now. Would you like to pick it up? |
Happy to. It might take me a few weeks to get it done (vacation, getting the env running, getting some more understanding of the vector implementation, ...). I assume this is not time critical. One more question about the API: I saw your comment in some other issue about |
Great. Thanks.
Correct.
Yup. Everything relevant on MemoryExtensions has been updated with |
@bollhals, just wanted to check in. Still planning to work on this? |
Thanks for checking in. I still plan to, but I got surprised by a private relocation, so I wasn't able to work on it much. If it is a priority, please feel free to take over; otherwise I plan to continue with it in the next few weeks once I'm done relocating. |
Not a problem or rush; just wanted to check in. |
@stephentoub Would that be a potential place for (not saying we actually need it) int CountOfAny{Except}(this ReadOnlySpan<T> span, IndexOfAnyValues<T> values); ? |
@bollhals, happy new year, just checking in... is this still something you're planning to work on soon?
Yes, any of our searching APIs (IndexOf, Contains, Replace, Count, etc.) could possibly have IndexOfAnyValues-based overloads in the future. It does beg the question of whether we might want to rename the type to SearchValues or something a bit less API specific (though we talked about that in the API review and left it as it was for now). |
Happy new year to you too! One question I had regarding tests, what kind of test do you expect? So far I've mainly copied the ones for ReadOnlySpan.Contains method (so tests with byte, int, TInt (wrapped one) and string for both Count methods (single value and ROS)) |
The ideal would be 100% code coverage, exercising common cases and edge cases, and for at least two T types. |
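As a rough illustration of that kind of coverage, the checks below exercise empty input, no matches, all matches, a single match at the end, and the sequence overload, for two element types (byte and int). This is a sketch under assumptions: it presumes the approved `MemoryExtensions.Count` overloads are available (they are on .NET 8 and later), the class `CountEdgeCaseChecks` and its `Expect` helper are hypothetical stand-ins for a real test framework, and the lengths are merely guesses at values that straddle typical vector widths.

```csharp
using System;

// Hypothetical self-checking class; real tests would use the repository's test framework.
public static class CountEdgeCaseChecks
{
    // Throws if a check fails; stands in for a test framework's equality assertion.
    private static void Expect(int expected, int actual, string name)
    {
        if (expected != actual)
        {
            throw new InvalidOperationException($"{name}: expected {expected}, got {actual}");
        }
    }

    public static void Run()
    {
        // Edge case: empty input never contains the value.
        Expect(0, ReadOnlySpan<byte>.Empty.Count((byte)1), "byte/empty");

        // Assumed lengths around common vector sizes, so that both the
        // vectorized body and the scalar tail of an implementation run.
        foreach (int length in new[] { 1, 15, 16, 17, 31, 32, 33, 1024 })
        {
            byte[] bytes = new byte[length];
            Expect(0, bytes.AsSpan().Count((byte)7), $"byte/none/{length}");

            Array.Fill(bytes, (byte)7);
            Expect(length, bytes.AsSpan().Count((byte)7), $"byte/all/{length}");

            int[] ints = new int[length];
            ints[length - 1] = 42; // single match in the last position
            Expect(1, ints.AsSpan().Count(42), $"int/last/{length}");
        }

        // Sequence overload: {1, 2} occurs twice in {1, 2, 1, 2, 3}.
        ReadOnlySpan<int> haystack = new[] { 1, 2, 1, 2, 3 };
        ReadOnlySpan<int> needle = new[] { 1, 2 };
        Expect(2, haystack.Count(needle), "int/sequence");
    }
}
```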
Background and motivation
For performance-sensitive code I had to compute a string by replacing certain characters in an input string. In order to use string.Create I had to know the final size upfront, so I had to count the occurrences of the specific char I wanted to replace (e.g. replacing '[' with '<' and ' ' (2 chars), so the final size was the old size plus the number of chars to replace).
To my surprise, there is currently no vectorized string.Count method, or I've missed it...
A simple for loop would've worked as well to count, but as seen here, it is slower than a vectorized count. (The link also shows an example implementation of a vectorized count and its improvements.)
API Proposal
API Usage
Risks
None that I can think of.