Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal]: First-Class Span Types #7905

Open
1 of 4 tasks
333fred opened this issue Feb 3, 2024 · 30 comments
Open
1 of 4 tasks

[Proposal]: First-Class Span Types #7905

333fred opened this issue Feb 3, 2024 · 30 comments

Comments

@333fred
Copy link
Member

333fred commented Feb 3, 2024

First-Class Span Types

  • Proposed
  • Prototype: Not Started
  • Implementation: test plan
  • Specification: link

Summary

We introduce first-class support for Span<T> and ReadOnlySpan<T> in the language, including new implicit conversion types and consider them in more places,
allowing more natural programming with these integral types.

Motivation

Since their introduction in C# 7.2, Span<T> and ReadOnlySpan<T> have worked their way into the language and base class library (BCL) in many key ways. This is great for
developers, as their introduction improves performance without costing developer safety. However, the language has held these types at arm's length in a few key ways,
which makes it hard to express the intent of APIs and leads to a significant amount of surface area duplication for new APIs. For example, the BCL has added a number of new
tensor primitive APIs in .NET 9, but these APIs are all offered on ReadOnlySpan<T>. Because C# doesn't recognize the
relationship between ReadOnlySpan<T>, Span<T>, and T[], it means that any developers looking to use those APIs with anything other than a ReadOnlySpan<T> have to explicitly
convert to a ReadOnlySpan<T>. Further, it also means that they don't have IDE tooling guiding them to use these APIs, since nothing will indicate to the IDE that it is valid
to pass them after conversion. There are also issues with generic inference in these scenarios. In order to provide maximum usability for this style of API, the BCL will have to
define an entire set of Span<T> and T[] overloads, which is a lot of duplicate surface area to maintain for no real gain. This proposal seeks to address the problem by
having the language more directly recognize these types and conversions.

Detailed Design

https://github.com/dotnet/csharplang/blob/main/proposals/first-class-span-types.md

Design meetings

@TahirAhmadov
Copy link

I'm probably missing something obvious, but why doesn't just adding implicit operators suffice?

@333fred
Copy link
Member Author

333fred commented Feb 5, 2024

Implicit operators already exist. They do not suffice because they do not participate in the pieces that I outlined in the proposal.

@TahirAhmadov
Copy link

Oh I see what you mean, it's because of generics.
Isn't there an opportunity to do something more global here? Say, look at the target type; it's generic; look at implicit operators and try to solve "the generic equation" - is it possible that the expression type satisfies the implicit operator's parameter type with any type argument; if solved, determine the generic argument; use the implicit operator.

@333fred
Copy link
Member Author

333fred commented Feb 20, 2024

These changes require a deep understanding of the relationship between the array type and a type parameter; I don't really see a generalizable mechanism that wouldn't be massively overcomplicated for the class of problem that we're trying to fix here.

@KalleOlaviNiemitalo
Copy link

Would you allow an extension method invocation to combine an implicit span conversion with an identity conversion?

using System;

class C {
    void M() {
        object[] objs = {};
        S.X(objs); // OK
        objs.X(); // ?

        (int a, int b)[] tuples = {};
        S.Y(tuples); // OK
        tuples.Y(); // ?
    }
}

static class S {
    internal static void X(this Span<dynamic> span) {}
    internal static void Y(this Span<(int c, int d)> span) {}
}

Or alternatively, allow this identity conversion as part of the implicit span conversion:

  • From any single-dimensional array_type with element type Ei to System.Span<EiUi>, provided that there is an identity conversion (§10.2.2) from Ei to Ui

@KalleOlaviNiemitalo
Copy link

The conversion to Span<T> throws in the covariant case

using System;

class C {
    static void Main() {
        object[] objs = new string[1];
        S.X(objs); // implicit conversion to Span<object> throws ArrayTypeMismatchException
        objs.X();  // implicit span conversion presumably likewise
    }
}

static class S {
    internal static void X(this Span<object> span) {}
}

The proposal defines the implicit span conversion to be a standard implicit conversion. It's then also a pre-defined conversion according to §10.4.1:

The standard conversions are those pre-defined conversions that can occur as part of a user-defined conversion.

This paragraph in §10.2.1 then has to be changed:

The pre-defined implicit conversions always succeed and never cause exceptions to be thrown.

@333fred
Copy link
Member Author

333fred commented Mar 8, 2024

Would you allow an extension method invocation to combine an implicit span conversion with an identity conversion?

No, that is not intended by this proposal.

@KalleOlaviNiemitalo
Copy link

Does that mean:

using System;

class C {
    void M() {
        (int a, int b)[]? array1 = null;
        (int c, int d)[]? array2 = null;
        array1.N(); // calls X.N(array1)
        array2.N(); // calls Y.N(array2)
        array1.AsSpan().N(); // error CS0121, ambiguous
    }
}

static class X {
    public static void N(this Span<(int a, int b)> span) {}
}

static class Y {
    public static void N(this Span<(int c, int d)> span) {}
}

I think this would be the first time that tuple element names affect which method is called.

@Neme12
Copy link

Neme12 commented May 1, 2024

I'm firmly against the idea that the language would pick a span overload over an array or string overload and I think it would be really unfortunate. There are issues with that, as I mentioned here:

I think that would be unfortunate. There are many reasons where a method takes a span as well as an array/string, and the span overload potentially has to allocate, whether for legacy reasons (e.g. Stream, where the default implementation has to potentially allocate and make a copy and forward to the original array-taking one - and I would guess the majority of real-world stream do not override the span one from what I've seen in other people's code), and I've seen it in cases other than legacy reasons as well where a new API was added with overloads for both either string/array and a span and the span-taking one had to allocate. I think it was for String implementing ISpanParsable - the regular parsable doesn't allocate, the span-taking one does. But it was still added for it to be usable in generic scenarios where you only have a span and not a string. But if you do have a string, you should still prefer the string-taking one. After all, string/arrays are more derived and more usable, and can be reused. Spans cannot be reused, and so sometimes you have to allocate an array/string from them.

The only case where preferring span overloads adds value is for params, where the array version allocates invisibly by default (but I might be missing other cases as well in the language where there are implicit allocations similar to this that would be mitigated by span). Of course, If C# didn't have invisible allocations from the start, even this wouldn't be a problem.

An array is a strictly derived type from a span. It has more featueres, it is more usable, and it can be put on the heap. If all you get is a span, you cannot put it on the heap or reuse it as a field, and you might have to allocate and copy the data to get something more usable. String, on the other hand, can be reused. If what you already have is an array or a string, you should call those corresponding overloads. Preferring span is only useful in cases like params span when you don't already have an array you could pass in, and it will implicitly allocate one. But that's an issue with the language implicitly allocating.

I'd be fine with the language understanding span types, but only if it still considered them base types of arrays or strings - not the other way around. It would be nice to avoid the duplication of overloads on MemoryExtensions for both Span<T> and ReadOnlySpan<T> that's there for basically every method.

@CyrusNajmabadi
Copy link
Member

I'm firmly against the idea that the language would pick a span overload over an array or string overload

We've already decided, based on empirical need, that we will pick the span overload.

There are issues with that, as I mentioned

These issues do not apply in aggregate. And it would be bad for the ecosystem as a whole if we punished the majority of cases for a tiny subset of cases where there might be a problem. Those tiny subset should be designed to not have an issue here, versus requiring the rest of the ecosystem to try to get the perf that is sensible by default.

@CyrusNajmabadi
Copy link
Member

An array is a strictly derived type from a span. It has more featueres, it is more usable, and it can be put on the heap

That's a negative. it cannot be put on the stack. The span approach means you can operate on both uniformly. And, in the normal common case, you get the stack benefits across the ecosystem. It is strictly the type that should be preferred. And that is borne out by the teams looking at this and specifically wnating both the most flexibility and the best perf.

@Neme12
Copy link

Neme12 commented May 1, 2024

But why would picking Span<T> when a method has an array overload as well be better in any case at all?

@TahirAhmadov
Copy link

@Neme12 the method knows how it's going to use the parameter, so it (in the overwhelming majority of cases) knows if it needs to allocate from a span (thus negating the performance gains of spans in the first place or even making it worse in some cases). Why don't we use the overload resolution attribute to solve this? Basically, if the span overload knows it's better, it has an attribute with a higher value, and if not, the array/string one will have a higher value.

@CyrusNajmabadi
Copy link
Member

But why would picking Span when a method has an array overload as well be better in any case at all?

So that the caller doesn't need to heap allocate. Which will often be the case the majority of the time.

@TahirAhmadov
Copy link

So that the caller doesn't need to heap allocate. Which will often be the case the majority of the time.

To clarify, I think she is asking why we would call the span overload if you already have an array.

@Neme12
Copy link

Neme12 commented May 1, 2024

So that the caller doesn't need to heap allocate. Which will often be the case the majority of the time.

But as I said, it only allocates implicitly for params. Otherwise, when you do already have an array and a method has overloads for both array and span, it's completely fine to call the array one.

@CyrusNajmabadi
Copy link
Member

you cannot put it on the heap

The vast vast vast majority of cases do not do this. And they would not do this either if they got an array (as the data could change out from underneath them). So a copy needs to happen in the array case as well here. So spans are no worse.

@CyrusNajmabadi
Copy link
Member

Otherwise, when you do already have an array and a method has overloads for both array and span, it's completely fine to call the array one.

Which is exactly what the lang/compiler would do as that would be an identity conversion.

@TahirAhmadov
Copy link

Basically, the span overload would only be preferred if it's a collection expression and we target type it to span?

@Neme12
Copy link

Neme12 commented May 1, 2024

So why would picking a Span<T> overload be better than an existing array overload in any case other than params where the language otherwise implicitly allocates an array?

@CyrusNajmabadi
Copy link
Member

Basically, the span overload would only be preferred if it's a collection expression

No. It would also be preferred for things like params.

@TahirAhmadov
Copy link

No. It would also be preferred for things like params.

Right, I didn't mention it because @Neme12 already said it.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented May 1, 2024

Ok. So, as i've been saying. We prefer span because it is better when a non-identity conversion is involved. If you have an array, and it's an identity conversion, then identity conversion rules continue to apply. :)

@Neme12
Copy link

Neme12 commented May 1, 2024

Oh sorry, I misunderstood the proposal then. I should have read all of the detailed design before commenting.

@CyrusNajmabadi
Copy link
Member

i thnk you guys may be confusing what an identity conversion is versus the 'implicit span conversion' that fred is talking about adding. That's a implicit widening conversion. not an identity conversion. An identity conversion si still first priority in the conversion list.

@CyrusNajmabadi
Copy link
Member

no worries :)

@Neme12
Copy link

Neme12 commented May 1, 2024

I agree it definitely makes sense to prefer Span in cases like collection expressions, where if it called the array one instead, it would have to allocate an array/list/whatever the natural type ends up being.

@Neme12
Copy link

Neme12 commented May 1, 2024

However, the language has held these types at arm's length in a few key ways,
which makes it hard to express the intent of APIs and leads to a significant amount of surface area duplication for new APIs. For example, the BCL has added a number of new
tensor primitive APIs in .NET 9, but these APIs are all offered on ReadOnlySpan<T>. Because C# doesn't recognize the
relationship between ReadOnlySpan<T>, Span<T>, and T[], it means that any developers looking to use those APIs with anything other than a ReadOnlySpan<T> have to explicitly
convert to a ReadOnlySpan<T>.

I'm still not understanding this part of the motivation though. When I look at the linked API proposal, I don't see any duplication. And why would a developer have to convert to ReadOnlySpan<T> explicitly when they have something like an array, and an implicit conversion exists? Or were these supposed to be extension methods originally?

@333fred
Copy link
Member Author

333fred commented May 1, 2024

When I look at the linked API proposal, I don't see any duplication.

Because they're counting on this proposal to come in and not force them to duplicate 🙂.

And why would a developer have to convert to ReadOnlySpan explicitly when they have something like an array, and an implicit conversion exists?

Because the implicit conversion is a user-defined conversion, which doesn't carry through in a number of places, including type inference for these highly-generic cases.

@Neme12
Copy link

Neme12 commented May 1, 2024

Ohh, I see what the issue is now (the T cannot be inferred), thanks.

My concerns were about making spans to be considered more specific than e.g. arrays with respect to conversions, as opposed to the opposite, like today. Although now than I'm thinking about it, if a type has implicit conversions to both arrays and spans, the span ones could indeed be better as they should be non-allocating (at least that's the assumption for implicit conversions to spans), and could return a span directly over a struct field.

But I'd still definitely want to avoid a scenario where given an existing array and method overloads for both arrays and spans, the span one is picked anyway even though I have an exact type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants