This file contains experiences and desiderata for a Go implementation of generics, i.e. polymorphic types and functions.
It is a feature present in many other languages with varying names. A few examples:
The author has experience using generics in the three languages listed above, which will also be used for comparison and reference in the rest of this document.
In addition, the author personally added generics to three programming languages:
- Go: the unofficial interpreter gomacro contains a Go implementation of generics, modeled after C++ templates.
- Common Lisp: the library cl-parametric-types contains a Common Lisp implementation of generics, again modeled after C++ templates.
- the lfyre programming language, created by the author, used to contain an implementation of generics. It now has a different maintainer.
Things the author does not want from Go generics
-
a compile-time sub-language:
Go generics should be an extension of idiomatic Go, not a whole sub-language to be used for compile time operations.
For example, we should avoid compile-time Turing completeness and "expression templates", two accidental features of C++ templates that together created a sub-language of C++ made of template specializations and recursive templates.
Such sub-language also provides arbitrary computation at compile-time (possibly a good thing) with a terrible syntax and no alternative with cleaner syntax.
The much more recent C++constexpr
actually provides the desired alternative, clean syntax for compile-time Turing completeness, but it is more limited: it can only manipulate values, not types.
The reasons to implement generics in Go can be many, and sometimes contradicting. The author's personal list of reasons, which can also be interpreted as goals that Go generics are expected to achieve, are:
-
reusable, flexible algorithms and types. Examples:
a single
sort#[T]
function that can sort any slice of any ordered type.
a singlecache#[K,V]
type that can cache key/value pairs of any type, provided that keys can be compared.
a singlesortedmap#[K,V]
type, similar to the existingmap[K]V
but keeps its entries sorted, like C++map<K,V>
or JavaTreeMap<K,V>
-
type-safety:
generic functions and types should be instantiable on arbitrary, concrete types - for example
sort#[int]
would only accept[]int
slices andcache#[uint64, []byte]
would only acceptuint64
keys and[]byte
values. In particular, generic functions and types should not need to useinterface{}
, either internally or in they exported API, and should not need type assertions at runtime. -
high runtime speed, low runtime overhead:
generic functions and types should be reified in order to maximize code execution speed and have low or zero data representation overhead in memory.
Reified means that
sort#[int]
andsort#[uint]
will be two different and unrelated functions, one only dealing withint
slices and the other only dealing withuint
slices, and thatcache#[uint64, []byte]
andcache#[uint32, []byte]
will be two different and unrelated types, with (possibly) different layout in memory.While reified generics also have disadvantages (see for example https://gbracha.blogspot.com/2018/10/reified-generics-search-for-cure.html) the author has extensive experience with both reified generics (C++, Haskell) and non reified generics (Java), and he is convinced that reified generics are a better fit for Go - the reasons can be explained if needed.
One obvious disadvantage of reified generics is that each instantiation of a generic function must be compiled separately, for example
sort#[int]
andsort#[uint]
, increasing build time.Luckily, Go
import
s compiled packages instead of#include
-ing their source code, which is expected to contain build time for two reasons:-
each generic function will be parsed only once. Instead C++
#include
mechanism typically needs to parse again the same generic function each time it is included by a different source file. -
each instantiation of a generic function - say
sort#[int]
- will be compiled only once, provided that Go implements a cache of instantiated functions and types, similarly to how it implements a cache of compiled packages.
Instead C++#include
mechanism typically needs to compile again the same generic function - saysort<int>
even if it's instantiated with the same types from two different source files - for examplea.cpp
andb.cpp
both usesort<int>
. C++ compilers typically delegates to the linker the job of coalescing multiple, identical versions of the same generic function.
-
-
reasonable build time:
it is expected to be achieved / achievable even with reified generics, see the previous item
-
type inference:
Go extensively uses (and encourages to use) type inference instead of explicitly declaring the type of a variable.
Example:a := foo()
rather thanvar a int = foo()
.When an expression returns multiple values, Go actively pushes the programmer to use type inference. Example:
n, err := fmt.Println("foo")
becomes more verbose without type inference, because each
var
declaration can only reference one type:var n int var err error n, err = fmt.Println("foo")
The goal for generics is to preserve and extend support for type inference, for example by allowing the syntax
slice := make([]int, n) sort(slice)
and automatically inferring that it means
slice := make([]int, n) sort#[int](slice)
-
contracts:
when writing a generic function or type, it should be possible to specify contracts on their type arguments. This is an extensively discussed topic, for many reasons:
-
contracts are expected to simplify compiler error messages, and make them more understandable. For example, a
sort#[T]
function would specify that values ofT
must be ordered - the following syntax is just for illustration purposes:func sort#[T: Ordered](slice []T) { // ... }
Then, attempting to sort a non-ordered type as for example
func ()
could produce an error message likesort: type func() is not Ordered
instead of some deeply nested error due toa < b
used onfunc()
values. -
contracts allow programmers writing generic code to specify explicitly the requirements of their code, i.e. on which types it can be used and why.
Without them, it is not always simple to understand if a complicated generic function or type written by someone else can be used with a certain concrete type
T
, and what are the requirements on suchT
:
the author of generic code could document the requirements, for example in a comment, but he/she may forget it, or the comment could become stale/erroneous if the generic code gets updated.A machine-readable, compiled information is less likely to become stale/erroneous, especially if the compiler actually validates it.
-
if the compiler assumes that contracts specify the only operations supported by the constrained types, it could detect immediately if a constrained type is used improperly in generic code, without having to wait until it gets instantiated (possibly by someone else) on concrete types - for example if methods or arithmetic operations are used on a type that is only constrained as
T: Ordered
For reference, Haskell does exactly that: a contract specifies the only operations allowed on a type.
Actually, Haskell does even more: if a contract for a typeT
is not specified, the compiler infers it from the operations actually performed onT
values (it's not obvious whether such contract inference is appropriate for Go).
It should also be possible to specify multiple contracts on a type. For example, if a type
T
must be bothOrdered
andPrintable
, one could imagine a syntax like:func foo#[T: Ordered, Printable](arg T) { // ... }
-
-
contracts implementation:
An important question is: what should a contract tell about a type?
-
The signature of one or more methods?
-
The signature of one or more functions and/or operators?
-
The name and type of one or more fields?
-
A combination of the above?
It is surely tempting to answer 1. and reuse interfaces as contracts: this would spare us from inventing yet another language construct, but is it enough?
-
Let's check with a relatively simple case: the Ordered
contract.
It describes types that can be ordered, and there's immediately a difficulty:
Go operator <
only works on basic types (integers and floats), and cannot be overloaded
i.e. cannot be extended to support further types.
It will work on types whose underlying type is integer or float, as for example
package time
type Duration int64
but even in such case you cannot define a custom implementation:
operator <
compares time.Duration
values as it would compare int64
.
So let's say that Ordered
will instead use a function Less()
to compare values.
Here we hit another Go (intentional) limitation: function overloading is not supported,
i.e. it's not possible to define multiple functions with the same name and different signatures.
Ok, then let's say that Ordered
will use a method Less()
to compare values.
How do we express that a type must have a method Less()
to be Ordered
?
With an interface, of course:
type Ordered interface {
Less(/*what here?*/) bool
}
We are getting close: we need to express that the single argument of Less()
is the same as the receiver. Go does not support this either, but we are trying to
extend it with generics, and the addition "we can give a name to the receiver type"
feels quite minimal.
What about the following?
type Ordered#[T] interface {
func (T) Less(T) bool
}
It's supposed to mean that Ordered
is a generic interface, i.e. it's polymorphic,
and has a single type argument T
. To satisfy Ordered
, a type must have a method
Less(T)
where T
is also the receiver type (the func (T)
part).
I chose the syntax func (T) Less ...
because that's exactly how we already declare
methods, and the shorter (T) Less ...
did not sound familiar enough.
There are still a couple of issues.
First issue: basic integers and floats do not have any method, so they cannot implement Ordered
.
This can only be solved with a Go language specs change which adds methods to basic types.
On the other hand user-defined types, including standard library ones as time.Duration
,
could add a method Less()
.
Second issue: methods must be declared in the same package as their receiver.
In other words, it's not possible to import a type foo.Bar
and add a method Less()
to it:
either the method is already there because the author forecasted the need, or it's not there
and there's no way to add it (unless you fork the package foo
and modify it -
something that should be a last resort, not the normal case).
This cannot be solved reasonably - but it can become an intentional limitation.
Let's continue our thought experiment on the Ordered
contract.
This time, contracts declare functions on a type, not its methods.
Again, Go operator <
cannot be overloaded, so we use a function Less()
:
type Ordered#[T] contract {
func Less(T, T) bool
}
which means that Ordered
is a generic contract (is it still an interface?
we can try to answer later) and has a single type argument T
.
A concrete type T
satisfies Ordered
if there is a function Less(T,T) bool
.
Since functions cannot be overloaded either, it's immediately evident that
we can only declare one function Less
per package.
That's not what we wanted, and it pushes us toward a much deeper language change:
allow function overloading, i.e. multiple functions with the same name but different signatures.
And once we allow function overloading, why not going the full way and allowing operator overloading too?
The result would be something like:
type Ordered#[T] contract {
operator<(T, T) bool
}
and an hypotetical type Foo
could satisfy Ordered
by declaring a function
operator<(a, b Foo) bool {
// ...
}
A lot of design decisions would have to follow:
In which cases do we allow function overloading and/or operator overloading?
How do we select the function/operator to call when there are multiple candidates
with the same name, differing only in their signature?
And also more mundane questions, as whether we write operator<(a, b Foo) bool { }
or func operator<(a, b Foo) bool { }
.
Although the author really likes Haskell generics, and they happen to go down this exact road, it still feels like a big language change and a hard sell to Go core team and Go community.
This would be likely frowned upon in many object-oriented languages as C++ or Java, where direct access to object's fields is strongly discouraged in favor of setter/getter methods.
Yet Go composite literals are an extremely useful feature, and they rely on initializing exported struct fields to work. Thus maybe it could make sense. Let's see if it's also useful.
One could say that a type T
satisfies the contract Ordered
if T
has a certain field?
It does not seem very useful since fields contain values, they usually do not
"do something" - that's for methods.
Furthermore Go has the peculiar feature that methods can be declared on any named type, not just on structs. But requiring that a type has certain fields makes sense only for structs - quite limiting.
In conclusion it seems to be usable only in some cases, and not useful enough even in those.
The total complexity added to the language would be quite high: the sum of each complexity, plus all the interactions (intentional and accidental) among the proposals.
If option 2. feels like a hard sell, this simply seems too much.
Among the three options analyzed above, the best one appears to be the first:
contracts declare type's methods.
It allows to use generics in many scenarios, yet requires quite limited changes to the language:
- slightly extending
interface
syntax to optionally specify the receiver type - adding methods to basic types - one method per supported operator (actually fewer,
since
Less
can also cover<=
,>
and>=
, whileEqual
can also cover!=
, etc.)
In exchange it allows:
- declaring contracts with a familiar syntax - the same as method declaration and very similar to interface methods
- creating generic algorithms as
sort#[T]
and generic types assortedmap#[K,V]
that work out of the box on both Go basic types and on user-defined types
In option 1, contracts are interfaces, i.e. they declare the methods of a type. With the small extension of allowing to specify also the receiver type, they seem very useful and let programmers create very general generic types and generic algorithms, yet they seem to have very few unintended side effects on the language, and they do not introduce huge language changes.
Are there other downsides we did not consider yet?
Let's analyze more in detail the idea of adding methods on basic types.
To simplify the reasoning, we start with the concrete example sort#[T]
,
which as we said requires a method Less
on T
.
So let's suppose that int
, int8
, int16
, int32
, int64
,
uint
, uint8
, uint16
, uint32
, uint64
, uintptr
, float32
and float64
have such method.
Then a type such as time.Duration
, which is declared as
package time
type Duration int64
will have the method Less
or not?
In Go, there is the rule
-
a named type has the methods of its underlying type i.e. "wrapper methods", plus the methods declared on the named type Following this rule, the question becomes: what's the underlying type of
time.Duration
? -
If the underlying type is
int64
, thentime.Duration
will have a wrapper methodLess
-
If the underlying type is something else (what?) then
time.Duration
will probably not have a wrapper methodLess
.
Now things get subtle. Usually, underlying types are not named types, they are instead unnamed types: channels, maps, slices, arrays, functions, and very often structs.
If int64
was the underlying type of time.Duration
, then these two types
would be assignable to each other, as for example:
import "time"
var i int64 = 7
var d time.Duration = i
Instead, the above does not compile. It turns out that you need an explicit conversion, i.e.
import "time"
var i int64 = 7
var d time.Duration = time.Duration(i)
which is required when two named types have the same underlying type.
Then the underlying type of both int64
and time.Duration
is some unnamed type
that cannot be mentioned directly, and is not expected to have the method Less
.
Thus time.Duration
would not have a method Less
either.
Small issue: if you ask to the package go/types
, the underlying type of
time.Duration
is int64
. This is inconsistent, and I think Go specs explain
the inconsistency as an exception.
If we ignore this small issue, we get the following:
sort#[T]
works onint64
because it declares the methodLess
sort#[T]
does not work ontime.Duration
because it lacks the methodLess
This is clearly annoying and cumbersome.
An alternative is to decide that the underlying type of both int64
and time.Duration
(the unnamed type that cannot be mentioned directly) has the method Less
,
thus both int64
and time.Duration
also have Less
as wrapper method.
The situation becomes:
sort#[T]
works onint64
because it has the wrapper methodLess
sort#[T]
does not work ontime.Duration
because it has the wrapper methodLess
Now this is good, but it has a subtle side effect: what happens if time.Duration
declares its own method Less
for some reason?
Such method Less
shadows (hides) the wrapper method, and sort#[T]
will happily
use it for sorting, provided it has the expected signature.
Thus we have a way to declare a custom ordering criterion for a type.
In essence, time.Duration
can define its own ordering by declaring a method
Less
- to be precise, a method func (time.Duration) Less(time.Duration) bool
.
This is very similar to what C++ achieves using operator overloading:
if a C++ type has the operator<
, such operator will be used by std::sort()
as default comparison operator.
So we have replaced operator overloading with a different but equivalent mechanism:
"declare a method with a certain name and signature".
The name Less
becomes special, because sort#[T]
looks for it.
But the function or method name operator<
actualy looks special (it's not alphanumeric),
while the name Less
does not look very special (it's alphanumeric).
Thus time.Duration
may declare a method Less
for its own purposes,
without realizing that sort#[T]
will try to use it.
Worse, time.Duration
or some other similar type may already declare a method Less
,
and once we introduce generics, sort#[T]
would use the method Less
to compare values,
instead of comparing the underlying type (int64
and friends).
This is an unwanted effect, and quite insidious too: an existing, innocent looking
method Less
suddenly acquires a special meaning, and causes existing code
(the various sorting algorithms in package sort
) to silently change their behaviour.
TO BE CONTINUED
There are many possible ways to implement generics - one could say too many - and they can be extremely different in usability, expressiveness, implementation complexity, compile-time performance and run-time performance.
TO BE CONTINUED