Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add blogpost: Tribulations of CanBuildFrom #651

Merged
merged 5 commits into from
May 30, 2017
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
193 changes: 193 additions & 0 deletions blog/_posts/2017-05-29-tribulations-canbuildfrom.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,193 @@
---
layout: blog
post-type: blog
by: Julien Richard-Foy
title: Tribulations of CanBuildFrom
---

[`CanBuildFrom`](/api/2.12.2/scala/collection/generic/CanBuildFrom.html) is probably the most
infamous abstraction of the current collections. It is mainly criticised for making scary type
signatures.

Our ongoing [collections redesign](https://github.com/scala/collection-strawman) is an opportunity
to try alternative designs. This blogposts explains the (many!) problems solved by `CanBuildFrom`
and the alternative solutions implemented in the new collections.

## Transforming the elements of a collection

It’s useful to think of `String` as a collection of `Char` elements: you can then use
the common collection operations like `++`, `find`, etc. on `String` values.

However the `map` method is challenging because this one
transforms the `Char` elements into something that might or might not be `Char`s.
Then, what should be the return type of the `map` method on `String` values? Ideally,
we want to get back a `String` if we transform each `Char` into another `Char`, but we
want to get some `Seq[B]` if we transform each `Char` into a different type `B`. And this
is the way it currently works:

~~~
Welcome to Scala 2.12.2 (OpenJDK 64-Bit Server VM, Java 1.8.0_131).
Type in expressions for evaluation. Or try :help.

scala> "foo".map(c => c.toInt)
res1: scala.collection.immutable.IndexedSeq[Int] = Vector(102, 111, 111)

scala> "foo".map(c => c.toUpper)
res2: String = FOO
~~~

This feature is not limited to the `map` method: `flatMap`, `collect`, `concat` and a few
others also work the same. Moreover, `String` is not the only
collection type that needs this feature: [`BitSet`](/api/2.12.2/index.html?search=bitset)
and [`Map`](/api/2.12.2/index.html?search=map) are other examples.

The current collections rely on `CanBuildFrom` to implement this feature. The `map`
method is defined as follows:

~~~ scala
def map[B, That](f: Char => B)(implicit bf: CanBuildFrom[String, B, That]): That
~~~

When the implicit `CanBuildFrom` parameter is resolved it fixes the return type `That`.
The resolution is driven by the actual `B` type: if `B` is `Char` then `That` is fixed
to `String`, otherwise it is `immutable.IndexedSeq`.

The drawback of this solution is that the type signature of the `map` method looks cryptic.

In the new design we solve this problem by defining two overloads of the `map`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to point out that this new design required improvements to type inference which are only available in 2.12.

method: one that handles `Char` to `Char` transformations, and one that handles other
transformations. The type signatures of these `map` methods are straightforward:

~~~ scala
def map(f: Char => Char): String
def map[B](f: Char => B): Seq[B]
~~~

Then, if you call `map` with a function that returns a `Char`, the first overload is
selected and you get a `String`. Otherwise, the second overload is selected and you
get a `Seq[B]`.

Thus, we got rid of the cryptic method signatures while still supporting the feature
of returning a different type of result according to the type of the transformation function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we lose the ability to abstract over arbitrary map implementations in a unified way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this point is not very important from the point of view of users. That’s more related to our internal implementation.


## Collections’ type constructors with different arities

The collections are hierarchically organized. Essentially, the most generic collection
is `Iterable[A]`, and then we have three main kinds of collections: `Seq[A]`, `Set[A]`
and `Map[K, V]`.

![](/resources/img/blog/collections-hierarchy.svg)

It is worth noting that `Map[K, V]` takes two type parameters (`K` and `V`) whereas the
other collection types take only one type parameter. This makes it difficult to
generically define, at the level of `Iterable[A]`, operations that will
return a `Map[K, V]` when specialized.

For instance, consider again the case of the `map` method. We want to generically define
it on `Iterable[A]`, but which return type should we use? When this method will
be inherited by `List[A]` we want its return type to be `List[B]`, but when
it will be inherited by `HashMap[K, V]`, we want its return type to be `HashMap[L, W]`.
It is clear that we want to abstract over the type constructor of the concrete collections,
but the difficulty is that they don’t always take the same number of type parameters.

That’s a second problem solved by `CanBuildFrom` in the current collections.
Look again at the type signature of the (generic) `map` method on `Iterable[A]`:

~~~ scala
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That
~~~

The return type `That` is inferred from the resolved `CanBuildFrom` instance at call-site.
Both the `Repr` and `B` types actually drive the implicit resolution: when `Repr` is `List[_]`
the parameter `That` is fixed to `List[B]`, and when `Repr` is `HashMap[_, _]` and `B` is a
tuple `(K, V)` then `That` is fixed to `HashMap[K, V]`.

In the new design we solve this problem by defining two “branches” in the hierarchy:

- `IterableOps` for collections whose type constructor takes one parameter,
- `MapOps` for collections whose type constructor takes two parameters.

Here is a simplified version of `IterableOps`:

~~~ scala
trait IterableOps[A, CC[_]] {
def map[B](f: A => B): CC[B]
}
~~~

The `CC` type parameter stands for *C*ollection type *C*onstructor. Then, the `List[A]`
concrete collection extends `IterableOps[A, List]` to set its correct self-type constructor.

Similarly, here is a simplified version of `MapOps`:

~~~ scala
trait MapOps[K, V, CC[_, _]] extends IterableOps[(K, V), Iterable] {
def map[L, W](f: ((K, V)) => (L, W)): CC[L, W]
}
~~~

And then the `HashMap[K, V]` concrete collection extends `MapOps[K, V, HashMap]` to set
its correct self-type constructor. Note that `MapOps` extends `IterableOps`: consequently it
inherits from its `map` method, which will be selected when the transformation function
passed to `map` does not return a tuple.

## Sorted collections

The third challenge is about sorted collections (like `TreeSet` and `TreeMap`, for instance).
These collections define their order of iteration according to an ordering relationship for the
type of their elements.

As a consequence, when you transform the type of the elements (e.g. by using the -- now familiar! --
`map` method), an implicit ordering instance for the new type of elements has to be available.

With `CanBuildFrom`, the solution relies (again) on the implicit resolution mechanism:
the implicit `CanBuildFrom[TreeSet[_], X, TreeSet[X]]` instance is available for some
type `X` only if an implicit `Ordering[X]` instance is also available.

In the new design we solve this problem by introducing a new branch in the hierarchy.
This one defines transformation operations that require an ordering instance for the element
type of the resulting collection:

~~~ scala
trait SortedIterableOps[A, CC[_]] {
def map[B : Ordering](f: A => B): CC[B]
}
~~~

However, as mentioned in the previous section, we need to also abstract over the kind of the
type constructor of the concrete collections. Consequently we have in total four branches:

kind | not sorted | sorted
------------|-------------|-------------------
`CC[_]` |`IterableOps`|`SortedIterableOps`
`CC[_, _]` |`MapOps` |`SortedMapOps`

In summary, instead of having one `map` method that supports all the use cases described in
this section and the previous ones, we specialized the hierarchy to have overloads of
the `map` method, each one supporting a specific use case. The benefit is that the type
signatures immediately tell you the story: you don’t have to have a look at the actual
implicit resolution to know the result you will get from calling `map`.

## Implicit builders

In the current collections, the fact that `CanBuildFrom` instances are available in the
implicit scope is useful to implement, separately from the collections, generic operations
that work with any collection type.

Examples of use cases are:

- [`Future.traverse`](https://github.com/scala/scala/blob/92ffe04070f25452b8d48ee7fbced587ddafbf6d/src/library/scala/concurrent/Future.scala#L822-L840)
- type-driven builders (e.g. in [play-json](https://github.com/playframework/play-json/blob/8642c485c79e32263b7bef5f991abb486523b3ef/play-json/shared/src/main/scala/Reads.scala#L144-L170), or [slick](https://github.com/slick/slick/blob/51e14f2756ed29b8c92a24b0ae24f2acd0b85c6f/slick/src/main/scala/slick/jdbc/PositionedResult.scala#L150-L154))
- extension methods (e.g. in [scala-extensions](https://github.com/cvogt/scala-extensions/blob/master/src/main/scala/collection.scala#L14-L28))

In the new design we are still experimenting with solutions to support these features. So far
the decision is to not put implicit builders in the collections implementation. We might
provide them as an optional dependency instead, but it seems that most of these use cases
could be supported even without implicit builders: you could just use an existing collection
instance and navigate through its companion object (providing the builder), or you could just
use the companion object directly to get a builder.

## Summary

In this article we have reviewed the features built on top of `CanBuildFrom` and explained
the design decision we made for the new collections to support these features without `CanBuildFrom`.
22 changes: 22 additions & 0 deletions resources/img/blog/collections-hierarchy.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.