Parallel Categories trace type for multi-dimensional categorical data #1

jonmmease · 2018-07-27T19:13:15Z

@alexcjohnson @etpinard @monfera @chriddyp @jackparmer

Introduction

This PR is a proposal and an implementation of a new trace type for the interactive exploration of multi-dimensional categorical data sets. My working name for the trace is "Parallel Categories" or parcats for short.

The concept of this trace has been discussed previously in the following plotly.js issues:

I also briefly showed a prototype of this diagram to @chriddyp over screenshare several months ago.

Related work

The closest prior art to the Parallel Categories Diagram is the Parallel Sets Diagram by Robert Kosara and Caroline Ziemkiewicz.

Parallel Sets implementations / descriptions

Here is a collection of existing implementations / descriptions of the Parallel Sets Diagram:

Parallel Sets Java Program

https://eagereyes.org/parallel-sets

This is a stand-alone Java program by Kosara that implements a Parallel Sets Diagram.

Parallel Sets from the DataViz catalog

https://datavizcatalogue.com/methods/parallel_sets.html

D3 implementation of Parallel Sets

https://www.jasondavies.com/parallel-sets/

What's different about the Parallel Categories Diagram?

The primary difference between this Parallel Categories Diagram (parcats from here on) and the Parallel Sets Diagram (parsets from here on) is that the parcats diagram supports a more flexible path coloring scheme.

In all of the examples of parsets diagrams that I have found, the colors of the paths correspond to states in the left-most (or top-most) dimension. In contrast, in the parcats diagram, color may correspond to a column in the dataset that may or may not be present as a dimension in the diagram.

This, admittedly modest, extension has several advantages. Path colors may be set using a numeric array and a color map just like many other plotly.js trace types (scatter, parcoords, etc.). This makes it possible to use the parcats diagram combined with other traces in brushing/crossfiltering configurations.
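To make this coloring model concrete, here is a minimal sketch of how data rows could be grouped into paths when the color column is independent of the displayed dimensions. `buildPaths` is a hypothetical helper, not the plotly.js implementation; it assumes each dimension is `{label, values}` and that colors are given as one number per row.

```javascript
// Hypothetical helper (not plotly.js API): group data rows into parcats-style
// paths. A path is identified by its category in every displayed dimension
// plus its color value, so two rows with the same categories but different
// colors become two separate paths.
function buildPaths(dimensions, colors) {
  const paths = new Map();
  for (let i = 0; i < colors.length; i++) {
    const cats = dimensions.map(d => d.values[i]);
    // JSON.stringify gives a simple composite key for (categories, color)
    const key = JSON.stringify([cats, colors[i]]);
    if (!paths.has(key)) {
      paths.set(key, { categories: cats, color: colors[i], count: 0, rows: [] });
    }
    const p = paths.get(key);
    p.count += 1;
    p.rows.push(i);
  }
  return Array.from(paths.values());
}

// The color column ("survived") is not itself one of the displayed dimensions.
const dims = [
  { label: 'class', values: ['1st', '1st', '3rd', '3rd'] },
  { label: 'sex', values: ['F', 'F', 'M', 'M'] },
];
const survived = [1, 0, 0, 0];
const paths = buildPaths(dims, survived);
```

Here the two `1st`/`F` rows split into two paths because their colors differ, while the two `3rd`/`M` rows share one path with a count of 2.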

Dragging and Brushing example

Here is an example of visualizing a 5-dimensional data set with two continuous dimensions and three categorical dimensions. This is accomplished by displaying the two continuous dimensions in a 2D scatter plot and the three categorical dimensions using the parcats diagram.

I created this example using a branch of plotly.py version 3 built against this branch of plotly.js.

First I show the drag interactions supported by the diagram. Categories (the rectangles) and dimension labels (dimensions are the columns of rectangles) can be dragged to reorder categories and dimensions. Upon release, the diagram animates to a relaxed state with equal spacing between dimensions and categories.
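The "relax on release" step can be sketched as follows, assuming each dimension carries a horizontal position `x` that is updated while dragging. `relaxDimensions` is a hypothetical name, not the code in this PR: it reorders dimensions by their current `x`, then assigns equally spaced positions across the plot width. The same idea applies vertically to categories within a dimension.

```javascript
// Hypothetical sketch (not the plotly.js implementation): after a dimension
// label is dropped, order dimensions by their current x and then "relax" them
// to equally spaced positions across the available width.
function relaxDimensions(dimensions, width) {
  // Sort a copy so the caller's array is left untouched
  const ordered = dimensions.slice().sort((a, b) => a.x - b.x);
  const n = ordered.length;
  const spacing = n > 1 ? width / (n - 1) : 0;
  return ordered.map((d, i) => ({ ...d, x: i * spacing }));
}

// Three dimensions at x = 0, 50, 100; the user has dragged "c" to x = 30.
const dims = [
  { label: 'a', x: 0 },
  { label: 'b', x: 50 },
  { label: 'c', x: 30 },
];
const relaxed = relaxDimensions(dims, 100);
```

After release, `c` lands between `a` and `b`, and the three dimensions sit at x = 0, 50 and 100; an animation would interpolate from the dragged positions to these targets.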

Selection events in the scatter plot are used to update the colors of both the selected points in the scatter plot and the corresponding paths in the parcats diagram. Similarly, click events on categories and paths in the parcats diagram are used to update the colors in both diagrams.
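The key to this two-way brushing is that both traces are indexed by the same data rows, so a single numeric color array can drive both. Below is a minimal sketch with hypothetical helper names. In a live page the selected indices would typically come from the `pointNumber` fields of a `plotly_selected` event, and the resulting array could be pushed to both traces with `Plotly.restyle`. The wiring shown here is an assumption, not code from this PR.

```javascript
// Hypothetical sketch: turn selected row indices into a shared numeric color
// array (1 = selected, 0 = unselected) usable by both the scatter trace and
// the parcats trace, since they index the same data rows.
function selectionToColors(nRows, selectedIndices) {
  const colors = new Array(nRows).fill(0);
  for (const i of selectedIndices) {
    if (i >= 0 && i < nRows) colors[i] = 1;
  }
  return colors;
}

// A click on a parcats category works the same way: collect the rows that
// fall in that category and reuse selectionToColors.
function categoryToColors(dimension, category) {
  const rows = [];
  dimension.values.forEach((v, i) => {
    if (v === category) rows.push(i);
  });
  return selectionToColors(dimension.values.length, rows);
}

const brushed = selectionToColors(5, [1, 3]);
const clicked = categoryToColors({ label: 'sex', values: ['F', 'M', 'F', 'M', 'M'] }, 'F');
```

`brushed` becomes `[0, 1, 0, 1, 0]` and `clicked` becomes `[1, 0, 1, 0, 0]`; with a two-color colorscale, the same array highlights the matching scatter points and parcats paths.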

As far as I'm aware, this is the only visualization of multi-dimensional categorical data that supports this kind of two-way data brushing. And, combined with plotly.py version 3, it is certainly the only visualization of this type that would be easily accessible to Python users.

Color bundling

There are two modes for how the colors of paths are arranged.

In the example above, color is not considered when sorting the paths. This is desirable in a brushing scenario so that the paths remain stable as the colors change during interactions. This behavior is specified by setting the bundlecolors property to false.

Setting the bundlecolors property to true causes paths with like colors to be bundled together as they pass through each category. This results in a cleaner looking diagram and is preferable in cases where the positions of paths do not need to remain stable as colors change.
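One way to picture the difference is as two sort orders for the paths stacked inside a category. The sketch below is an assumption about the general idea, not the ordering code in this PR. It assumes each path records the category index it visits in every dimension (`categoryIndices`) plus a numeric `color`. With `bundlecolors` false, only the categories matter, so recoloring never moves a path. With it true, color is the primary sort key.

```javascript
// Hypothetical sketch of how bundlecolors could change the stacking order of
// paths within a category. Each path has categoryIndices (one per dimension)
// and a numeric color.
function comparePathKeys(a, b) {
  for (let k = 0; k < a.length; k++) {
    if (a[k] !== b[k]) return a[k] - b[k];
  }
  return 0;
}

function orderPaths(paths, bundlecolors) {
  return paths.slice().sort((p, q) => {
    if (bundlecolors && p.color !== q.color) {
      // Bundled: paths of like color are grouped together first
      return p.color - q.color;
    }
    // Unbundled (or equal colors): order depends only on the categories,
    // so changing a path's color never changes its position
    return comparePathKeys(p.categoryIndices, q.categoryIndices);
  });
}

const paths = [
  { id: 'a', categoryIndices: [0, 0], color: 1 },
  { id: 'b', categoryIndices: [0, 1], color: 0 },
  { id: 'c', categoryIndices: [0, 2], color: 1 },
];
const unbundled = orderPaths(paths, false).map(p => p.id);
const bundled = orderPaths(paths, true).map(p => p.id);
```

Unbundled, the order is `a, b, c` regardless of color; bundled, the color-0 path `b` moves ahead so the two color-1 paths sit next to each other (`b, a, c`).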

For example:

Mocks

Several simple mocks have been added as a part of the current test suite.

parcats_basic

parcats_bundled

parcats_unbundled

API notes

I tried to model the API as closely as possible after existing trace conventions. There is a top-level dimensions property with label and values sub-properties, just as with the parcoords trace. Path colors/colorscales are specified under a top-level marker property.
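For illustration, here is a sketch of building a trace object in the shape described above from an array of row objects. The attribute names (`dimensions` with `label`/`values`, a trace-level `marker` with `color`/`colorscale`, and `bundlecolors`) follow this proposal and may change during review. `rowsToParcatsTrace` and the example column names are hypothetical.

```javascript
// Hypothetical helper: convert row objects into a trace following the
// attribute layout proposed in this PR (names may change during review).
function rowsToParcatsTrace(rows, dimensionKeys, colorKey) {
  return {
    type: 'parcats',
    // One dimension per categorical column, mirroring parcoords' dimensions
    dimensions: dimensionKeys.map(key => ({
      label: key,
      values: rows.map(r => r[key]),
    })),
    // Per-row numeric colors mapped through a colorscale, as in scatter
    marker: {
      color: rows.map(r => r[colorKey]),
      colorscale: [[0, 'lightgray'], [1, 'firebrick']],
    },
    bundlecolors: true,
  };
}

const rows = [
  { cls: '1st', sex: 'F', survived: 1 },
  { cls: '3rd', sex: 'M', survived: 0 },
];
const trace = rowsToParcatsTrace(rows, ['cls', 'sex'], 'survived');
```

The color column is passed separately from the dimension keys, which reflects the point made earlier that color need not be one of the displayed dimensions.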

Alternative approach

In the issues cited at the beginning of this PR there was some discussion on the possibility of adding categorical support to the existing Parallel Coordinates Diagram. This diagram was already well under development for our internal needs at the time of these discussions, so I did not pursue this approach.

TODO

Some items that I know still need to be done:

Font styling support
Complete attribute descriptions
Complete the test suite. In terms of my personal testing standards, I'd estimate that the test suite is about 50% complete.
Examples!

Request for comments

So the top-level question for the plotly.js team is, are you all interested in having this diagram be part of plotly.js? It's not the most common use-case, but I think it would be another differentiating feature for the plotly ecosystem.

If you all are interested, I have internal funding to put a bunch more time into this through September. And if we can get it merged in during that time, I can continue helping out with basic maintenance after that.

Let me know what you think!

chriddyp · 2018-07-27T19:35:27Z

(Adding @nicolaskruchten as well!)

etpinard · 2018-07-27T19:57:57Z

Thanks very much for the PR @jonmmease 🎉 🚀 🏆

Unfortunately, I'm a little backed up in my own TODO list, so I probably won't have time to look at your work until v1.40.0 is out in ~2 weeks' time.

nicolaskruchten · 2018-07-27T20:21:37Z

Omg! So cool :) ... more professional feedback on Monday ;)

jonmmease · 2018-07-27T21:54:17Z

Thanks @etpinard and @nicolaskruchten, I know reviewing a new trace type is no small task!

alexcjohnson · 2018-07-31T17:06:16Z

Very nice work @jonmmease! Looking at what you've got here, I'm happy to start this way (as a separate trace type). At some point we still may want to fold this into parcoords, as there would be significant value in being able to combine categorical and continuous dimensions side-by-side, but parcoords code is not so accessible and I wouldn't want that to constrain the features we can include here. Presumably in the meantime users can use the two trace types side-by-side with coupled selection, to get the same result with just a bit more manual plumbing.

I'll look at the code in a moment, but a few comments on the display and behavior:

Dragging: I think it'd be preferable to keep vertical and horizontal reordering distinct: dragging a category only moves it vertically, and only dragging the dimension label will reorder dimensions.
Sorting: we may need more than 3 dimensions to hit all cases, but in the existing mocks I see some clear opportunities to reduce crossings. This isn't a blocker but would be nice to take a look at what's possible. In parcats_bundled, B->11 and C->11 both have two blue paths that could stay adjacent, and the red 2->A->11 could be at the top of 2 and could be the highest red in 11. I guess these all fall into a class of optimizations based on pairs of adjacent dimensions, and perhaps a separate sort order per dimension? It also means the ordering could change as you reorder dimensions, and reorder categories within a dimension, which seems to happen just a little bit right now, but mostly these operations don't affect sorting.
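To compare candidate orderings along the lines suggested above, a crossing count per pair of adjacent dimensions could serve as the objective. This is a hypothetical sketch, not code from the PR. It assumes each path's stacking position (a rank) is known at every dimension. Two paths cross between a pair of dimensions exactly when their relative order flips, so the count is the number of inversions. The O(n²) loop is kept for clarity; a merge-sort inversion count would be O(n log n).

```javascript
// Hypothetical sketch: count path crossings between two adjacent dimensions.
// leftPos[i] / rightPos[i] are the stacking ranks of path i at each dimension;
// two paths cross when their relative order flips.
function countCrossings(leftPos, rightPos) {
  let crossings = 0;
  for (let i = 0; i < leftPos.length; i++) {
    for (let j = i + 1; j < leftPos.length; j++) {
      const dl = leftPos[i] - leftPos[j];
      const dr = rightPos[i] - rightPos[j];
      if (dl * dr < 0) crossings += 1;
    }
  }
  return crossings;
}

// Total for a full layout: sum over each pair of adjacent dimensions.
// positionsByDimension[d][i] is the rank of path i at dimension d.
function totalCrossings(positionsByDimension) {
  let total = 0;
  for (let d = 0; d + 1 < positionsByDimension.length; d++) {
    total += countCrossings(positionsByDimension[d], positionsByDimension[d + 1]);
  }
  return total;
}

const layout = [[0, 1, 2], [0, 2, 1], [2, 1, 0]];
const total = totalCrossings(layout);
```

For this three-path layout there is one crossing between the first pair of dimensions and two between the second, so `total` is 3. Because the count is computed per adjacent pair, a separate sort order per dimension could be chosen to minimize it, and it would naturally change when dimensions or categories are reordered.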

alexcjohnson · 2018-07-31T17:10:14Z

src/components/fx/hover.js

@@ -153,6 +153,82 @@ exports.loneHover = function loneHover(hoverItem, opts) {
    return hoverLabel.node();
 };

+// TODO: replace loneHover?


Interesting... can you say more? Is this just an extension of loneHover to support multiple labels or does it add something else? I only see one label at a time when I play with parcats.

At one point I had added a mode that would display a separate tooltip for each color of the category you hover over. It got pretty unwieldy and I reverted back to loneHover. I'll remove this customHover function.

Unwieldy in the code or on screen? We have some upcoming sankey enhancements that may call for a very similar multi-label hover effect, though the details still need to be worked out. But if the code is clean and it was just not looking good in practice, it still may be worth keeping this around.

On the screen. I think the implementation worked well enough. I'll look over it again and write a better explanation of what it does 🙂

alexcjohnson · 2018-07-31T17:16:08Z