Add Runtime Dataflow Viewer #1023

Merged
merged 183 commits into vega:master
Oct 23, 2021

@saulshanabrook saulshanabrook commented May 27, 2021

This pull request adds a visual viewer/debugger for the runtime dataflow.

[Screenshot]

Motivation

Vega (Lite) provides a wonderful declarative mechanism for specifying visualizations, but by default it requires all data to be loaded into memory in the client's browser. It is often impractical or impossible to let Vega handle all of the data transforms; instead, we wish to "push down" the queries that a Vega visualization needs to some other data system. This could be another system in your browser, like Arquero, or a remote database.

Previously, I achieved this by transforming the Vega spec itself, extracting the existing transforms, and replacing them with a combined transform that executes on a remote database (ibis-vega-transform). This approach was able to move some interactive visualizations to SQL, but it was tied to Python. When we wanted to explore a strictly client-side version that could work without a Python kernel, @domoritz from the Vega team recommended that we look into operating at the Vega runtime dataflow level, instead of the Vega spec level, in order to capture all of the data transformations more accurately.

In order to move forward with this, I wanted to have a better understanding of how this runtime dataflow operated. I also wanted to be able to debug how each node in it functioned, which is vital to being able to modify the graph and the nodes.

There are also a number of related issues others have opened on debugging Vega (Lite): vega/vega-lite#4134 vega/vega#407 vega/vega#1879

Background

There was an existing article, by @jheer, called "How Vega Works" that showed a visual representation of the dataflow and let you see how different "pulses" of the graph executed it. @chengluyu then took that code and created an interactive vega-inspector to add more information and support more node types.

These tools helped for smaller graphs, but I knew that I needed to be able to inspect graphs like the "Interactive Layered Crossfilter" example, which became very hard to read and interact with using those tools.

Features

So in this pull request, I have added a visual runtime dataflow viewer. In this video, we look at the data pipeline of the Interactive Layered Crossfilter example, watching what changes as you interact with the diagram and which nodes are executed:

[Video: Untitled.mp4]

It currently includes:

  • Zooming, with mouse wheel
  • Selecting a node or edge to filter to the related nodes (ancestors and children)
  • Selecting a pulse to filter to those nodes touched in that pulse
  • Hovering over a node, to see the parameters used to instantiate it
  • If a pulse is selected, on hover the current value of the node will be shown as well
  • Filtering by type of node. We add nodes to the graph for all operators in the graph, as well as the updates, bindings, streams, and data.
  • Background processing of the node layout in a webworker, to allow for continued interaction
  • Caching of layouts to speed up switching to an existing one
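
As a rough illustration of the "filter to the related nodes" feature listed above, the following sketch collects everything upstream and downstream of a selected node. It reads "ancestors and children" as "all nodes reachable against or along the edges", which is an assumption, and the `Edge` shape and function names are hypothetical rather than taken from the PR's code.

```typescript
// Hypothetical minimal edge shape; the viewer's real graph types may differ.
interface Edge {
  source: string;
  target: string;
}

// Build an adjacency map from `from(edge)` to the list of `to(edge)` ids.
function adjacency(
  edges: Edge[],
  from: (e: Edge) => string,
  to: (e: Edge) => string
): Map<string, string[]> {
  const map = new Map<string, string[]>();
  for (const edge of edges) {
    const list = map.get(from(edge)) ?? [];
    list.push(to(edge));
    map.set(from(edge), list);
  }
  return map;
}

// Collect every node reachable from `start` by repeatedly following `next`.
function reachable(start: string, next: Map<string, string[]>): Set<string> {
  const seen = new Set<string>();
  const stack = [start];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    for (const neighbor of next.get(id) ?? []) {
      if (!seen.has(neighbor)) {
        seen.add(neighbor);
        stack.push(neighbor);
      }
    }
  }
  return seen;
}

// The selected node plus everything upstream and downstream of it.
function relatedNodes(selected: string, edges: Edge[]): Set<string> {
  const result = new Set<string>([selected]);
  const downstream = reachable(selected, adjacency(edges, e => e.source, e => e.target));
  const upstream = reachable(selected, adjacency(edges, e => e.target, e => e.source));
  downstream.forEach(id => result.add(id));
  upstream.forEach(id => result.add(id));
  return result;
}
```

Note that a sibling input of a downstream node is not included: with edges a→b, b→c and d→c, selecting b yields a, b and c but not d.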

Details

To render the graph, I use Cytoscape JS, which is a popular canvas-based graph renderer. To lay out the nodes, I used the Eclipse Layout Kernel. Originally, I used the Cytoscape ELK layout plugin, but switched to using elkjs directly, in order to have greater control of the layout timing and caching.
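
To make the layout caching idea concrete, here is a minimal sketch of a cache in front of an asynchronous layout function. It does not call elkjs; `LayoutFn` is a hypothetical stand-in for whatever computes positions, and `layoutKey` and `LayoutCache` are illustrative names, not the PR's.

```typescript
// Computed position for each node id; a stand-in for a layout engine's output.
type Positions = Record<string, { x: number; y: number }>;

// Hypothetical signature for an async layout engine (ELK plays this role in the PR).
type LayoutFn = (nodeIds: string[], edgeIds: string[]) => Promise<Positions>;

// Order-independent key identifying one set of visible nodes and edges.
function layoutKey(nodeIds: string[], edgeIds: string[]): string {
  return JSON.stringify([nodeIds.slice().sort(), edgeIds.slice().sort()]);
}

class LayoutCache {
  private readonly cache = new Map<string, Promise<Positions>>();

  constructor(private readonly compute: LayoutFn) {}

  // Return the cached (possibly still pending) layout, computing it at most once.
  // In this sketch a rejected promise stays cached; a real cache may want to evict it.
  get(nodeIds: string[], edgeIds: string[]): Promise<Positions> {
    const key = layoutKey(nodeIds, edgeIds);
    let pending = this.cache.get(key);
    if (pending === undefined) {
      pending = this.compute(nodeIds, edgeIds);
      this.cache.set(key, pending);
    }
    return pending;
  }
}
```

Caching the promise rather than the resolved value means that two requests for the same filter made in quick succession share a single layout run.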

This PR depends on a corresponding PR in the Vega main repo to add typing for the runtime: vega/vega#3237, which it uses to properly type the function that turns the runtime into a graph.
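
For readers unfamiliar with the idea, a much simplified sketch of turning runtime operators into a graph might look like the following. The `OperatorSketch` shape is a hypothetical reduction and not the vega-typings definition, and the sketch assumes that a parameter points at another operator through an object of the form `{$ref: <id>}`; the PR's real function also handles updates, bindings, streams, and data, which are omitted here.

```typescript
// Simplified, hypothetical shape of one runtime operator. The real
// definitions live in vega-typings and carry many more fields.
interface OperatorSketch {
  id: number;
  type: string;
  params?: Record<string, unknown>;
}

interface GraphSketch {
  nodes: { id: string; label: string }[];
  edges: { source: string; target: string; label: string }[];
}

// Assumption: a reference to another operator is an object {$ref: <operator id>},
// possibly nested inside arrays or other objects.
function collectRefs(value: unknown, found: number[] = []): number[] {
  if (Array.isArray(value)) {
    value.forEach(item => collectRefs(item, found));
  } else if (typeof value === 'object' && value !== null) {
    const record = value as Record<string, unknown>;
    const ref = record['$ref'];
    if (typeof ref === 'number') {
      found.push(ref);
    } else {
      Object.keys(record).forEach(key => collectRefs(record[key], found));
    }
  }
  return found;
}

// One node per operator, one edge per reference found in its parameters.
function operatorsToGraph(operators: OperatorSketch[]): GraphSketch {
  const graph: GraphSketch = { nodes: [], edges: [] };
  for (const op of operators) {
    graph.nodes.push({ id: String(op.id), label: op.type });
    const params = op.params ?? {};
    for (const name of Object.keys(params)) {
      for (const ref of collectRefs(params[name])) {
        // Data flows from the referenced operator into this one.
        graph.edges.push({ source: String(ref), target: String(op.id), label: name });
      }
    }
  }
  return graph;
}
```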

The state management is a bit complex in this code, unfortunately, primarily due to the need to interface with an async layout engine (ELK) and an imperative view layer (Cytoscape). I have gone through a number of different iterations on how to synchronize all of this properly (component state, React's useReducer, Cytoscape state), and have currently settled on moving as much state to Redux as possible, since the application is already using Redux for the rest of its state.
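
One difficulty that any such arrangement has to deal with is a layout result arriving after the user has already moved on to a different selection. The sketch below shows a plain Redux-style reducer that ignores stale results. It is only an illustration of the pattern: the state shape and action names are hypothetical, and it does not use Redux Toolkit or reproduce the PR's actual reducers.

```typescript
type NodePositions = Record<string, { x: number; y: number }>;

// Hypothetical state for the layout part of the viewer.
interface LayoutState {
  requestedKey: string | null; // the layout the UI currently wants to show
  status: 'idle' | 'loading' | 'ready';
  positions: NodePositions;
}

type LayoutAction =
  | { type: 'layoutRequested'; key: string }
  | { type: 'layoutFinished'; key: string; positions: NodePositions };

const initialLayoutState: LayoutState = {
  requestedKey: null,
  status: 'idle',
  positions: {},
};

function layoutReducer(state: LayoutState, action: LayoutAction): LayoutState {
  switch (action.type) {
    case 'layoutRequested':
      return { ...state, requestedKey: action.key, status: 'loading' };
    case 'layoutFinished':
      // A result for a layout that is no longer requested is stale: drop it.
      if (action.key !== state.requestedKey) {
        return state;
      }
      return { ...state, status: 'ready', positions: action.positions };
    default:
      return state;
  }
}
```

Keeping this decision in the reducer means the imperative view layer only ever needs to render `positions` once `status` is `'ready'`.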

I tried not to disturb any of the existing application code, but I did create all of the necessary state management code for this viewer in a "feature" subfolder, instead of following the existing pattern in the code base of keeping all reducers in the same file. I did this because this functionality is tightly coupled, and splitting it off into its own folder made it much easier to iterate on and add to gradually. I tried to follow Redux best practices and pulled in the Redux Toolkit to help implement those.

Future work

I have found this debugger useful for getting a better grasp of the Vega runtime dataflow through particular examples, but there are a number of areas for follow-up work. Since this PR is already quite large (too large?), I hope that any additional features can be added afterward. A few I have collected are:

  • Add more nuanced time profiling for each pulse, to understand how much time is spent on each node. We could then size the nodes by time.
  • For any action that would change the selection, grey out on hover all the nodes that wouldn't be selected. This would apply to nodes themselves, to pulses, and to types.
  • Try filtering out axis and legends to reduce graph size
  • Improve styles of side panel, to make them more consistent and in line with the rest of the application
  • Only record pulses when dataflow panel is open, to reduce memory consumption and CPU usage normally
  • Move node parameter details and values to side panel from popup
  • Auto select first pulse when loading, to show those values by default
  • When selecting a pulse, also show streams that caused that pulse to run
  • Add details to the graph to show semantics of nested nodes better, by showing what the special parent signal is for and the root node.

TODO

  • Consolidate tooltip libraries, react-tooltip and tippy
  • Move all deps to ^
  • Try removing web-worker dep
  • Try animating node positions
  • Remove tooltip when clicking on node
  • Add unselect button for node selection
  • Clarify existing clear button for pulses
  • Fix clear button not causing re-layout when no pulse is selected
  • Fix scrolling on long list of pulses
  • Make element selection darker
  • Set selected nodes on relayout
  • Fix clicking on background to unselect when selected node not visible
  • Improve layout
    • move unconnected nodes out of middle
    • add more padding between nodes
    • Try adding more vertical alignment, possibly by aligning all render nodes

chengluyu and others added 30 commits September 20, 2019 10:33
Now if you hover any node in the scene graph tree, the corresponding element will be highlighted.
@domoritz
Member

You can click on a row in the table, and that selects a pulse. I should make it more clear somehow what it's doing! Let me know if you have suggestions.

Oh, I got confused since my chart doesn't have pulses so the table is empty. You should hide the table when it has no rows.

@saulshanabrook
Contributor Author

Oh, I got confused since my chart doesn't have pulses so the table is empty. You should hide the table when it has no rows.

Huh, I think it should always have one pulse? The initial one? If you select that one, then at least you can see the initial values when you hover over each node.

@saulshanabrook
Contributor Author

Also, I noticed that the cursor switches to a pointer when I hover over the table header. That should not be the case, no?

The current behavior was reversed, so that the header had a pointer and the rows did not. I fixed it so that the rows have a pointer and the header does not.

@lgtm-com

lgtm-com bot commented Sep 30, 2021

This pull request fixes 1 alert when merging 6818e2a into ff322c4 - view on LGTM.com

fixed alerts:

  • 1 for Unused variable, import, function or class

@domoritz
Member

domoritz commented Oct 1, 2021

Thank you. Any idea why the CI doesn't run on your fork?

@saulshanabrook
Contributor Author

It looks like this PR isn't run either (#591). I just tried editing the github action config to run on PRs and that seems to make it work now.

Also I believe vega/vega#3237 will need to be released before this passes?

I am not sure why I have to add this, but was also getting the same error
locally that we are getting on CI about it not being installed
@lgtm-com

lgtm-com bot commented Oct 1, 2021

This pull request fixes 1 alert when merging 658098e into ff322c4 - view on LGTM.com

fixed alerts:

  • 1 for Unused variable, import, function or class

Review thread on .github/workflows/test.yml (outdated, resolved)
@lgtm-com

lgtm-com bot commented Oct 2, 2021

This pull request fixes 1 alert when merging dab3c61 into 8635307 - view on LGTM.com

fixed alerts:

  • 1 for Unused variable, import, function or class

@saulshanabrook
Contributor Author

Yeah we are now getting a test failure since the upstream vega typings PR isn't merged:

Error: src/features/dataflow/utils/runtimeToGraph.ts(17,8): error TS2307: Cannot find module 'vega-typings/types/runtime/runtime' or its corresponding type declarations.

@domoritz
Member

Thank you for building out this feature. I'll merge this pull request when we have a typings release.

@saulshanabrook
Contributor Author

@domoritz if we wanted to get this in before the typings are merged and released, I could comment out the typings import and alias them to any for now? Thoughts?

@domoritz
Member

I released [email protected].

@saulshanabrook
Contributor Author

@domoritz sweet, thank you! I will update this PR to include that release.

@lgtm-com

lgtm-com bot commented Oct 22, 2021

This pull request fixes 1 alert when merging 5577257 into 8635307 - view on LGTM.com

fixed alerts:

  • 1 for Unused variable, import, function or class

@domoritz domoritz merged commit c0e5120 into vega:master Oct 23, 2021
domoritz added a commit that referenced this pull request Oct 23, 2021
* refactor: simplify button code by removing editor-button class

* chore(deps-dev): bump postcss from 8.3.6 to 8.3.8 (#1085)

* chore(deps-dev): bump webpack from 5.53.0 to 5.56.0 (#1084)

* chore(deps-dev): bump monaco-editor-webpack-plugin from 4.1.2 to 4.2.0 (#1082)

* chore(deps): bump d3-scale from 4.0.1 to 4.0.2 (#1081)

* chore(deps-dev): bump webpack-dev-server from 4.2.1 to 4.3.0 (#1079)

* chore(deps): bump d3-array from 3.0.4 to 3.1.0 (#1077)

* chore(deps): bump @types/react from 17.0.24 to 17.0.26 (#1076)

* chore(deps-dev): bump autoprefixer from 10.3.4 to 10.3.6 (#1080)

* chore(deps-dev): bump @types/react-select from 4.0.17 to 5.0.1 (#1078)

* chore(deps): bump actions/setup-node from 2.4.0 to 2.4.1 (#1075)

* Add Runtime Dataflow Viewer (#1023)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Saul Shanabrook <[email protected]>
Co-authored-by: chengluyu <[email protected]>
Co-authored-by: JackZ <[email protected]>
@domoritz
Member

Thank you @saulshanabrook for the dataflow viewer!

@declann

declann commented Oct 25, 2021

Great job @saulshanabrook , this is an awesome feature!

5 participants