Making it possible to debug and introspect data transformations #4134

Closed
tmcw opened this issue Aug 17, 2018 · 4 comments

@tmcw
Contributor

tmcw commented Aug 17, 2018

It seems like the idea behind the transforms built into vega-lite is that you can use a wider range of input data without writing custom JavaScript, by using the little transform functions. Those are cool, but given that what you then have is something like data → (transformed data) → visualization result, afaict there's no way to peek at the in-between transformed data.

Looking at the transformed data, as things stand, seems quite difficult for a number of reasons. To get at it, you need to:

  1. Write a vega-lite schema
  2. Compile it to vega
  3. Run it with vega
  4. Call .data(name) on the produced chart

This is tricky because:

  • The vega-lite documentation, unlike the vega documentation, doesn't have a Debugging section. You can use the vega debugging section with vega-lite, but it isn't directly linked. The relationship between the spec and the implementations is, well, rather confusing.
  • Vega-lite specs don't necessarily name their datasets, but vega requires you to provide a name when you call .data(name). Calling .data() with no argument throws an error. The only way to get this to work, afaict, is to look at the intermediate compiled vega spec, which might not be very familiar if you've been living at the vega-lite abstraction level. The dataset names also aren't guessable: one cannot run, for example, .data(0).
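
To make the "look at the intermediate compiled vega spec" step concrete, here is a minimal sketch that walks a compiled Vega spec object and lists the dataset names it declares. The helper name `listDatasetNames`, the trimmed-down interfaces, and the sample object are all assumptions made for illustration; the sample is hand-written, not real compiler output, and the sketch ignores other places a spec can introduce named data (facets, for instance).

```typescript
// Only the parts of a Vega spec that this sketch reads (hypothetical, trimmed types).
interface VegaData {
  name: string;
  source?: string;
}
interface VegaScope {
  data?: VegaData[];
  marks?: VegaScope[]; // group marks may declare their own datasets
}

// Collect every dataset name declared at the top level or inside nested group marks.
function listDatasetNames(scope: VegaScope): string[] {
  const found: string[] = [];
  for (const d of scope.data || []) found.push(d.name);
  for (const m of scope.marks || []) found.push(...listDatasetNames(m));
  return found;
}

// Hand-written stand-in for a compiled spec (an assumption, not actual vega-lite output).
const compiledSketch: VegaScope = {
  data: [{ name: "source_0" }, { name: "data_0", source: "source_0" }],
  marks: [{ marks: [] }],
};

const datasetNames = listDatasetNames(compiledSketch);
console.log(datasetNames);
```

In a real session the object would presumably come from vega-lite's `compile(vlSpec).spec`, and each name found this way could then be passed to `.data(name)` on the running view.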

What might make this better:

  • Documentation that makes the relationship between these tools much clearer, and treats visualization as a debugging exercise. My experience, and what I've heard from others, is that debugging is currently a really hard part of the Vega ecosystem, and it doesn't have to be that way.
  • Maybe a debug output mode for vega-lite?
@domoritz
Member

Thank you Tom for writing up these challenges that users face when debugging Vega-Lite!

The related issue on naming datasets is #3789.

@domoritz
Member

domoritz commented Oct 3, 2018

The Vega-Editor now has a data viewer, which addresses some of the points you mention. The next step should be more thorough documentation of how Vega-Lite is translated to Vega and a debugging guide.

@onetom

onetom commented Apr 29, 2020

I'm struggling with this too.

After seeing that I can name my data sources and after looking at generated vega specs, I was guessing that I could just name transforms in vega-lite too, like:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {
    "values": [
      {"key": "alpha", "foo": [1, 2], "bar": ["A", "B"]},
      {"key": "beta", "foo": [3, 4, 5], "bar": ["C", "D"]}
    ]
  },
  "transform": [{"flatten": ["foo", "bar"], "name": "with_bar_flattened"}],
  "mark": "circle",
  "encoding": {
    "x": {"field": "foo", "type": "quantitative"},
    "y": {"field": "bar", "type": "nominal"},
    "color": {"field": "key", "type": "nominal"}
  }
}

So in the vega editor, I would see with_bar_flattened instead of the current data_0.
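
For reference while debugging, the rows that the flatten step is expected to produce for the inline data above can be approximated by hand. This is a rough sketch, not Vega's implementation: `flattenRows` is a hypothetical helper, and it assumes (per the flatten documentation, as far as I can tell) that fields are flattened in parallel and shorter arrays are padded with null.

```typescript
type Row = { [field: string]: unknown };

// Flatten several array-valued fields in parallel: one output row per index,
// with shorter arrays padded by null.
function flattenRows(rows: Row[], fields: string[]): Row[] {
  const out: Row[] = [];
  for (const row of rows) {
    // Wrapping non-array values is a choice made for this sketch only.
    const arrays: unknown[][] = fields.map((f) => {
      const value = row[f];
      return Array.isArray(value) ? value : [value];
    });
    const longest = Math.max(0, ...arrays.map((a) => a.length));
    for (let i = 0; i < longest; i++) {
      const copy: Row = { ...row }; // keep untouched fields such as "key"
      fields.forEach((f, j) => {
        copy[f] = i < arrays[j].length ? arrays[j][i] : null;
      });
      out.push(copy);
    }
  }
  return out;
}

// Same inline values as the spec above.
const values: Row[] = [
  { key: "alpha", foo: [1, 2], bar: ["A", "B"] },
  { key: "beta", foo: [3, 4, 5], bar: ["C", "D"] },
];

const flattened = flattenRows(values, ["foo", "bar"]);
console.log(flattened);
```

Comparing this hand-computed table against whatever the editor shows for data_0 is one way to check that a transform did what was intended.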

I'm learning vega-lite through https://github.com/metasoarous/oz though, so I'm hoping I can figure out some way to conveniently debug from a REPL. I haven't looked into https://github.com/jsa-aerial/hanami yet; that looks promising too.

@domoritz
Member

Thank you for the feedback @onetom. Unfortunately, there is no one-to-one correspondence between transforms and Vega datasets: multiple transforms can end up in a single Vega dataset, so naming datasets this way is not possible.
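
A small sketch of why the mapping is not one-to-one: in a Vega spec each dataset carries an array of transforms, so several steps can sit under one name. The fragment below is hand-written for illustration (it is an assumption, not output of the compiler), and `transformsByDataset` is a hypothetical helper.

```typescript
// Trimmed, hypothetical types covering only what this sketch reads.
interface VegaTransform {
  type: string;
}
interface VegaDataset {
  name: string;
  source?: string;
  transform?: VegaTransform[];
}

// Map each dataset name to the ordered list of transform types it contains.
function transformsByDataset(data: VegaDataset[]): { [name: string]: string[] } {
  const result: { [name: string]: string[] } = {};
  for (const d of data) {
    result[d.name] = (d.transform || []).map((t) => t.type);
  }
  return result;
}

// Illustrative fragment: two transforms share the single dataset "data_0".
const dataSketch: VegaDataset[] = [
  { name: "source_0" },
  {
    name: "data_0",
    source: "source_0",
    transform: [{ type: "flatten" }, { type: "filter" }],
  },
];

const summary = transformsByDataset(dataSketch);
console.log(summary);
```

Here a per-transform name such as with_bar_flattened would have no single dataset to attach to, since data_0 already reflects both steps.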
