
Parcoords multiselection #2331

Closed
monfera opened this issue Feb 4, 2018 · 9 comments

monfera commented Feb 4, 2018

Currently, parcoords permits the selection of one contiguous range within each dimension (via brushing, or via an API call). We're working on support for multiselection, i.e., an OR relation among multiple ranges within a specific dimension. Moreover, any of the 60+ dimensions could have multiple selections. The goal is to retain as much speed as feasible.

A promising route for the multiselection is the use of a pixel-based bitmap (likely via textures) for each dimension. Instead of arbitrary floating-point(*) ranges, filtering would have no higher resolution than what can be selected on the screen, or discerned as lines, i.e., the pixel raster. A mechanical interpretation is that brushing would turn pixels on or off, enabling or disabling lines that pass through those pixels. This is sufficient for mouse-based interactions, but in theory(**), a floating-point range can be more accurate, allowing subpixel selection via the API.

(*) (**) The 32-bit floating-point values are effectively 23-bit integers in this regard, which is quite a bit more than the (for example) 12-bit precision (= 4096 levels) that characterizes a 4k-tall parcoords, or the 8-10 bit precision of views of regular height, e.g., on a dashboard. On the other hand, parcoords is an aggregate view where going more accurate than a pixel may not be useful even when applying the constraints via the API. So this approach is workable iff the visible raster is deemed a sufficient constraint resolution for the API as well. (It's not possible to go below the pixel level with the mouse, except with browser zooming or similar.)
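A minimal host-side sketch of the idea in TypeScript, with made-up names (none of this is plotly.js API): each dimension keeps one bit per vertical pixel, brushing ORs ranges into that mask, and the filter test is a single indexed read. The quantization step is exactly where sub-pixel precision is given up.

```ts
// Illustrative sketch only; names like DimensionMask are hypothetical.
type DimensionMask = Uint8Array; // one entry per vertical pixel row

function makeMask(canvasHeight: number): DimensionMask {
  return new Uint8Array(canvasHeight); // all pixels start deselected
}

// OR a brushed range [lo, hi] (normalized 0..1 coordinates) into the mask.
// Repeated calls accumulate, giving the OR relation among disjoint ranges.
function brush(mask: DimensionMask, lo: number, hi: number): void {
  const h = mask.length;
  const p0 = Math.max(0, Math.floor(lo * (h - 1))); // quantize to pixel raster
  const p1 = Math.min(h - 1, Math.ceil(hi * (h - 1)));
  for (let p = p0; p <= p1; p++) mask[p] = 1;
}

// Per-dimension filter test: does a line with this value pass?
function selected(mask: DimensionMask, value: number): boolean {
  return mask[Math.round(value * (mask.length - 1))] === 1;
}
```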

The motive is that the vertical height of the parcoords is bounded by reasonable monitor resolutions, WebGL's gl.MAX_TEXTURE_SIZE is at least 4096, gl.MAX_TEXTURE_IMAGE_UNITS is at least 8, and each texture pixel can carry up to 4 bytes (32 bits), so even in the worst case there's plenty of room for a full-height 4k monitor display and our almost 64 dimensions (two 32-bit texture pixels already supply 64 mask bits). The resulting vertex shader algorithm is a simple texture lookup, and since 64 bits can be assembled from texture pixels and we have almost 64 dimensions, it could reduce to simple bit-fiddling operations.
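As a rough sketch, the per-vertex test could be a single texture2D read in a WebGL1 vertex shader, here held as a TypeScript string the way shader sources usually are in WebGL code. The uniform names are assumptions for illustration, not the actual plotly.js shader interface.

```ts
// Illustrative GLSL ES 1.0 snippet held as a TS string; names are made up.
const maskLookupGLSL = /* glsl */ `
  uniform sampler2D maskTexture; // width: #dimensions, height: canvas pixels
  uniform float dimCount;

  // value in [0,1] along dimension 'dim'; returns 1.0 if the line passes.
  float maskLookup(float value, float dim) {
    vec2 uv = vec2((dim + 0.5) / dimCount, value); // sample the pixel row
    return step(0.5, texture2D(maskTexture, uv).r); // red channel holds the bit
  }
`;
```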

There are alternatives, but as this approach is promising, would have fairly bounded runtime (to be tested), and places no limit on how many ranges can be selected, we should discuss whether this is an acceptable solution before exploring more involved alternatives.

alexcjohnson (Collaborator) commented

Would this replace the existing system entirely, or only be used on multirange selection axes while the existing system continues to be used for a single selection range?

Your proposal is totally fine for UI-driven selection (which is 99% of the usage anyway, right?), as well as API selection on categorical axes. The only case that bothers me a little is API selection on continuous axes, as it would be possible to include or exclude points near the edge improperly. But if we could keep the existing higher-precision version on single-range axes - which is normally all you use on continuous axes anyway - and use the pixel-based version only for the axes using multirange selection, I'd have no qualms about it.

That said, it's a fairly unusual case and a fairly minor effect I'm describing, so if it's not possible, or too big a performance hit to do both, I'd also be OK with just explicitly documenting the limitation.

monfera commented Feb 5, 2018

Keeping both modes is possible: we'd preserve what we have now and add the raster-based filtering in an AND relationship. We might even be able to avoid a texture lookup for the dimensions that don't need it, although the GPU is often faster if we just let it do superfluous parallel work when that avoids conditional branching. If the bitmap raster is fast enough on its own, then it'll be fast enough when intersected with what we have now. All this (multidimensional) point selection logic happens in the vertex shader, and our bottleneck is the fragment shader anyway.
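For example (again with made-up names, and assuming a maskLookup helper like the one sketched above), the combined test could stay branch-free by multiplying the two pass/fail results:

```ts
// Illustrative GLSL; 'lo' and 'hi' stand in for the existing precise range
// bounds, whatever they are called in the real shader.
const combinedTestGLSL = /* glsl */ `
  float combinedFilter(float value, float dim, float lo, float hi) {
    float inRange = step(lo, value) * step(value, hi); // contiguous range test
    float inMask  = maskLookup(value, dim);            // raster bitmap test
    return inRange * inMask; // AND relationship, multiplicative: no branching
  }
`;
```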

phxnsharp commented

This all seems reasonable for our use case. We will, of course, specify the categorical variable ranges using precise lists of included values. Having 1k+ categorical values is pretty unusual in the first place, and some minor imprecision at the pixel level is to be expected at that point.

Just to note, we do not have a use case for disjoint ranges for continuous variables. Only for categorical variables.

monfera commented Feb 14, 2018

@phxnsharp thank you for your comments! Regarding categorical variables, I'm not sure it would let us further increase accuracy, but it may be good to know: are your categorical values spaced evenly on the axis? It is possible for categorical values to be placed with an uneven cadence, e.g., here the offsets between A and AB, or AB and B, are smaller than between, say, B and Y, but I suspect that ticks are usually or always the same distance apart on a given axis. With a uniform cadence, there's less likelihood that several categorical ticks crowd under the same pixel.

phxnsharp commented

@monfera Unfortunately they can be unevenly spaced. Thanks for asking!

monfera commented Feb 15, 2018

@phxnsharp thanks! Clumping means there is an increased chance that more than one categorical value falls under the same pixel, with the consequence that, even with a precise list of included values to highlight, some additional values may become included too. For example, if values 543 and 544 are under the same pixel, and value 543 is specified on the list, then value 544 would light up as well, and vice versa. Please let us know if this could be a problem.

phxnsharp commented

@monfera My team did come up with some unlikely, though bad, cases that the interference could cause. For example, imagine a design set with two enumerated variables, v1 and v2. v1 has a constraint that limits the dataset, and it turns out that under that constraint, all the valid points go through one particular value of v2. If pixel interference causes some additional designs to be drawn as valid, the engineer might draw invalid conclusions when those points go through other values of v2.

This is not so likely that it is necessarily an impediment to the suggested solution. Let me propose a counter-solution, though. We have an upper limit of 1000 discrete values for any enumerated variable. Can you map the enumerated values to an integer between 0 and 999, then map that integer to pixels on the screen? If so, you could do the filtering based on the integer, which is guaranteed not to overlap.
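A minimal sketch of this integer-mapping idea in TypeScript (identifiers are hypothetical): categories get dense indices 0..N-1, and the selection becomes an exact bitset over those indices, independent of pixel positions.

```ts
// Hypothetical helper names; not plotly.js API.
function buildCategoryIndex(categories: string[]): Map<string, number> {
  const index = new Map<string, number>();
  categories.forEach((c, i) => index.set(c, i)); // dense 0..N-1, N <= 1000
  return index;
}

// Exact selection: one bit per category index, so near-identical screen
// positions can never bleed into each other.
function makeSelection(index: Map<string, number>, chosen: string[]): Uint8Array {
  const bits = new Uint8Array(index.size);
  for (const c of chosen) {
    const i = index.get(c);
    if (i !== undefined) bits[i] = 1;
  }
  return bits;
}

const idx = buildCategoryIndex(['A', 'AB', 'B', 'Y']);
const sel = makeSelection(idx, ['AB', 'Y']);
console.log(sel[idx.get('AB')!] === 1); // true, however close 'A' and 'AB' sit
```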

monfera commented Feb 15, 2018

@phxnsharp thanks for the added info. We've been working on the GPU-side pixel mask, where things like mask lookups (conveniently at pixel pitch, though the bins can be sparser or denser, up to 4k evenly spaced ones) are fairly straightforward and O(1), but relational joins are not.

What I mean by joins is that the requirement is analogous to joining a data relation (a table with possibly 10k or more rows per dimension, across up to ~60 dimensions) with a filter relation (up to 1000 values, i.e., records per dimension). Performing such joins requires operations (merge join, hash join, B-tree index lookup) that aren't a good fit for WebGL: such operations are sometimes done in a GPGPU setting with CUDA or OpenCL, but WebGL has no compute shaders, so we're limited to vertex and fragment shaders. For example, in WebGL even a for loop whose iteration count is not known in advance is tricky, because the loop bound must be known at compile time, and setting it high enough for the worst case means that performance will always reflect that worst case.
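To make the loop constraint concrete, a bounded-loop membership test might look like this in GLSL ES 1.0 (illustrative only; filterValueAt is a hypothetical helper standing in for e.g. a texture fetch):

```ts
// Illustrative GLSL; the bound is a compile-time constant, so the shader is
// compiled for the worst case even when the actual filter list is short.
const boundedLoopGLSL = /* glsl */ `
  const int MAX_FILTER_VALUES = 1000; // must be known at compile time

  bool contains(float valueIndex, float actualCount) {
    for (int i = 0; i < MAX_FILTER_VALUES; i++) {
      if (float(i) >= actualCount) break;  // early exit only partially helps
      if (valueIndex == filterValueAt(i)) return true;
    }
    return false;
  }
`;
```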

We'll do some brainstorming to see how it could be solved, given the specifics that only ordinal variables need this kind of filtering, with up to 1000 values per dimension. A promising direction is to bijectively map the categorical values of possibly uneven cadence [x0, ..., xN] to the integers [0, ..., N], and bring back the "distortion" via the inverse mapping (a new lookup table) on the GPU for rendering. The benefit is that the filter could use the integer grid, so another lookup table (a bitmap) could be indexed to check whether a value is selected.

While it looks feasible, it's computationally more involved than the pixel-based approach, as it adds two lookups over what we have now, and we may need to do them for all dimensions, categorical or not. We may have to bake in the ~1000 (e.g., 1024) limit for categorical values, which doesn't sound like a big loss. We'll also check whether there are enough GPU resources for this approach to run on most or all computers, to avoid relying on resource limits (such as the allowed texture count) beyond what the standard guarantees and the overwhelming majority of hardware supports.
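A sketch of the forward/inverse tables in TypeScript (hypothetical names; the inverse table would be uploaded as a small texture for the rendering pass):

```ts
// Hypothetical sketch: dense indices for filtering, true positions for drawing.
function buildTables(positions: number[]): {
  toIndex: Map<number, number>;   // uneven axis position -> dense integer
  toPosition: Float32Array;       // dense integer -> axis position (for GPU)
} {
  const toIndex = new Map<number, number>();
  const toPosition = new Float32Array(positions.length);
  positions.forEach((x, i) => {
    toIndex.set(x, i);
    toPosition[i] = x;
  });
  return { toIndex, toPosition };
}

// Filtering happens on the dense index (via a bitmap, as above); rendering
// reads toPosition[i] to restore the original, unevenly spaced placement.
```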

etpinard (Contributor) commented

Done (for the most part) in #2415
