[DataGrid] WASM-powered quick filter #9206

romgrk · 2023-06-03T18:34:40Z

This PR is a POC to test quick-filtering the grid with WASM.

Demo: https://deploy-preview-9206--material-ui-x.netlify.app/x/react-data-grid/filtering/quick-filter/#wasm-demo

Pros

It is fast. The non-SIMD version can full-text search 100k rows in 30-40ms. The SIMD version in about 5-15ms, which is enough to search-as-you-type.

Cons

1. Users can't run custom quick filter functions.

But they can define the .valueFormatter to set what is going to be searched.

2. It has a big upfront cost.

We need to fill a buffer with the text before passing it to wasm, computing the buffer currently takes 2,500ms, which is not good for UX. You may feel the UI thread blocked while you load the demo, this is the reason.

Mitigations

Part of this would be alleviated by the new & more efficient API from #9120. Preliminary results show that the upfront cost could go down to 500-1000ms with more optimized APIs.
We can combine that with an approach like React, where we schedule small chunks of work to run between browser frames. This would allow the buffer creation to feel seamless for users.
This feature should also be opt-in, doing this much work only makes sense for specific use-cases.

Additional info

If filling a buffer from scratch takes 500-1000ms, I estimate that recomputing a search buffer after a cell edit should take from 300-500ms.

oliviertassinari · 2023-06-03T22:03:10Z

Interesting, WASM reminds me of this part: https://www.coss.community/cossc/ocs-2020-breakout-niall-crosby-17m9, at 30:00. I took notes about this in https://www.notion.so/mui-org/AG-Grid-Niall-Crosby-9c34589a63294dd1a1dbabd843bb24ff?pvs=4#9393ec6e2fc74299ac64b49355920476. I think worth a watch.

I doubt that WASM will make a difference, one hypothesis is that developers won't be able to load enough data client-side to start to see the limit where the JS-based search is too slow.

Maybe, is there a way to demo how this can be done in userland?

romgrk · 2023-06-04T02:35:54Z

WASM reminds me of this part

I agree with some/most of the points, but I think he's talking about migrating "the grid" to wasm, which would indeed be a bad idea. The difference is here is we're solving a specific problem for which wasm is good at.

one hypothesis is that developers won't be able to load enough data client-side to start to see the limit where the JS-based search is too slow.

Do we have data & use-case studies about how the grid is being used? It's for sure not the ideal model to have everything client side, but I can think of a few use-cases where it could make sense (e.g no server, PWA with local file access). I've also seen users in the github issues discussion mentionning performance issues with the quickfilter (QF).

TBH our current QF implementation is slow first because we re-generate the rows' formatted values on each filter round. This PR moves that cost to the start, and formatting the values is the big majority of the 2000ms cost mentionned above. So there is room for improvement for JS, but there are also limits.
In particular, to be as efficient JS would need to stop using JS-strings, because those have awful cache-locality. Being able to access memory sequentially in one continuous block is a huge advantage. This PR encodes the data as strings packed one after the other like this: [length: 4 bytes] + [...data: length bytes].
The other big advantage that wasm has is SIMD instructions support; we get a 2-8x speedup when using SIMD.

Maybe, is there a way to demo how this can be done in userland?

There is for sure a way to re-package this though. We could decouple quickfiltering into a module with a well-defined interface so that it's easily possible to replace the QF implementation with a different one; this could be an interesting feature for some users as well.
The wasm/native part of this PR could also additionally be split into its own package, the core of it is a full-text search engine that receives a list of strings as preparation data, and then provides the ability to query that data for matching lines (aka rows) indexes. There's a few fulltext search packages on NPM, some with wasm but none is SIMD enabled.

oliviertassinari · 2023-06-04T13:43:03Z

I've also seen users in the github issues discussion mentionning performance issues with the quickfilter (QF).

This sounds like a great point to dive deeper into. If they can provide a real-life reproduction, it would be great. Maybe what they are truly complaining about in this discussion is the 500ms debounce in the textbox 🤷‍♂️.

It has a big upfront cost.

I think that we need to find a solution for the data loading, with an x4 CPU throttle (an average device?), on my M1 Pro it takes 13s to load. What will happen as users dynamically change the rows 🙃?

SIMD enabled

Pushing the idea further, Is there a chance we could run this in a GPU with WebGL? https://stackoverflow.com/questions/27333815/cpu-simd-vs-gpu-simd

There is for sure a way to re-package this though.

I was thinking along the lines of using an npm package that focuses on this problem and shows how to do the integration.
But it could also surface the need for a plugin system, so we don't force this bundle size on all the people who don't need it.

our current QF implementation is slow first because we re-generate the rows' formatted values on each filter round.

Caching sounds interesting, I imagine it could benefit both a JS and WASM implementation.

romgrk · 2023-06-05T04:31:20Z

Caching sounds interesting, I imagine it could benefit both a JS and WASM implementation.

For sure. I'll note that when I benchmarked, formatting the cell values was the highest cost: about 1800-2000ms for formatting the rows, and about 200-300ms to generate the input buffer. So the big problem is row formatting, which needs to be solved even for a pure JS quickfilter. I've established while profiling filter performance that some of our API calls are very expensive, in particular .getCellParams() and .getCellValue(). If we follow the same approaches started in #9120, I think we can cut that by 2 or 3. For the rest, we could look into:

Idle callback work (or just in-between frames work like React does, not necessarily via requestIdleCallback)
Worker threads

What will happen as users dynamically change the rows?

For row updates: if we preserve the formatted row values in memory (and trade memory for CPU), we can update the input buffer very quickly, we just need to memcpy the part of the buffer after the affected row further away in memory, and then rewrite only the affected row. If we don't preserve the memory, then it's the same cost as creating the buffer from scratch.

I think if we go forward with this, all of those choices should be configurable by library users. They're the only ones who could pick the right combination of trade-offs for their use-case.

Pushing the idea further, Is there a chance we could run this in a GPU with WebGL?

I have very limited experience with using GPU. I'm not sure how well a string finding algorithm translates to shader code. I would need to do research to answer that question.

github-actions · 2023-06-28T20:28:22Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

romgrk added 2 commits June 3, 2023 14:08

feat: wasm filter

56f5ec9

lint

bf0116c

romgrk added performance component: data grid This is the name of the generic UI component, not the React module! labels Jun 3, 2023

romgrk added 2 commits June 3, 2023 14:42

lint

09f4947

lint

2d2b3f4

romgrk mentioned this pull request Jun 4, 2023

[DataGrid] Indicate internal loading state #8197

Open

2 tasks

romgrk requested review from cherniavskii and MBilalShafi June 5, 2023 10:44

github-actions bot added the PR: out-of-date The pull request has merge conflicts and can't be merged label Jun 28, 2023

michelengelen changed the base branch from master to next November 6, 2023 14:27

MBilalShafi changed the base branch from next to master March 21, 2024 02:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DataGrid] WASM-powered quick filter #9206

[DataGrid] WASM-powered quick filter #9206

romgrk commented Jun 3, 2023 •

edited

Loading

oliviertassinari commented Jun 3, 2023 •

edited

Loading

romgrk commented Jun 4, 2023 •

edited

Loading

oliviertassinari commented Jun 4, 2023 •

edited

Loading

romgrk commented Jun 5, 2023

github-actions bot commented Jun 28, 2023

[DataGrid] WASM-powered quick filter #9206

Are you sure you want to change the base?

[DataGrid] WASM-powered quick filter #9206

Conversation

romgrk commented Jun 3, 2023 • edited Loading

Pros

Cons

1. Users can't run custom quick filter functions.

2. It has a big upfront cost.

Mitigations

Additional info

oliviertassinari commented Jun 3, 2023 • edited Loading

romgrk commented Jun 4, 2023 • edited Loading

oliviertassinari commented Jun 4, 2023 • edited Loading

romgrk commented Jun 5, 2023

github-actions bot commented Jun 28, 2023

romgrk commented Jun 3, 2023 •

edited

Loading

oliviertassinari commented Jun 3, 2023 •

edited

Loading

romgrk commented Jun 4, 2023 •

edited

Loading

oliviertassinari commented Jun 4, 2023 •

edited

Loading