Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DataGrid] WASM-powered quick filter #9206

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

romgrk
Copy link
Contributor

@romgrk romgrk commented Jun 3, 2023

This PR is a POC to test quick-filtering the grid with WASM.

Demo: https://deploy-preview-9206--material-ui-x.netlify.app/x/react-data-grid/filtering/quick-filter/#wasm-demo

Pros

It is fast. The non-SIMD version can full-text search 100k rows in 30-40ms. The SIMD version in about 5-15ms, which is enough to search-as-you-type.

Cons

1. Users can't run custom quick filter functions.

But they can define the .valueFormatter to set what is going to be searched.

2. It has a big upfront cost.

We need to fill a buffer with the text before passing it to wasm, computing the buffer currently takes 2,500ms, which is not good for UX. You may feel the UI thread blocked while you load the demo, this is the reason.

Mitigations

Part of this would be alleviated by the new & more efficient API from #9120. Preliminary results show that the upfront cost could go down to 500-1000ms with more optimized APIs.
We can combine that with an approach like React, where we schedule small chunks of work to run between browser frames. This would allow the buffer creation to feel seamless for users.
This feature should also be opt-in, doing this much work only makes sense for specific use-cases.

Additional info

If filling a buffer from scratch takes 500-1000ms, I estimate that recomputing a search buffer after a cell edit should take from 300-500ms.

@romgrk romgrk added performance component: data grid This is the name of the generic UI component, not the React module! labels Jun 3, 2023
@oliviertassinari
Copy link
Member

oliviertassinari commented Jun 3, 2023

Interesting, WASM reminds me of this part: https://www.coss.community/cossc/ocs-2020-breakout-niall-crosby-17m9, at 30:00. I took notes about this in https://www.notion.so/mui-org/AG-Grid-Niall-Crosby-9c34589a63294dd1a1dbabd843bb24ff?pvs=4#9393ec6e2fc74299ac64b49355920476. I think worth a watch.

I doubt that WASM will make a difference, one hypothesis is that developers won't be able to load enough data client-side to start to see the limit where the JS-based search is too slow.

Maybe, is there a way to demo how this can be done in userland?

@romgrk
Copy link
Contributor Author

romgrk commented Jun 4, 2023

WASM reminds me of this part

I agree with some/most of the points, but I think he's talking about migrating "the grid" to wasm, which would indeed be a bad idea. The difference is here is we're solving a specific problem for which wasm is good at.

one hypothesis is that developers won't be able to load enough data client-side to start to see the limit where the JS-based search is too slow.

Do we have data & use-case studies about how the grid is being used? It's for sure not the ideal model to have everything client side, but I can think of a few use-cases where it could make sense (e.g no server, PWA with local file access). I've also seen users in the github issues discussion mentionning performance issues with the quickfilter (QF).

TBH our current QF implementation is slow first because we re-generate the rows' formatted values on each filter round. This PR moves that cost to the start, and formatting the values is the big majority of the 2000ms cost mentionned above. So there is room for improvement for JS, but there are also limits.
In particular, to be as efficient JS would need to stop using JS-strings, because those have awful cache-locality. Being able to access memory sequentially in one continuous block is a huge advantage. This PR encodes the data as strings packed one after the other like this: [length: 4 bytes] + [...data: length bytes].
The other big advantage that wasm has is SIMD instructions support; we get a 2-8x speedup when using SIMD.

Maybe, is there a way to demo how this can be done in userland?

There is for sure a way to re-package this though. We could decouple quickfiltering into a module with a well-defined interface so that it's easily possible to replace the QF implementation with a different one; this could be an interesting feature for some users as well.
The wasm/native part of this PR could also additionally be split into its own package, the core of it is a full-text search engine that receives a list of strings as preparation data, and then provides the ability to query that data for matching lines (aka rows) indexes. There's a few fulltext search packages on NPM, some with wasm but none is SIMD enabled.

@oliviertassinari
Copy link
Member

oliviertassinari commented Jun 4, 2023

I've also seen users in the github issues discussion mentionning performance issues with the quickfilter (QF).

This sounds like a great point to dive deeper into. If they can provide a real-life reproduction, it would be great. Maybe what they are truly complaining about in this discussion is the 500ms debounce in the textbox 🤷‍♂️.

It has a big upfront cost.

I think that we need to find a solution for the data loading, with an x4 CPU throttle (an average device?), on my M1 Pro it takes 13s to load. What will happen as users dynamically change the rows 🙃?

SIMD enabled

Pushing the idea further, Is there a chance we could run this in a GPU with WebGL? https://stackoverflow.com/questions/27333815/cpu-simd-vs-gpu-simd

There is for sure a way to re-package this though.

I was thinking along the lines of using an npm package that focuses on this problem and shows how to do the integration.
But it could also surface the need for a plugin system, so we don't force this bundle size on all the people who don't need it.

our current QF implementation is slow first because we re-generate the rows' formatted values on each filter round.

Caching sounds interesting, I imagine it could benefit both a JS and WASM implementation.

@romgrk
Copy link
Contributor Author

romgrk commented Jun 5, 2023

Caching sounds interesting, I imagine it could benefit both a JS and WASM implementation.

For sure. I'll note that when I benchmarked, formatting the cell values was the highest cost: about 1800-2000ms for formatting the rows, and about 200-300ms to generate the input buffer. So the big problem is row formatting, which needs to be solved even for a pure JS quickfilter. I've established while profiling filter performance that some of our API calls are very expensive, in particular .getCellParams() and .getCellValue(). If we follow the same approaches started in #9120, I think we can cut that by 2 or 3. For the rest, we could look into:

  • Idle callback work (or just in-between frames work like React does, not necessarily via requestIdleCallback)
  • Worker threads

What will happen as users dynamically change the rows?

For row updates: if we preserve the formatted row values in memory (and trade memory for CPU), we can update the input buffer very quickly, we just need to memcpy the part of the buffer after the affected row further away in memory, and then rewrite only the affected row. If we don't preserve the memory, then it's the same cost as creating the buffer from scratch.

I think if we go forward with this, all of those choices should be configurable by library users. They're the only ones who could pick the right combination of trade-offs for their use-case.

Pushing the idea further, Is there a chance we could run this in a GPU with WebGL?

I have very limited experience with using GPU. I'm not sure how well a string finding algorithm translates to shader code. I would need to do research to answer that question.

@github-actions github-actions bot added the PR: out-of-date The pull request has merge conflicts and can't be merged label Jun 28, 2023
@github-actions
Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@michelengelen michelengelen changed the base branch from master to next November 6, 2023 14:27
@MBilalShafi MBilalShafi changed the base branch from next to master March 21, 2024 02:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: data grid This is the name of the generic UI component, not the React module! performance PR: out-of-date The pull request has merge conflicts and can't be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants