Add large blob storage to Cache #7198
Conversation
Benchmark Results: Kitchen Sink ✅, React HackerNews ✅, AtlasKit Editor ✅, Three.js ✅. No bundle changes detected for React HackerNews, AtlasKit Editor, or Three.js.
Is there any perf difference between caching the graph to LMDB vs writing it to the FS?

Nothing noticeable at the macro level (build times are comparable), but maybe I can capture some startup/shutdown metrics.
Ran some profiling builds to see how this change might affect performance:

The above shows the difference between the same build with and without caching the request graph as a large blob. They seem to be pretty comparable overall, with the differences between them looking similar to the differences between any two builds regardless of caching mechanism. For context: this build typically runs in ~135s in prod and ~70s in dev.
Also currently the change to the
For that, a stream-based API is better anyway to reduce memory usage (this is why transformers, packagers and optimizers have a stream-based API in the first place). That would at least require
`LMDBCache` is only used when the FS is `NodeFS` anyway.
I see what you mean about not needing to configure the FS for `LMDBCache`.

👍 For the purposes of this PR (solving the Node buffer size limit), would it make sense to just assert that any buffer being written to `LMDBCache` is under that size?
I've opted instead to auto fallback to using the FS cache when any blob being written to `LMDBCache` exceeds the max Node buffer size. This is kind of a punt on solving for large media, but currently, large media is already buffered completely into memory to be cached by
Before, all assets were streamed from cache regardless of size, but by marking assets with content streams as large blobs when being written to the cache, we can default to reading the assets into memory from cache, and only stream the assets that were marked as large blobs.
I reverted this fallback behavior. Now, `LMDBCache` uses FS read/write streams instead of converting between stream and buffer. So, large assets should now be streamed to/from cache rather than being buffered in memory.
Looks good. Going to perf test it later
This prevents `FSCache.has` and `FSCache.get` from finding large blobs, as large blobs should be retrieved via `FSCache.getLargeBlob` or `FSCache.getStream`.
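One way to picture that key separation (the suffix and the tiny in-memory store below are invented for illustration; Parcel's actual key scheme may differ):

```javascript
// Invented suffix for illustration: large blobs live under a modified
// key, so plain has()/getBlob() lookups never see them.
const LARGE_BLOB_SUFFIX = '-large';
const largeBlobKey = key => key + LARGE_BLOB_SUFFIX;

// Tiny in-memory stand-in (not FSCache) to show the separation.
class DemoCache {
  constructor() {
    this.store = new Map();
  }
  has(key) {
    return this.store.has(key);
  }
  setLargeBlob(key, value) {
    this.store.set(largeBlobKey(key), value);
  }
  hasLargeBlob(key) {
    return this.store.has(largeBlobKey(key));
  }
}
```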
Tested and saw no perf regression anymore.
## New methods on `Cache`

- `hasLargeBlob(key: string): Promise<boolean>`
- `getLargeBlob(key: string): Promise<Buffer>`
- `setLargeBlob(key: string, contents: Buffer | string): Promise<void>`

In `FSCache`, these are basically aliases for `has()`, `getBlob()` and `setBlob()`, but they use a modified key to differentiate large blobs from other cached values. In `LMDBCache`, they interact with the configured filesystem rather than the LMDB store.
## Caching RequestGraph as a large blob

In testing #6922 with a very large Parcel project, rebuilds would hard crash while attempting to deserialize the cached `RequestGraph`. It seems this is because lmdb-store decodes binary data from the database into a Node Buffer, which has a hard-coded upper limit on its size for various reasons.

To work around this limit, the `RequestGraph` is now cached using these new large blob methods.
## Caching other graphs as large blobs

Other graphs (`AssetGraph`, `BundleGraph`) are also cached as large blobs by updating the `RequestTracker` `getRequestResult()` and `writeToCache()` methods to use large blob caching, under the assumption that only graph requests are made with cache keys (which seems to be the mechanism by which `RequestTracker` decides to cache a result).
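Under that assumption, the flow can be sketched roughly like this (JSON stands in for Parcel's own serializer, and the function shapes are assumed, not `RequestTracker`'s exact code):

```javascript
// Rough sketch: when a request has a cache key, persist its result as a
// large blob and read it back the same way. JSON is a stand-in for
// Parcel's serializer.
async function writeToCache(cache, cacheKey, result) {
  await cache.setLargeBlob(cacheKey, Buffer.from(JSON.stringify(result)));
}

async function getRequestResult(cache, cacheKey) {
  const buf = await cache.getLargeBlob(cacheKey);
  return JSON.parse(buf.toString());
}
```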
## Streams

Large assets are already threaded through Parcel as streams, and now the `LMDBCache` also streams those values to and from the underlying FS. (Previously, streams were buffered and then written to the LMDB store.)
## Fallback to FS caching for large values

In addition to the new methods that allow explicitly caching large blobs, `LMDBCache` has also been updated to automatically fall back to the FS cache when a value would exceed the max Node buffer size.

Update: auto fallback has been removed in favor of marking streams as large blobs and using the cache `getStream`/`setStream` methods, with the blob methods becoming the default (previously, the stream methods were always used, even if the asset was not streamed).

## Questions