Add large blob storage to Cache #7198
Conversation
Benchmark Results: Kitchen Sink ✅, React HackerNews ✅, AtlasKit Editor ✅, Three.js ✅. No bundle changes detected for React HackerNews, AtlasKit Editor, or Three.js.
Is there any perf difference between caching the graph to LMDB vs writing it to the FS?

Nothing noticeable at the macro level (build times are comparable), but maybe I can capture some startup/shutdown metrics.
Ran some profiling builds to see how this change might affect performance:

The above shows the difference between the same build with and without caching the request graph as a large blob. They seem to be pretty comparable overall, with the differences between them looking similar to the differences between any two builds regardless of caching mechanism. For context: this build typically runs in ~135s in prod and ~70s in dev.
Also currently the change to the
For that, a stream-based API is better anyway to reduce memory usage (this is why transformers, packagers and optimizers have a stream-based API in the first place). That would at least require
`LMDBCache` is only used when the FS is `NodeFS` anyway.
I see what you mean about not needing to configure the FS for `LMDBCache`.

👍 For the purposes of this PR (solving the Node buffer size limit), would it make sense to just assert that any buffer being written to `LMDBCache` is under that size?
I've opted instead to auto fallback to using the FS cache when any blob being written to `LMDBCache` exceeds the max Node buffer size. This is kind of a punt on solving for large media, but currently, large media is already buffered completely into memory to be cached by
Before, all assets were streamed from cache regardless of size, but by marking assets with content streams as large blobs when being written to the cache, we can default to reading the assets into memory from cache, and only stream the assets that were marked as large blobs.
I reverted this fallback behavior. Now, `LMDBCache` uses FS read/write streams instead of converting between stream and buffer. So, large assets should now be streamed to/from cache rather than being buffered in memory.
Looks good. Going to perf test it later
This prevents `FSCache.has` and `FSCache.get` from finding large blobs, as large blobs should be retrieved via `FSCache.getLargeBlob` or `FSCache.getStream`.
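One way to picture that key separation (the suffix and the tiny in-memory store below are invented for illustration; Parcel's actual key scheme may differ):

```javascript
// Invented suffix for illustration: large blobs live under a modified
// key, so plain has()/getBlob() lookups never see them.
const LARGE_BLOB_SUFFIX = '-large';
const largeBlobKey = key => key + LARGE_BLOB_SUFFIX;

// Tiny in-memory stand-in (not FSCache) to show the separation.
class DemoCache {
  constructor() {
    this.store = new Map();
  }
  has(key) {
    return this.store.has(key);
  }
  setLargeBlob(key, value) {
    this.store.set(largeBlobKey(key), value);
  }
  hasLargeBlob(key) {
    return this.store.has(largeBlobKey(key));
  }
}
```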
Tested and saw no perf regression anymore.
## New methods on `Cache`

- `hasLargeBlob(key: string): Promise<boolean>`
- `getLargeBlob(key: string): Promise<Buffer>`
- `setLargeBlob(key: string, contents: Buffer | string): Promise<void>`

In `FSCache`, these are basically aliases for `has()`, `getBlob()` and `setBlob()`, but they use a modified key to differentiate large blobs from other cached values. In `LMDBCache`, they interact with the configured filesystem rather than the LMDB store.
## Caching RequestGraph as a large blob

In testing #6922 with a very large Parcel project, rebuilds would hard crash while attempting to deserialize the cached `RequestGraph`. It seems this is because lmdb-store decodes binary data from the database into a Node Buffer, which has a hard-coded upper limit on its size for various reasons.

To work around this limit, the `RequestGraph` is now cached using these new large blob methods.
## Caching other graphs as large blobs

Other graphs (`AssetGraph`, `BundleGraph`) are also cached as large blobs by updating the `RequestTracker` `getRequestResult()` and `writeToCache()` methods to use large blob caching, under the assumption that only graph requests are made with cache keys (which seems to be the mechanism by which `RequestTracker` decides to cache a result).
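Under that assumption, the flow can be sketched roughly like this (JSON stands in for Parcel's own serializer, and the function shapes are assumed, not `RequestTracker`'s exact code):

```javascript
// Rough sketch: when a request has a cache key, persist its result as a
// large blob and read it back the same way. JSON is a stand-in for
// Parcel's serializer.
async function writeToCache(cache, cacheKey, result) {
  await cache.setLargeBlob(cacheKey, Buffer.from(JSON.stringify(result)));
}

async function getRequestResult(cache, cacheKey) {
  const buf = await cache.getLargeBlob(cacheKey);
  return JSON.parse(buf.toString());
}
```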
## Streams

Large assets are already threaded through Parcel as streams, and now the `LMDBCache` also streams those values to and from the underlying FS. (Previously, streams were buffered and then written to the LMDB store.)
## Fallback to FS caching for large values

In addition to the new methods that allow explicitly caching large blobs, `LMDBCache` has also been updated to automatically fall back to the FS cache when a value would exceed the max Node buffer size.

Update: auto fallback has been removed in favor of marking streams as large blobs and using the cache `getStream`/`setStream` methods, with the blob methods becoming the default (previously, the stream methods were always used, even if the asset was not streamed).

## Questions