Skip to content

Commit

Permalink
Send fetch requests for all page dict lookups in parallel
Browse files Browse the repository at this point in the history
- When adding page dict candidates to the lookup tree, also initiate fetching them from xref, so if they are not yet loaded at all, the XHR will be sent
 - Only at the top level - assume that if there is a /Pages tree, it is sensibly structured and the number of requests won't be too bad
- We can then await on the cached Promise without making the requests pipeline
- This has a significant performance improvement for load-on-demand (i.e. with auto-fetch turned off) when a PDF has a large number of pages in the top level /Pages collection, and those pages are spread through a file, so every candidate needs to be fetched separately
 - PDFs with many pages where each page is a big image and all the pages are at the top level are quite a common output for digitisation programmes
- I would have liked to do something like "if it's the top level collection and page count = number of kids, then just fetch that page without traversing the tree" but unfortunately I agree with comments on #8088 that there is no good general solution to allow for /Pages nodes with empty /Kids arrays
- The other alternative for fixing this use case is to simply not validate the last page at all, so pages can be loaded on demand. But that validation was added for good reasons, and this would also result in a bad experience if you didn't read the document from the front. Or assume in certain conditions that the top level /Pages contains only pages (see https://github.com/mozilla/pdf.js/compare/master...richard-smith-preservica:pdf.js:rcs/assume-all-pages-in-top-level-when-likely?expand=1), but that allows for particular edge case 'bad' PDFs to render incorrectly

eslint

Review - Fix new promise side of fetch; local cache variable; validation on when to prefetch
  • Loading branch information
Richard Smith (smir) committed Aug 21, 2024
1 parent b47c7ec commit 56358b0
Showing 1 changed file with 14 additions and 3 deletions.
17 changes: 14 additions & 3 deletions src/core/catalog.js
Original file line number Diff line number Diff line change
Expand Up @@ -143,6 +143,7 @@ class Catalog {
this.globalImageCache = new GlobalImageCache();
this.pageKidsCountCache = new RefSetCache();
this.pageIndexCache = new RefSetCache();
this.pageDictCache = new RefSetCache();
this.nonBlendModesSet = new RefSet();
this.systemFontCache = new Map();
}
Expand Down Expand Up @@ -1161,6 +1162,7 @@ class Catalog {
this.globalImageCache.clear(/* onlyData = */ manuallyTriggered);
this.pageKidsCountCache.clear();
this.pageIndexCache.clear();
this.pageDictCache.clear();
this.nonBlendModesSet.clear();

const translatedFonts = await Promise.all(this.fontCache);
Expand All @@ -1184,7 +1186,8 @@ class Catalog {
}
const xref = this.xref,
pageKidsCountCache = this.pageKidsCountCache,
pageIndexCache = this.pageIndexCache;
pageIndexCache = this.pageIndexCache,
pageDictCache = this.pageDictCache;
let currentPageIndex = 0;

while (nodesToVisit.length) {
Expand All @@ -1203,7 +1206,8 @@ class Catalog {
}
visitedNodes.put(currentNode);

const obj = await xref.fetchAsync(currentNode);
const obj = await (pageDictCache.get(currentNode) ||
xref.fetchAsync(currentNode));
if (obj instanceof Dict) {
let type = obj.getRaw("Type");
if (type instanceof Ref) {
Expand Down Expand Up @@ -1285,7 +1289,14 @@ class Catalog {
// node further down in the tree (see issue5644.pdf, issue8088.pdf),
// and to ensure that we actually find the correct `Page` dict.
for (let last = kids.length - 1; last >= 0; last--) {
nodesToVisit.push(kids[last]);
const lastKid = kids[last];
nodesToVisit.push(lastKid);

// Launch all requests in parallel so we don't wait for each one in turn
// when looking for a page near the end, if all the pages are top level
if (currentNode === this.toplevelPagesDict && lastKid instanceof Ref && !pageDictCache.has(lastKid)) {

Check failure on line 1297 in src/core/catalog.js

View workflow job for this annotation

GitHub Actions / Lint (lts/*)

Replace `currentNode·===·this.toplevelPagesDict·&&·lastKid·instanceof·Ref·&&·!pageDictCache.has(lastKid)` with `⏎··········currentNode·===·this.toplevelPagesDict·&&⏎··········lastKid·instanceof·Ref·&&⏎··········!pageDictCache.has(lastKid)⏎········`
pageDictCache.put(lastKid, xref.fetchAsync(lastKid));
}
}
}

Expand Down

0 comments on commit 56358b0

Please sign in to comment.