Optimize graph object buffering and flushing #395

austinkelleher · 2020-12-16T11:45:51Z

Overview

The motivation for this change is to let our FileSystemGraphObjectStore buffer more data in memory before flushing it to disk.

Previous Implementation Details

Whenever we called jobState.findEntity, jobState.iterateEntities, or jobState.iterateRelationships, a flush would be performed. Because multiple steps typically run at the same time in an integration, these calls are frequent, so flushes were happening almost constantly.

New Implementation Details

We now have two in-memory graph object lookup tables:

A = Map<Graph Object _key, Graph Object>
B = Map<Graph Object _type, Map<Graph Object _key, boolean>>
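A minimal TypeScript sketch of how the two tables could be kept together. The `GraphObject` shape and the `GraphObjectBuffer` class are hypothetical, not the SDK's real types, and the sketch assumes `_key` values are unique. Map A answers lookups by `_key`; map B records which keys belong to each `_type`, so a flush can walk B and pull each object out of A:

```typescript
// Hypothetical graph object shape for this sketch.
interface GraphObject {
  _key: string;
  _type: string;
  [property: string]: unknown;
}

// Illustrative buffer holding both lookup tables (assumes unique _key values).
class GraphObjectBuffer {
  // Map A: _key -> graph object
  private readonly byKey = new Map<string, GraphObject>();
  // Map B: _type -> (_key -> true)
  private readonly keysByType = new Map<string, Map<string, boolean>>();

  add(objects: GraphObject[]): void {
    objects.forEach((obj) => {
      this.byKey.set(obj._key, obj);
      let keys = this.keysByType.get(obj._type);
      if (!keys) {
        keys = new Map<string, boolean>();
        this.keysByType.set(obj._type, keys);
      }
      keys.set(obj._key, true);
    });
  }

  get size(): number {
    return this.byKey.size;
  }

  // Empties both tables and returns everything that was buffered, grouped by
  // _type, so the caller can write it to disk.
  flush(): GraphObject[] {
    const flushed: GraphObject[] = [];
    this.keysByType.forEach((keys) => {
      keys.forEach((_present, key) => {
        const obj = this.byKey.get(key);
        if (!obj) {
          // In the spirit of the "throw if the two maps get out of sync" commit.
          throw new Error(`Key ${key} is in map B but not in map A`);
        }
        flushed.push(obj);
      });
    });
    this.byKey.clear();
    this.keysByType.clear();
    return flushed;
  }
}
```

Because map B is keyed by `_type`, `flush()` in this sketch returns objects grouped by type, which would suit writing flushed data one type at a time; whether the real store relies on that is not shown here.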

When we call jobState.findEntity, we first check map A to see whether the graph object is buffered in memory. If it is not, we fall back to our traditional lookup on disk. (There is more we can do to optimize this; see #385 (comment).)
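The memory-first lookup can be sketched as follows. This is an illustration rather than the SDK code: it is synchronous for brevity (the store's `getEntity` shown in the diff below is `async`), `BufferedEntityFinder` is a hypothetical name, and the disk lookup is a stand-in callback. The `diskReads` counter only shows when a lookup falls through to disk:

```typescript
// Hypothetical graph object shape for this sketch.
interface GraphObject {
  _key: string;
  _type: string;
  [property: string]: unknown;
}

// Stand-in for the existing on-disk lookup.
type DiskLookup = (key: string) => GraphObject | undefined;

class BufferedEntityFinder {
  // Map A: _key -> graph object still buffered in memory.
  private readonly byKey = new Map<string, GraphObject>();
  // Counts lookups that had to fall through to disk.
  diskReads = 0;

  constructor(private readonly readFromDisk: DiskLookup) {}

  buffer(entity: GraphObject): void {
    this.byKey.set(entity._key, entity);
  }

  findEntity(key: string): GraphObject | undefined {
    const buffered = this.byKey.get(key);
    if (buffered !== undefined) {
      // Served from memory: no flush is needed before answering.
      return buffered;
    }
    this.diskReads++;
    return this.readFromDisk(key);
  }
}
```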

When we call jobState.iterateEntities or jobState.iterateRelationships, we first iterate over the data buffered in map B, then run our traditional disk file iteration.
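A sketch of the buffered-then-disk iteration order, assuming the two maps above and a hypothetical `iterateDisk` callback standing in for the existing file iteration. Map B supplies the buffered keys for the requested `_type` and map A supplies the objects; `iterateByType` is an illustrative name, not an SDK function:

```typescript
// Hypothetical graph object shape for this sketch.
interface GraphObject {
  _key: string;
  _type: string;
  [property: string]: unknown;
}

type Iteratee = (obj: GraphObject) => void;
// Hypothetical stand-in for the existing on-disk iteration for one _type.
type DiskIterator = (type: string, iteratee: Iteratee) => void;

function iterateByType(
  type: string,
  byKey: Map<string, GraphObject>, // map A: _key -> object
  keysByType: Map<string, Map<string, boolean>>, // map B: _type -> keys
  iterateDisk: DiskIterator,
  iteratee: Iteratee,
): void {
  // 1. Buffered objects of this _type: keys from map B, objects from map A.
  const keys = keysByType.get(type);
  if (keys) {
    keys.forEach((_present, key) => {
      const obj = byKey.get(key);
      if (!obj) throw new Error(`Key ${key} is in map B but not in map A`);
      iteratee(obj);
    });
  }
  // 2. Then everything already flushed to disk for this _type.
  iterateDisk(type, iteratee);
}
```

Provided every flush clears the buffer, each object lives either in memory or on disk, so it is visited once.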

ndowmon

Looks good @austinkelleher, thanks for this improvement!

packages/integration-sdk/CHANGELOG.md

packages/integration-sdk-runtime/src/storage/memory.ts

...integration-sdk-runtime/src/storage/FileSystemGraphObjectStore/FileSystemGraphObjectStore.ts

packages/integration-sdk-runtime/src/storage/memory.ts

ctdio

Just a small change requested regarding some commented out code. Otherwise this looks pretty good!

...-runtime/src/storage/FileSystemGraphObjectStore/__tests__/FileSystemGraphObjectStore.test.ts

ndowmon

👍

mknoedel · 2020-12-17T15:18:20Z

...integration-sdk-runtime/src/storage/FileSystemGraphObjectStore/FileSystemGraphObjectStore.ts

    ) {
-      await this.flushRelationshipsToDisk();
+      await this.flushRelationshipsToDisk(onRelationshipsFlushed);
    }
  }

  async getEntity({ _key, _type }: GraphObjectLookupKey): Promise<Entity> {


Thought: what information do later steps typically need from previously collected entities? My first guess is that it is usually just foreign keys, which are likely just an id. Would it be worth maintaining a map, similar to the implementation of DuplicateKeyTracker, when we flush InMemoryGraphObjectStore? That way we could likely keep far more than 500 entities and relationships in memory for most getEntity use cases.
Maybe we could get lucky and avoid a round trip to disk a good amount of the time.
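To make the suggestion concrete, here is a hypothetical sketch (not part of this PR) of an index that survives a flush and keeps only a small stub per entity. `EntityStub`, `FlushedEntityIndex`, and the `id` field are illustrative assumptions. Stubs are much smaller than full entities, so far more of them could stay in memory:

```typescript
// Hypothetical entity shape; `id` stands in for a foreign-key-like field.
interface Entity {
  _key: string;
  _type: string;
  id: string;
  [property: string]: unknown;
}

// The minimal fields a later step might need from an earlier entity.
interface EntityStub {
  _type: string;
  id: string;
}

class FlushedEntityIndex {
  private readonly stubs = new Map<string, EntityStub>();

  // Called with each batch of entities as the full objects are flushed to disk.
  record(entities: Entity[]): void {
    entities.forEach((e) => this.stubs.set(e._key, { _type: e._type, id: e.id }));
  }

  // Returns the stub without touching disk, or undefined if the key was never seen.
  lookup(key: string): EntityStub | undefined {
    return this.stubs.get(key);
  }

  get size(): number {
    return this.stubs.size;
  }
}
```

Callers that need the whole entity would still read it from disk; the index only helps when a stub is enough.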

austinkelleher · Dec 18, 2020

There are definitely lots of additional caching improvements that we can make!

...integration-sdk-runtime/src/storage/FileSystemGraphObjectStore/FileSystemGraphObjectStore.ts

packages/integration-sdk-runtime/src/storage/memory.test.ts

mknoedel

Looks like a great improvement!

austinkelleher added 2 commits December 13, 2020 18:10

Optimize graph object buffering and flushing

b9ef640

Improve tests for FileSystemGraphObjectStore

319f1d2

austinkelleher requested review from ctdio, aiwilliams, ndowmon and mknoedel December 16, 2020 12:52

ndowmon reviewed Dec 16, 2020

ctdio suggested changes Dec 16, 2020

...-runtime/src/storage/FileSystemGraphObjectStore/__tests__/FileSystemGraphObjectStore.test.ts (outdated)

austinkelleher added 6 commits December 16, 2020 10:54

Throw if the two graph object maps get out of sync

27949a9

Initial continuous upload support

6db9779

More tests around continuous uploads and various improvements

8d22caf

Additional test for FileSystemGraphObjectStore callbacks

8196339

Mark step as a failure if uploading fails in a step

1271491

Export relevant functions and types from uploader

b05beb8
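The callback commits above, together with the `onRelationshipsFlushed` argument in the earlier diff, point to flushes notifying an uploader. The sketch below shows that pattern in isolation. `FlushingStore`, `OnFlushed`, and the threshold are hypothetical, the disk write is omitted, and the sketch is synchronous even though the real flush methods are awaited:

```typescript
// Hypothetical graph object shape for this sketch.
interface GraphObject {
  _key: string;
  _type: string;
}

// Hypothetical callback: receives each batch right after it is flushed.
type OnFlushed = (batch: GraphObject[]) => void;

class FlushingStore {
  private buffer: GraphObject[] = [];

  constructor(
    private readonly threshold: number,
    private readonly onFlushed?: OnFlushed,
  ) {}

  add(objects: GraphObject[]): void {
    this.buffer.push(...objects);
    // Flush only once enough data has accumulated, not on every read.
    if (this.buffer.length >= this.threshold) this.flush();
  }

  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    // (Writing `batch` to disk would happen here.)
    if (this.onFlushed) this.onFlushed(batch);
  }
}
```

Passing the batch to an optional callback keeps the store unaware of uploads: a continuous-upload mode can supply a callback that enqueues the batch, and a local collection can simply omit it.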

ndowmon previously approved these changes Dec 16, 2020

austinkelleher and others added 8 commits December 16, 2020 11:37

Remove old comment, update test descriptions.

5c0ad98

Share graph object creation test utils across tests and cleanup

b1c0804

Fix typo in test function

43e57a9

Add tests for job state upload calls

2b1052c

Fix test function names

47843b7

Test assertion improvements

ca68b8a

Only write prettified files to the file system on local collection

06b3394

Change prettyFile to prettifyFiles

55bb7b5

ctdio previously approved these changes Dec 16, 2020

austinkelleher added 4 commits December 16, 2020 12:50

Merge pull request #399 from JupiterOne/1849-unpretty-local-files

2059ac5

Only write prettified files to the file system on local collection

Merge pull request #398 from JupiterOne/1765-continuous-upload-tests

a627688

Add tests for job state upload calls

Merge pull request #397 from JupiterOne/1848-test-cleanup

6a2c2bf

Share graph object creation test utils across tests and cleanup

Merge pull request #396 from JupiterOne/1765-continuous-uploads

b2b44c3

Support for continuous integration data uploads

austinkelleher dismissed stale reviews from ctdio and ndowmon via b2b44c3 December 16, 2020 18:28

ctdio previously approved these changes Dec 16, 2020

ndowmon previously approved these changes Dec 16, 2020

mknoedel reviewed Dec 17, 2020

...integration-sdk-runtime/src/storage/FileSystemGraphObjectStore/FileSystemGraphObjectStore.ts

mknoedel reviewed Dec 17, 2020

packages/integration-sdk-runtime/src/storage/memory.test.ts (outdated)

mknoedel previously approved these changes Dec 17, 2020

austinkelleher and others added 3 commits December 18, 2020 07:20

Improve uploader queue to respect queue size instead of waiting for idle

aba2ed6

Change queue size check to >= for safety

cfbada4

Merge pull request #400 from JupiterOne/1896-queue-size

e3d379a

Improve uploader queue to respect queue size instead of waiting for idle
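The queue change can be illustrated with a hand-rolled bounded queue; this is a sketch under assumptions, not the uploader's actual implementation or queue library. A producer waits only while the queue is full (`size >= maxQueueSize`, matching the `>=` check above) instead of waiting for every in-flight upload to finish. `BoundedUploadQueue`, `concurrency`, and `maxQueueSize` are hypothetical names:

```typescript
type UploadTask = () => Promise<void>;

// Hypothetical bounded upload queue; not the SDK's uploader.
class BoundedUploadQueue {
  private pending: UploadTask[] = [];
  private running = 0;
  private waiters: Array<() => void> = [];
  // Failed uploads are collected here instead of being rethrown.
  readonly errors: unknown[] = [];

  // maxQueueSize must be at least 1, or enqueue would wait forever.
  constructor(
    private readonly concurrency: number,
    private readonly maxQueueSize: number,
  ) {}

  get size(): number {
    return this.pending.length;
  }

  // Resolves once the task is queued, waiting only while the queue is full.
  async enqueue(task: UploadTask): Promise<void> {
    while (this.pending.length >= this.maxQueueSize) {
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    this.pending.push(task);
    this.startTasks();
  }

  private startTasks(): void {
    while (this.running < this.concurrency && this.pending.length > 0) {
      const task = this.pending.shift()!;
      this.running++;
      // A queue slot just opened up, so let one waiting producer retry.
      const waiter = this.waiters.shift();
      if (waiter) waiter();
      const onDone = () => {
        this.running--;
        this.startTasks();
      };
      task().then(onDone, (err: unknown) => {
        this.errors.push(err);
        onDone();
      });
    }
  }
}
```

In the SDK, a failed upload marks the step as a failure (see the earlier commit); here failures are only recorded in `errors` to keep the sketch small.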

austinkelleher dismissed stale reviews from mknoedel, ndowmon, and ctdio via e3d379a December 18, 2020 16:43

austinkelleher added 2 commits December 18, 2020 12:45

Used shared graph object utilities and update CHANGELOG.md

cd586e1

Add version to CHANGELOG.md

d370104

ndowmon previously approved these changes Dec 18, 2020

mknoedel previously approved these changes Dec 18, 2020

Publish

3cb5160

- @jupiterone/[email protected] - @jupiterone/[email protected] - @jupiterone/[email protected] - @jupiterone/[email protected] - @jupiterone/[email protected] - @jupiterone/[email protected] - @jupiterone/[email protected]

austinkelleher dismissed stale reviews from mknoedel and ndowmon via 3cb5160 December 18, 2020 17:53

ndowmon approved these changes Dec 18, 2020

austinkelleher merged commit 0a90361 into master Dec 18, 2020

austinkelleher deleted the 1786-optimize-flushing branch December 18, 2020 17:58
