-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize graph object buffering and flushing #395
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @austinkelleher, thanks for this improvement!
...integration-sdk-runtime/src/storage/FileSystemGraphObjectStore/FileSystemGraphObjectStore.ts
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small change requested regarding some commented out code. Otherwise this looks pretty good!
...-runtime/src/storage/FileSystemGraphObjectStore/__tests__/FileSystemGraphObjectStore.test.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Only write prettified files to the file system on local collection
Add tests for job state upload calls
Share graph object creation test utils across tests and cleanup
Support for continuous integration data uploads
) { | ||
await this.flushRelationshipsToDisk(); | ||
await this.flushRelationshipsToDisk(onRelationshipsFlushed); | ||
} | ||
} | ||
|
||
async getEntity({ _key, _type }: GraphObjectLookupKey): Promise<Entity> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought: what information is typically gotten from previous entities? My first thought would be that it would usually only be foreign keys, which is likely just an id. Is it worth considering when we flush InMemoryGraphObjectStore
, we maintain a map similar to the implementation of DuplicateKeyTracker
. That way we could likely store much more than 500 entities and relationships for most of the use-cases of getEntity.
Maybe we could get lucky and reduce roundtrips to the disk a good amount of the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are definitely lots of additional caching improvements that we can make!
...integration-sdk-runtime/src/storage/FileSystemGraphObjectStore/FileSystemGraphObjectStore.ts
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a great improvement!
Improve uploader queue to respect queue size instead of waiting for idle
e3d379a
- @jupiterone/[email protected] - @jupiterone/[email protected] - @jupiterone/[email protected] - @jupiterone/[email protected] - @jupiterone/[email protected] - @jupiterone/[email protected] - @jupiterone/[email protected]
3cb5160
Overview
The motivation of this change is to allow our
FileSystemGraphObjectStore
to buffer more data in memory before performing a flush.Previously Implementation Details
Whenever we called
jobState.findEntity
,jobState.iterateEntities
, orjobState.iterateRelationships
, a flush would be performed. Multiple steps are typically running at the same time in an integration. This means that flushing is happening all of the time.New Implementation Details
We now have two in-memory graph object lookup tables:
When we perform a
jobState.findEntity
, we first check mapA
to see if we have this graph object buffered in memory. If we do not, then we perform our traditional lookup on disk. (There is actually more we can do to optimize this. See here: #385 (comment))When we perform a
jobState.iterateEntities
or ajobState.iterateRelationships
, we first iterate over the data we have buffered in mapB
, then we perform our traditional disk file iteration method.