Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add block and actor caches #766

Merged
merged 2 commits into from
Jan 11, 2022
Merged

Add block and actor caches #766

merged 2 commits into from
Jan 11, 2022

Conversation

iand
Copy link
Contributor

@iand iand commented Nov 16, 2021

Adds a block cache that sits just above badger and a state cache in the Lily API lens. The nlock cache holds byte slices keyed by cid, whereas the state cache holds decoded cbor keyed by cid. Both can reduce i/o use by Lily which is a major performance bottleneck. The state cache can get up to 90% hit rate during a walk indicating that there are many repeated reads of actor state. Without the state cache the block cache can also reach 90%+ hit rate, but when both are used together it generally falls to about 50%. i/o utilization falls by about 90% in both cases.

Cache sizes are configured by two new flags when starting lily or by setting the LILY_BLOCKSTORE_CACHE_SIZE and LILY_STATESTORE_CACHE_SIZE environment variables.

This PR also contains an unrelated change: increase the database timeout used for running tests.

@iand iand force-pushed the iand-blockstore-cache branch 3 times, most recently from 1bf1a80 to 4ad1a0a Compare November 24, 2021 17:49
@iand iand requested a review from frrist November 24, 2021 17:50
@iand iand marked this pull request as ready for review November 24, 2021 17:50
@codecov-commenter
Copy link

Codecov Report

Merging #766 (b14648d) into master (97f1c4a) will not change coverage.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master    #766   +/-   ##
======================================
  Coverage    32.8%   32.8%           
======================================
  Files          40      40           
  Lines        3704    3704           
======================================
  Hits         1215    1215           
  Misses       2353    2353           
  Partials      136     136           

Copy link
Member

@frrist frrist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff!
I'd like to give this a second look after some of the comments are discussed/addressed.

main.go Outdated
Comment on lines 135 to 146
&cli.IntFlag{
Name: "blockstore-cache-size",
EnvVars: []string{"LILY_BLOCKSTORE_CACHE_SIZE"},
Value: 0,
Destination: &commands.CacheFlags.BlockstoreCacheSize,
},
&cli.IntFlag{
Name: "statestore-cache-size",
EnvVars: []string{"LILY_STATESTORE_CACHE_SIZE"},
Value: 0,
Destination: &commands.CacheFlags.StatestoreCacheSize,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to have these here instead of the daemon command? I'd prefer these to live on the daemon since they don't have an effect on other commands when provided.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also make these Uint flags since negative values are disallowed -- error cases creating the caches could be safely ignored as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, could be on daemon. I can move them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like using uint in general because Go has no overflow detection. They're best reserved for bitfields or masks. But I don't feel that strongly in this case. We use int64 for heights without a problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc we use signed ints for height because that's the underlying type of ChainEpoch and specifying a negative epoch allows someone to specify a time before the genesis block. Although I don't think that's used anywhere, yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to uint on the flags and dependency injection structs

@@ -18,7 +18,7 @@ import (
"github.com/filecoin-project/lily/testutil"
)

const defaultDatabaseWaitTime = time.Minute * 1
const defaultDatabaseWaitTime = time.Minute * 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

commands/lily.go Outdated
@@ -39,7 +39,7 @@ func NewSentinelNodeRPC(ctx context.Context, addr string, requestHeader http.Hea
}

// Lily Node settings for injection into lotus node.
func LilyNodeAPIOption(out *lily.LilyAPI, fopts ...node.Option) node.Option {
func LilyNodeAPIOption(out *lily.LilyAPI, statestoreCacheSize int, fopts ...node.Option) node.Option {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remnant from a previous implementation? Looks like CacheConfig replaced this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I will remove.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

})
}

if cacheSize <= 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This forces users to pass the blockstore-cache-size flag since its default value is 0 - not the best UX. Can we either pick a nice default for the cache size or omit the check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to remove this

log.Infof("not creating caching statestore (size=%d)", m.CacheConfig.StatestoreCacheSize)
}

m.actorStore = m.ChainAPI.Chain.ActorStore(context.TODO())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be within the above else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. it creates a non-caching actor store as a fallback from 2 scenarios: there was an error returned from util.NewCachingStateStore or m.CacheConfig.StatestoreCacheSize<=0

(if no error returned from util.NewCachingStateStore then we have already returned from the function)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah whoops, I missed the return in the inner-most else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to make clearer what the flow is here

}

func NewCachingStateStore(blocks blockstore.Blockstore, cacheSize int) (*CachingStateStore, error) {
cache, err := lru.NewARC(cacheSize)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc this only errors when a negative value is passed, could panic here instead and simplify some of the calling code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's good UX to panic on user input

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, and enforcement of positive integers could be done with Uint flags, but I'm alright with this as is.

}

func NewCachingBlockstore(blocks blockstore.Blockstore, cacheSize int) (*CachingBlockstore, error) {
cache, err := lru.NewARC(cacheSize)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto comment wrt the error check here.

@iand iand force-pushed the iand-blockstore-cache branch from b14648d to cdc84e0 Compare December 1, 2021 12:10
@iand iand merged commit df5bcfa into master Jan 11, 2022
@iand iand deleted the iand-blockstore-cache branch January 11, 2022 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants