bug: splitstore: deadlock during compaction (setup?) #9846

Closed
magik6k opened this issue Dec 12, 2022 · 1 comment · Fixed by #10840

@magik6k

magik6k commented Dec 12, 2022

Goroutines: https://gist.github.com/magik6k/4e13c2d53a5ba491d35b16d7a24c1b4a

Got what looks like a deadlock in compaction a few moments after startup:

Splitstore compact goroutine, holds Splitstore.mx.Lock(), stuck on Mpool.lk.Lock

goroutine 24285 [semacquire, 27 minutes]:
sync.runtime_SemacquireMutex(0xc041b135c0?, 0xe0?, 0x4611e7c?)
	/usr/lib/go/src/runtime/sema.go:77 +0x25
sync.(*Mutex).lockSlow(0xc0239ca680)
	/usr/lib/go/src/sync/mutex.go:171 +0x165
sync.(*Mutex).Lock(...)
	/usr/lib/go/src/sync/mutex.go:90
github.com/filecoin-project/lotus/chain/messagepool.(*MessagePool).ForEachPendingMessage(0x18?, 0xc05ec065e8)
	/home/magik6k/github.com/filecoin-project/go-lotus/chain/messagepool/messagepool.go:462 +0x71
github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).applyProtectors(0xc002ecc280)
	/home/magik6k/github.com/filecoin-project/go-lotus/blockstore/splitstore/splitstore_compact.go:460 +0x185
github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).doCompact(0xc002ecc280, 0xc0532271c0)
	/home/magik6k/github.com/filecoin-project/go-lotus/blockstore/splitstore/splitstore_compact.go:544 +0x59d
github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).compact(0xc002ecc280, 0x0?)
	/home/magik6k/github.com/filecoin-project/go-lotus/blockstore/splitstore/splitstore_compact.go:497 +0x198
github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).HeadChange.func1()
	/home/magik6k/github.com/filecoin-project/go-lotus/blockstore/splitstore/splitstore_compact.go:131 +0x133
created by github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).HeadChange
	/home/magik6k/github.com/filecoin-project/go-lotus/blockstore/splitstore/splitstore_compact.go:124 +0x305

Tons of goroutines stuck waiting on Splitstore.mx.Lock():

goroutine 83775 [semacquire]:
sync.runtime_SemacquireMutex(0x6c89fd?, 0x20?, 0x4317300?)
	/usr/lib/go/src/runtime/sema.go:77 +0x25
sync.(*Mutex).lockSlow(0xc002ecc2b0)
	/usr/lib/go/src/sync/mutex.go:171 +0x165
sync.(*Mutex).Lock(...)
	/usr/lib/go/src/sync/mutex.go:90
github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).isWarm(0x5bb0120?)
	/home/magik6k/github.com/filecoin-project/go-lotus/blockstore/splitstore/splitstore.go:635 +0x57
github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).Get(0xc002ecc280, {0x5bc2498, 0xc01ebeea40}, {{0xc032352f90?, 0x5ba5740?}})
	/home/magik6k/github.com/filecoin-project/go-lotus/blockstore/splitstore/splitstore.go:369 +0x43c
github.com/ipfs/go-ipld-cbor.(*BasicIpldStore).Get(0xc0462148a0, {0x5bc2498?, 0xc01ebeea40?}, {{0xc032352f90?, 0xc8?}}, {0x419d780?, 0xc034de08f0?})
	/home/magik6k/.opt/go/pkg/mod/github.com/ipfs/[email protected]/store.go:63 +0xf6
github.com/filecoin-project/go-state-types/builtin/v9/miner.(*State).GetInfo(0xc000338690, {0x7ffb9434b558, 0xc0478be180})
	/home/magik6k/.opt/go/pkg/mod/github.com/filecoin-project/[email protected]/builtin/v9/miner/miner_state.go:181 +0x83
....

Among those is a goroutine loading persisted local messages, which holds Mpool.lk.Lock from messagepool.New and is stuck on Splitstore.mx.Lock():

goroutine 1138 [semacquire, 27 minutes]:
sync.runtime_SemacquireMutex(0x6c89fd?, 0x20?, 0x4317300?)
        /usr/lib/go/src/runtime/sema.go:77 +0x25
sync.(*Mutex).lockSlow(0xc002ecc2b0)
        /usr/lib/go/src/sync/mutex.go:171 +0x165
sync.(*Mutex).Lock(...)
        /usr/lib/go/src/sync/mutex.go:90
github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).isWarm(0x5bb0120?)
        /home/magik6k/github.com/filecoin-project/go-lotus/blockstore/splitstore/splitstore.go:635 +0x57
github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).Get(0xc002ecc280, {0x5bc24d0, 0xc000056050}, {{0xc0035868a0?, 0xb5a3d7?}})
        /home/magik6k/github.com/filecoin-project/go-lotus/blockstore/splitstore/splitstore.go:369 +0x43c
github.com/ipfs/go-ipld-cbor.(*BasicIpldStore).Get(0xc000c60900, {0x5bc24d0?, 0xc000056050?}, {{0xc0035868a0?, 0x40?}}, {0x4528ae0?, 0xc05cfa1b00?})
        /home/magik6k/.opt/go/pkg/mod/github.com/ipfs/[email protected]/store.go:63 +0xf6
github.com/filecoin-project/go-hamt-ipld/v3.loadNode({0x5bc24d0, 0xc000056050}, {0x7ffb10325a88, 0xc057997300}, {{0xc0035868a0?, 0x4528ae0?}}, 0x0, 0x5, 0x584c390)
        /home/magik6k/.opt/go/pkg/mod/github.com/filecoin-project/go-hamt-ipld/[email protected]/hamt.go:363 +0xc4
github.com/filecoin-project/go-hamt-ipld/v3.(*Pointer).loadChild(0xc05dbb1900, {0x5bc24d0?, 0xc000056050?}, {0x7ffb10325a88?, 0xc057997300?}, 0xc0035861b0?, 0xc05dbb18c0?)
        /home/magik6k/.opt/go/pkg/mod/github.com/filecoin-project/go-hamt-ipld/[email protected]/hamt.go:302 +0x65
github.com/filecoin-project/go-hamt-ipld/v3.(*Node).getValue(0xc05dbb18c0, {0x5bc24d0, 0xc000056050}, 0x7ffb10325a88?, {0xc03b3a4300, 0x31}, 0xc0487ad7e0)
        /home/magik6k/.opt/go/pkg/mod/github.com/filecoin-project/go-hamt-ipld/[email protected]/hamt.go:275 +0x237
github.com/filecoin-project/go-hamt-ipld/v3.(*Node).getValue(0xc05dbb1080, {0x5bc24d0, 0xc000056050}, 0x6c79ad?, {0xc03b3a4300, 0x31}, 0xc0487ad7e0)
        /home/magik6k/.opt/go/pkg/mod/github.com/filecoin-project/go-hamt-ipld/[email protected]/hamt.go:280 +0x287
github.com/filecoin-project/go-hamt-ipld/v3.(*Node).Find(0xc05dbb1080, {0x5bc24d0, 0xc000056050}, {0xc03b3a4300, 0x31}, {0x7ffb10325ab0, 0xc028781ca8})
        /home/magik6k/.opt/go/pkg/mod/github.com/filecoin-project/go-hamt-ipld/[email protected]/hamt.go:193 +0x12d
github.com/filecoin-project/go-state-types/builtin/v9/util/adt.(*Map).Get(0xc000c615f0, {0x5bae9e0, 0xc057997380}, {0x5bac300?, 0xc028781ca8})
        /home/magik6k/.opt/go/pkg/mod/github.com/filecoin-project/[email protected]/builtin/v9/util/adt/map.go:108 +0xd9
github.com/filecoin-project/go-state-types/builtin/v9/init.(*State).ResolveAddress(0xc057997300?, {0x7ffb10325a58?, 0xc057997300?}, {{0xc03bdfb600?, 0x6cade7?}})
        /home/magik6k/.opt/go/pkg/mod/github.com/filecoin-project/[email protected]/builtin/v9/init/init_actor_state.go:56 +0x110
github.com/filecoin-project/lotus/chain/actors/builtin/init.(*state9).ResolveAddress(0xc05dbb1040, {{0xc03bdfb600?, 0xc02893ed80?}})
        /home/magik6k/github.com/filecoin-project/go-lotus/chain/actors/builtin/init/v9.go:53 +0x65
github.com/filecoin-project/lotus/chain/state.(*StateTree).lookupIDinternal(0xc050fa6c30, {{0xc03bdfb600?, 0xc0487ada58?}})
        /home/magik6k/github.com/filecoin-project/go-lotus/chain/state/statetree.go:332 +0x182
github.com/filecoin-project/lotus/chain/state.(*StateTree).LookupID(0xc050fa6c30, {{0xc03bdfb600?, 0x2?}})
        /home/magik6k/github.com/filecoin-project/go-lotus/chain/state/statetree.go:352 +0x128
github.com/filecoin-project/lotus/chain/state.(*StateTree).GetActor(0xc050fa6c30, {{0xc03bdfb600?, 0xc000056050?}})
        /home/magik6k/github.com/filecoin-project/go-lotus/chain/state/statetree.go:369 +0x7f
github.com/filecoin-project/lotus/chain/messagepool.(*mpoolProvider).GetActorAfter(0xc00223da20, {{0xc03bdfb600?, 0xc05def9220?}}, 0xc000c606f0?)
        /home/magik6k/github.com/filecoin-project/go-lotus/chain/messagepool/provider.go:102 +0x2df
github.com/filecoin-project/lotus/chain/messagepool.(*MessagePool).getStateNonce(0xc0239ca680, {0x5bc2498?, 0xc01ebefd80?}, {{0xc03bdfb600?, 0x0?}}, 0xc002ef0040)
        /home/magik6k/github.com/filecoin-project/go-lotus/chain/messagepool/messagepool.go:1189 +0x1c6
github.com/filecoin-project/lotus/chain/messagepool.(*MessagePool).addLoaded(0xc0239ca680, {0x5bc2498, 0xc01ebefd80}, 0xc0209cb5f0)
        /home/magik6k/github.com/filecoin-project/go-lotus/chain/messagepool/messagepool.go:1038 +0x73
github.com/filecoin-project/lotus/chain/messagepool.(*MessagePool).loadLocal(0xc0239ca680, {0x5bc2498, 0xc01ebefd80})
        /home/magik6k/github.com/filecoin-project/go-lotus/chain/messagepool/messagepool.go:1667 +0x28e
github.com/filecoin-project/lotus/chain/messagepool.New.func2()
        /home/magik6k/github.com/filecoin-project/go-lotus/chain/messagepool/messagepool.go:444 +0x6b
created by github.com/filecoin-project/lotus/chain/messagepool.New
        /home/magik6k/github.com/filecoin-project/go-lotus/chain/messagepool/messagepool.go:442 +0xc25
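
Read together, the traces describe a lock-order inversion: the compaction goroutine holds Splitstore.mx and wants Mpool.lk, while the loader holds Mpool.lk and wants Splitstore.mx. Below is a minimal sketch of that cycle using stand-in mutexes, not the real Lotus types; the names `splitstoreMx`, `mpoolLk`, and `detectCycle` are hypothetical. It uses `TryLock` (Go 1.18+) so the example reports the cycle instead of hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// detectCycle takes two stand-in locks in opposite orders from two
// goroutines and reports whether each one ends up holding its first lock
// while being unable to take its second.
func detectCycle() bool {
	var splitstoreMx, mpoolLk sync.Mutex // stand-ins, not the Lotus fields

	var held, probed, done sync.WaitGroup
	held.Add(2)
	probed.Add(2)
	done.Add(2)
	blocked := make([]bool, 2)

	run := func(i int, first, second *sync.Mutex) {
		defer done.Done()
		first.Lock()
		defer first.Unlock()
		held.Done()
		held.Wait() // both goroutines now hold their first lock
		if second.TryLock() {
			second.Unlock()
		} else {
			blocked[i] = true // a plain Lock() here would wait forever
		}
		probed.Done()
		probed.Wait() // keep holding until the other side has probed too
	}

	go run(0, &splitstoreMx, &mpoolLk) // order seen in the compaction trace
	go run(1, &mpoolLk, &splitstoreMx) // order seen in the loadLocal trace
	done.Wait()
	return blocked[0] && blocked[1]
}

func main() {
	fmt.Println("lock cycle detected:", detectCycle())
}
```

With blocking `Lock()` calls in place of `TryLock`, both goroutines would park in `semacquire`, which matches the state shown in the dumps above.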

The fix is probably to abort compaction while the mpool is not fully loaded. (It loads asynchronously because on nodes that send tons of messages, loading at startup can take a few minutes.)

@arajasek

Quickfix applied in #9903 (review), but I do think we want a better fix here (so that entire compact/prune operations don't fail).
