
core: move TxPool reorg and events to background goroutine #19705

Merged 8 commits into ethereum:master from txpool-event-sync on Jun 21, 2019

Conversation

@fjl (Contributor) commented Jun 12, 2019

This change moves internal queue re-shuffling work in TxPool to a background
goroutine, TxPool.runReorg. Requests to execute runReorg are accumulated
by the new scheduleReorgLoop. The new loop also accumulates transaction events.

The motivation for this change is making sends to txFeed synchronous.

Fixes #19192
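As a rough sketch of the coalescing pattern (assumed names and wiring; the real implementation is in core/tx_pool.go), the scheduling loop ensures at most one reorg runs at a time and folds any requests that arrive in the meantime into the next run:

    // Minimal sketch of the request-coalescing loop. One background
    // goroutine owns all reorg work; callers post a request instead of
    // spawning goroutines themselves.
    func scheduleReorgLoop(requests <-chan struct{}, quit <-chan struct{}) {
        var (
            curDone chan struct{} // non-nil while a runReorg is in flight
            pending bool          // a request arrived while one was running
        )
        for {
            if curDone == nil && pending {
                curDone = make(chan struct{})
                go runReorg(curDone) // closes curDone when finished
                pending = false
            }
            select {
            case <-requests:
                pending = true // accumulate; never more than one reorg at a time
            case <-curDone:
                curDone = nil
            case <-quit:
                return
            }
        }
    }

    func runReorg(done chan struct{}) {
        defer close(done)
        // take the pool lock, promote/demote transactions, release the
        // lock, then send the accumulated events on txFeed synchronously
    }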

@fjl (Contributor Author) commented Jun 12, 2019

This isn't quite done yet. I still need to figure out how to remove the go txFeed.Send(...) in TxPool.add.

@fjl (Contributor Author) commented Jun 12, 2019

The 'aggregation' of promoteExecutables calls probably won't work in practice with this change alone because runReorg and add compete for the central lock.

Consider AddRemote: we take the lock, validate the transaction, move the TX into pending or queue and drop the lock. Then we kick off a background reorg, which takes the lock again, moves TXs around, drops the lock and finally sends events. Since the next call to AddRemote will have to wait for the lock held by reorg before adding its TX, concurrency doesn't improve much. Things are slightly better than before this change when it comes to events though. If a downstream consumer of txFeed is blocked for a while, reorg calls will aggregate instead of launching many goroutines to deliver events.

To really improve high load behavior, we'd need a way to quickly verify the TX without the big lock, submit the request to add it and move on. The reorg stuff could then take batches of transactions periodically and integrate them all at once, keeping the central lock free for readers to take most of the time.
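A hypothetical sketch of that direction (addAsync, validateCheap, incoming and integrateLoop are illustrative names, not the pool's real API; assumes the usual imports):

    // Cheap, lock-free admission: validate, hand off, return immediately.
    func (pool *TxPool) addAsync(tx *types.Transaction) error {
        // Cheap validation without pool.mu: signature recovery, size, gas caps.
        if err := pool.validateCheap(tx); err != nil {
            return err
        }
        pool.incoming <- tx // no pool.mu held here
        return nil
    }

    // Background integration: take batches periodically so the central
    // lock stays free for readers most of the time.
    func (pool *TxPool) integrateLoop() {
        ticker := time.NewTicker(100 * time.Millisecond)
        defer ticker.Stop()
        var batch []*types.Transaction
        for {
            select {
            case tx := <-pool.incoming:
                batch = append(batch, tx)
            case <-ticker.C:
                if len(batch) == 0 {
                    continue
                }
                pool.mu.Lock() // one lock acquisition per batch, not per tx
                pool.addTxsLocked(batch, false)
                pool.mu.Unlock()
                batch = nil
            }
        }
    }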

@fjl (Contributor Author) commented Jun 12, 2019

An idea: if we split up the lock so there is one for the queue and one for pending we could just always insert TXs into the queue and then always promote them in the background.
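A hypothetical layout for that split (not implemented in this PR; txList and common.Address are the pool's existing types):

    type TxPool struct {
        queueMu sync.Mutex                 // guards queue only
        queue   map[common.Address]*txList // all incoming txs land here first
        pendMu  sync.Mutex                 // guards pending only
        pending map[common.Address]*txList // filled by the background promoter
        // ... remaining fields elided ...
    }

A background promoter would then briefly take both locks to move executable transactions from queue to pending, so writers inserting new transactions never contend with readers of pending.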

@fjl force-pushed the txpool-event-sync branch from 8f9b7da to 31d523b on June 13, 2019 10:45
@fjl marked this pull request as ready for review on June 13, 2019 10:45
@fjl changed the title from "core: move transaction pool reorgs to background goroutine" to "core: move TxPool reorg and events to background goroutine" on Jun 13, 2019
Review thread on core/tx_pool.go (outdated, resolved):

	- return pool.addTxsLocked(txs, local)
	+ done := pool.requestPromoteExecutables(dirtyAddrs)
Contributor:

The previous code only called promoteExecutables if len(dirty) > 0 (in addTxsLocked). Was there a reason to not carry that over?

@fjl (Contributor Author):

The request is definitely needed to trigger sending of events even if there isn't anything to promote. Furthermore, promoteExecutables is pretty smart about avoiding work.

@fjl force-pushed the txpool-event-sync branch 2 times, most recently from 72e57d4 to 096e413 on June 13, 2019 12:55
@fjl (Contributor Author) commented Jun 13, 2019

@holiman PTAL

Review threads on core/tx_pool.go and core/tx_pool_test.go (outdated, resolved).
@fjl force-pushed the txpool-event-sync branch 2 times, most recently from 096e413 to 1f90cc7 on June 14, 2019 10:04
@holiman (Contributor) commented Jun 14, 2019

This PR (specifically instance=Geth/v1.9.0-unstable-87409422-20190614/linux-amd64/go1.12.6) is now running on:

bootnode-aws-us-east-1-001
bootnode-aws-ap-southeast-1-001
bootnode-azure-australiaeast-001
bootnode-azure-westus-001

@holiman (Contributor) commented Jun 16, 2019

Here are the charts after a few days of running. The machines are LES servers with around 490 peers each. The upper chart shows the one running this PR; the lower ones show a fairly recent older version that has neither this PR nor some other changes in the LES area.

[Screenshot: Mainnet Bootnodes Datadog, 2019-06-16]

The blip on Friday evening is when they were restarted with this PR.

The non-updated machines have a lifetime of around 24-36 hours before they crash, with stack traces that point to extreme goroutine overload. So yeah, this PR looks good to me from that perspective!

@holiman (Contributor) commented Jun 17, 2019

Seeing quite a lot of these in the logs:

ERROR[06-17|06:25:47.909] Demoting invalidated transaction         hash=d51870…cd6d3d
ERROR[06-17|06:25:47.946] Demoting invalidated transaction         hash=560c07…7c2506
ERROR[06-17|06:25:47.965] Demoting invalidated transaction         hash=d74ba3…9629c2
ERROR[06-17|06:25:47.991] Demoting invalidated transaction         hash=75b0b8…950046

If it is truly an ERROR, we need to look into it. If it's expected, it shouldn't be at ERROR level.

@holiman (Contributor) commented Jun 17, 2019

The error messages come from demoteUnexecutables, which is called from reset.

	// Inject any transactions discarded due to reorgs
	log.Debug("Reinjecting stale transactions", "count", len(reinject))
	senderCacher.recover(pool.signer, reinject)
	pool.addTxsLocked(reinject, false)

	// validate the pool of pending transactions, this will remove
	// any transactions that have been included in the block or
	// have been invalidated because of another transaction (e.g.
	// higher gas price)
	pool.demoteUnexecutables()

The reset tries to take transactions that were in a reorged-out block and did not make it into the new block, and shove them back into pending. However, addTxsLocked in this PR is async, so they don't immediately wind up in pending.

Afterwards, it tries to demote unexecutables and finds that the subsequent transactions are still there, but is surprised that the reinjected ones are not. The effect is that the subsequent transactions get shoved out of pending and into queued.
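A sketch of the ordering that restores the invariant (resetInternal is an illustrative helper name, not the PR's final code): reinjection must complete before demotion runs, under the same lock, so demoteUnexecutables can see the reinjected transactions.

    func (pool *TxPool) runReorg(oldHead, newHead *types.Header) {
        pool.mu.Lock()
        defer pool.mu.Unlock()

        reinject := pool.resetInternal(oldHead, newHead) // collect txs dropped by the reorg
        pool.addTxsLocked(reinject, false)               // synchronous: txs are in the pool now
        pool.demoteUnexecutables()                       // no longer surprised by missing txs
    }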

@fjl force-pushed the txpool-event-sync branch from 740d18c to ee68955 on June 17, 2019 09:39
@holiman (Contributor) commented Jun 17, 2019

[Screenshot: Mainnet Bootnodes Datadog, 2019-06-17]

This PR against master. The four machines running this PR are all around 2.5K goroutines, glued to the bottom of the chart. The ones running master are going haywire.

fjl added 4 commits June 20, 2019 10:57
This change moves internal queue re-shuffling work in TxPool to a
background goroutine, TxPool.runReorg. Requests to execute runReorg are
accumulated by the new scheduleReorgLoop. The new loop also accumulates
transaction events.

The motivation for this change is making sends to txFeed synchronous
instead of sending them in one-off goroutines launched by 'add' and
'promoteExecutables'. If a downstream consumer of txFeed is blocked for
a while, reorg requests and events will queue up.
This change removes tracking of the homestead block number from TxPool.
The homestead field was used to enforce a minimum gas of 53000 for
contract creations after the homestead fork, but not before it. Since
nobody would want to configure a non-homestead chain nowadays and contract
creations usually take more than 53000 gas, the extra correctness is
redundant and can be removed.
This is useless now because there is no separate code path for
individual transactions anymore.
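For context on the homestead commit above: homestead raised the minimum (intrinsic) gas of a contract-creating transaction from 21000 to 53000, which is the rule the removed field used to gate. A minimal sketch of that floor, assuming the go-ethereum params constants:

    // Simplified intrinsic-gas floor (sketch; the real rule in core also
    // charges per byte of calldata).
    func minimumGas(isContractCreation, homestead bool) uint64 {
        if isContractCreation && homestead {
            return params.TxGasContractCreation // 53000 since homestead
        }
        return params.TxGas // 21000
    }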
@karalabe force-pushed the txpool-event-sync branch from e4bf834 to 0dbe8bd on June 20, 2019 07:58
@holiman (Contributor) commented Jun 20, 2019

Future bootnodes are now updated to run this PR (0dbe8bd) against master.

@karalabe added this to the 1.9.0 milestone on Jun 21, 2019
@karalabe merged commit 60c062e into ethereum:master on Jun 21, 2019
@gzliudan mentioned this pull request on May 10, 2024, and added a commit to gzliudan/XDPoSChain that referenced it.
Successfully merging this pull request may close these issues: goroutine memory leak (#19192).