
[Sequencer] performance degraded by sender recovery #1605

Closed
hexoscott opened this issue Jan 7, 2025 · 10 comments · Fixed by #1636

@hexoscott
Collaborator

A while back we needed to perform sender recovery whilst pulling transactions from the pool, to make sure recovery worked correctly and didn't error before adding the TX to a block. This looks to have introduced a performance hit to the sequencer which we need to investigate.

Related PR introducing these changes: #1480

Things to explore:

  • can we avoid the sender recovery introduced in that PR, for example by getting some extra data from the pool?
  • can we make sender recovery more performant using parallelisation?
@doutv
Collaborator

doutv commented Jan 7, 2025

Geth caches the sender address, in order to avoid calling ECRecover twice for the same transaction:
ethereum/go-ethereum#30208
https://github.com/ethereum/go-ethereum/blob/master/core/types/transaction_signing.go#L140
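
For reference, the shape of that optimisation is roughly the following: recover once and memoise the result on the transaction object. This is only an illustrative sketch, not geth's actual code; Address, Signer and cachedSenderTx are stand-in names.

package txcache // illustrative package name

import "sync"

// Address and Signer are stand-ins for the real types in this sketch.
type Address [20]byte

type Signer interface {
	// Sender performs the expensive ECRecover.
	Sender(rawTx []byte) (Address, error)
}

// cachedSenderTx memoises the recovered sender alongside the transaction,
// so repeated lookups never re-run ECRecover for the same object.
type cachedSenderTx struct {
	rawRLP []byte

	once   sync.Once
	sender Address
	err    error
}

// From recovers the sender on first use and returns the cached value afterwards.
func (t *cachedSenderTx) From(s Signer) (Address, error) {
	t.once.Do(func() {
		t.sender, t.err = s.Sender(t.rawRLP)
	})
	return t.sender, t.err
}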

@hexoscott
Collaborator Author

Erigon has the same concept of caching, but the cached sender lives with the transaction object while it is in memory. Because the pool is a separate entity from the sequencer in Erigon, it passes the raw RLP over the interface, so the sequencer needs to decode it again and won't have the sender cached.

@revitteth revitteth self-assigned this Jan 8, 2025
@doutv
Collaborator

doutv commented Jan 8, 2025

Caching the sender address works perfectly in our scenario.

We are focusing on ERC-4337 transaction performance.
In ERC-4337:

  1. The user creates and signs a UserOperation.
  2. A backend called a bundler wraps multiple UserOperations into one transaction and signs it with the bundler's private key. The bundler has only 1 private key, so the sender address is fixed.
  3. The bundler sends the transaction to the cdk-erigon RPC.

@revitteth revitteth assigned IvanBelyakoff and unassigned revitteth Jan 8, 2025
@doutv
Collaborator

doutv commented Jan 10, 2025

Summary Report: Performance Degradation with Many Overflowing Transactions

Co-authored with LLM

Overview

When producing a block, the sequencer yields 1000 transactions from the txpool, but only 2 are mined; the remaining 998 transactions are retained and need to be re-processed in subsequent blocks.
Repeatedly processing the same transactions is time-consuming: computing the transaction hash and recovering the sender address take 50% of the time according to pprof.

Test method

Polycli loadtest with 5000 ERC-4337 transactions; each transaction contains 10 dummy UserOperations, and each UserOperation verifies 2 ECDSA signatures (P-256 + Secp256r1).
Max transactions in a block: 2, due to the zkCounter limit; trying to include a 3rd transaction in the block triggers an overflow.

Key Observations

  1. Transaction Yielding: The system attempts to yield 1000 transactions from the transaction pool for processing in each batch. This is controlled by the yieldSize parameter; I used the default value of 1000.
  2. Sender Recovery: For each transaction, the sender's address is recovered using the signer.Sender method. This step is necessary for transaction validation but can be computationally expensive, especially when repeated for transactions that are not ultimately included in a block.
  3. Transaction Inclusion: Due to constraints such as gas limits and counter overflows, only 2 transactions are included in a block. The third transaction and beyond are not processed due to overflow conditions.
  4. Transaction Pool Management: The RemoveMinedTransactions function only removes the successfully mined transactions from the pool. The remaining 998 transactions are retained and need to be re-processed in subsequent blocks.
  5. Reprocessing Overhead: The need to repeatedly process the same transactions in subsequent blocks can lead to increased computational overhead and reduced throughput.

Potential Solutions

Estimate and Check zkCounter Overflow

This can fix both hot spots: sender recovery and transaction hash computation.

YieldBest yields the top transactions in the txpool; it estimates gas and stops adding more transactions when the gas budget would overflow.
Similarly, we could estimate and check zkCounter overflow here: if the zkCounters are already over the limit, don't yield more transactions (a rough sketch follows the snippet below).

// if we wouldn't have enough gas for a standard transaction then quit out early
if availableGas < fixedgas.TxGas {
	break
}
mt := best.ms[i]
p.Trace("Processing transaction", "txID", mt.Tx.IDHash)
if toSkip.Contains(mt.Tx.IDHash) {
	p.Trace("Skipping transaction, already in toSkip", "txID", mt.Tx.IDHash)
	continue
}
if !isLondon && mt.Tx.Type == 0x2 {
	// remove ldn txs when not in london
	toRemove = append(toRemove, mt)
	toSkip.Add(mt.Tx.IDHash)
	p.Trace("Removing London transaction in non-London environment", "txID", mt.Tx.IDHash)
	continue
}
if mt.Tx.Gas > transactionGasLimit {
	// Skip transactions with very large gas limit, these shouldn't enter the pool at all
	log.Debug("found a transaction in the pending pool with too high gas for tx - clear the tx pool")
	p.Trace("Skipping transaction with too high gas", "txID", mt.Tx.IDHash, "gas", mt.Tx.Gas)
	continue
}
rlpTx, sender, isLocal, err := p.getRlpLocked(tx, mt.Tx.IDHash[:])
if err != nil {
	p.Trace("Error getting RLP of transaction", "txID", mt.Tx.IDHash, "error", err)
	return false, count, err
}
if len(rlpTx) == 0 {
	toRemove = append(toRemove, mt)
	p.Trace("Removing transaction with empty RLP", "txID", mt.Tx.IDHash)
	continue
}
// Skip transactions that require more blob gas than is available
blobCount := uint64(len(mt.Tx.BlobHashes))
if blobCount*fixedgas.BlobGasPerBlob > availableBlobGas {
	p.Trace("Skipping transaction due to insufficient blob gas", "txID", mt.Tx.IDHash, "requiredBlobGas", blobCount*fixedgas.BlobGasPerBlob, "availableBlobGas", availableBlobGas)
	continue
}
availableBlobGas -= blobCount * fixedgas.BlobGasPerBlob
// make sure we have enough gas in the caller to add this transaction.
// not an exact science using intrinsic gas but as close as we could hope for at
// this stage
intrinsicGas, _ := CalcIntrinsicGas(uint64(mt.Tx.DataLen), uint64(mt.Tx.DataNonZeroLen), nil, mt.Tx.Creation, true, true, isShanghai)
if intrinsicGas > availableGas {
	// we might find another TX with a low enough intrinsic gas to include so carry on
	p.Trace("Skipping transaction due to insufficient gas", "txID", mt.Tx.IDHash, "intrinsicGas", intrinsicGas, "availableGas", availableGas)
	continue
}
if intrinsicGas <= availableGas { // check for potential underflow
	availableGas -= intrinsicGas
}

Dynamic Yield Size: Adjust the yieldSize dynamically based on the observed block capacity and overflow conditions to reduce unnecessary processing.

This can fix both hot spots: sender recovery and transaction hash computation.

Efficient Sender Caching: Improve the caching mechanism for sender addresses to reduce the computational cost of repeated sender recovery.

This only accelerates the sender recovery hot spot:

  • I tried a simple caching method and it works well, improving sequencer TPS by 50%.
  • This is a naive cache implementation, intended only as a PoC (a rough sketch of the idea is below).
  • Code: okx@f748409
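
For illustration, a naive cache along those lines could look like the sketch below; the hash-keyed map, the types and the lack of eviction are assumptions of the sketch, and the linked commit is the actual PoC.

import "sync"

// senderCache is a naive, PoC-style cache of recovered senders keyed by tx hash.
// No eviction and a plain RWMutex: fine for a proof of concept, but a bounded
// LRU would be needed before using it for real.
type senderCache struct {
	mu      sync.RWMutex
	senders map[[32]byte][20]byte // tx hash -> sender address
}

func newSenderCache() *senderCache {
	return &senderCache{senders: make(map[[32]byte][20]byte)}
}

func (c *senderCache) get(txHash [32]byte) ([20]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	sender, ok := c.senders[txHash]
	return sender, ok
}

func (c *senderCache) put(txHash [32]byte, sender [20]byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.senders[txHash] = sender
}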

@hexoscott
Collaborator Author

Thanks @doutv for the detailed feedback there. A couple of things from my side:

  • estimating zk counters isn't possible, as we need to execute the TX to know them; a very simple TX in terms of byte length could overflow the counters, for example, while a seemingly complex TX could use very little. There is an issue open for attempting this ahead of time, but it won't be trivial and will likely be handled at the point of adding the TX to the pool.
  • Excellent point about the sender recovery being wasteful. I think we can move this up the stack so we only perform it just before attempting to add the TX to a block. So, in your example of yielding 1000 transactions and only mining 2, we would perform at most 3 sender recoveries: 2 for the mined txs and a 3rd just before attempting to add tx 3 and finding the overflow (sketched below).
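
A rough sketch of that ordering, recovering lazily per attempted transaction; recoverSender and tryAddToBlock are illustrative names, not the actual sequencer API.

// Deferred sender recovery: the expensive ECRecover runs per attempted
// transaction rather than per yielded transaction, so yielding 1000 txs and
// mining 2 costs at most 3 recoveries instead of 1000.
for _, tx := range yielded {
	sender, err := recoverSender(tx) // expensive, done only when we actually try the tx
	if err != nil {
		continue // skip the bad tx and move on
	}
	if overflowed := tryAddToBlock(tx, sender); overflowed {
		break // zk counters (or gas) exhausted: stop trying, remaining txs stay in the pool
	}
}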

@doutv
Collaborator

doutv commented Jan 13, 2025

batchState.blockState.transactionHashesToSlots[tx.Hash()] = newIds[idx]

Before adding a TX to a block, computing the hash for 1000 transactions is also time-consuming.

@doutv
Collaborator

doutv commented Jan 13, 2025

Another simple method: change the yieldSize config from 1000 to 10.

  • For huge TXs like ERC-4337 in my setting, a block can only include 2 TXs. Yielding 10 TXs at a time reduces the overhead a lot.
  • For small TXs like plain ETH transfers, a block can include 500+ TXs. Yielding 10 TXs at a time introduces some overhead, because OuterLoopTransactions is executed many more times.

@doutv
Collaborator

doutv commented Jan 13, 2025

Should I create a PR to fix it? Or do you guys want to work on this?

@IvanBelyakoff
Collaborator

Should I create a PR to fix it? Or do you guys want to work on this?

I would like to work on that. Thanx a lot for all the details, appreciate it!
