
[Sequencer] performance degraded by sender recovery #1605

Closed
hexoscott opened this issue Jan 7, 2025 · 10 comments · Fixed by #1636

@hexoscott
Collaborator

A while back we needed to perform sender recovery whilst pulling transactions from the pool, to make sure recovery worked correctly and didn't error before adding the TX to a block. This looks to have introduced a performance hit to the sequencer which we need to investigate.

Related PR introducing these changes: #1480

Things to explore:

  • can we avoid the sender recovery introduced in that PR, for example by getting some extra data from the pool?
  • can we make sender recovery more performant using parallelisation?
@doutv
Collaborator

doutv commented Jan 7, 2025

Geth caches the sender address, in order to avoid calling ECRecover twice for the same transaction:
ethereum/go-ethereum#30208
https://github.com/ethereum/go-ethereum/blob/master/core/types/transaction_signing.go#L140
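
For reference, the shape of that optimisation is roughly the following: recover once and memoise the result on the transaction object. This is only an illustrative sketch, not geth's actual code; Address, Signer and cachedSenderTx are stand-in names.

package txcache // illustrative package name

import "sync"

// Address and Signer are stand-ins for the real types in this sketch.
type Address [20]byte

type Signer interface {
	// Sender performs the expensive ECRecover.
	Sender(rawTx []byte) (Address, error)
}

// cachedSenderTx memoises the recovered sender alongside the transaction,
// so repeated lookups never re-run ECRecover for the same object.
type cachedSenderTx struct {
	rawRLP []byte

	once   sync.Once
	sender Address
	err    error
}

// From recovers the sender on first use and returns the cached value afterwards.
func (t *cachedSenderTx) From(s Signer) (Address, error) {
	t.once.Do(func() {
		t.sender, t.err = s.Sender(t.rawRLP)
	})
	return t.sender, t.err
}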

@hexoscott
Collaborator Author

Erigon has the same concept of caching, but the cached sender lives with the transaction object while it is in memory. Because the pool is a separate entity from the sequencer in Erigon, it passes the raw RLP over the interface, so the sequencer needs to decode it again and won't have the sender cached.

@revitteth revitteth self-assigned this Jan 8, 2025
@doutv
Collaborator

doutv commented Jan 8, 2025

Caching the sender address works perfectly in our scenario.

We are focusing on ERC-4337 transaction performance.
In ERC-4337:

  1. The user creates and signs a UserOperation.
  2. A backend called a bundler wraps multiple UserOperations into one transaction and signs it with the bundler's private key. The bundler has only 1 private key, so the sender address is fixed.
  3. The bundler sends the transaction to the cdk-erigon RPC.

@revitteth revitteth assigned IvanBelyakoff and unassigned revitteth Jan 8, 2025
@doutv
Collaborator

doutv commented Jan 10, 2025

Summary Report: Performance Degradation with Many Overflowing Transactions

Co-authored with LLM

Overview

When producing a block, the sequencer yields 1000 transactions from the txpool, but only 2 are mined; the remaining 998 transactions are retained and need to be re-processed in subsequent blocks.
Repeatedly processing the same transactions is time-consuming: computing the transaction hash and recovering the sender address take 50% of the time according to pprof.

Test method

Polycli loadtest with 5000 ERC-4337 transactions; each transaction contains 10 dummy UserOperations, and each UserOperation verifies 2 ECDSA signatures (P-256 + Secp256r1).
Max transactions in a block: 2, due to the zkCounter limit; trying to include a 3rd transaction in the block triggers an overflow.

Key Observations

  1. Transaction Yielding: The system attempts to yield 1000 transactions from the transaction pool for processing in each batch. This is controlled by the yieldSize parameter; I used the default value of 1000.
  2. Sender Recovery: For each transaction, the sender's address is recovered using the signer.Sender method. This step is necessary for transaction validation but can be computationally expensive, especially when repeated for transactions that are not ultimately included in a block.
  3. Transaction Inclusion: Due to constraints such as gas limits and counter overflows, only 2 transactions are included in a block. The third transaction and beyond are not processed due to overflow conditions.
  4. Transaction Pool Management: The RemoveMinedTransactions function only removes the successfully mined transactions from the pool. The remaining 998 transactions are retained and need to be re-processed in subsequent blocks.
  5. Reprocessing Overhead: The need to repeatedly process the same transactions in subsequent blocks can lead to increased computational overhead and reduced throughput.

Potential Solutions

Estimate and Check zkCounter Overflow

This can fix both hot spots: sender recovery and transaction hash computation.

YieldBest yields the top transactions in the txpool; it estimates gas and stops adding more transactions when the gas budget would overflow.
Similarly, we could estimate and check zkCounter overflow here: if the zkCounters are already over the limit, don't yield more transactions (a rough sketch follows the snippet below).

// if we wouldn't have enough gas for a standard transaction then quit out early
if availableGas < fixedgas.TxGas {
	break
}
mt := best.ms[i]
p.Trace("Processing transaction", "txID", mt.Tx.IDHash)
if toSkip.Contains(mt.Tx.IDHash) {
	p.Trace("Skipping transaction, already in toSkip", "txID", mt.Tx.IDHash)
	continue
}
if !isLondon && mt.Tx.Type == 0x2 {
	// remove ldn txs when not in london
	toRemove = append(toRemove, mt)
	toSkip.Add(mt.Tx.IDHash)
	p.Trace("Removing London transaction in non-London environment", "txID", mt.Tx.IDHash)
	continue
}
if mt.Tx.Gas > transactionGasLimit {
	// Skip transactions with very large gas limit, these shouldn't enter the pool at all
	log.Debug("found a transaction in the pending pool with too high gas for tx - clear the tx pool")
	p.Trace("Skipping transaction with too high gas", "txID", mt.Tx.IDHash, "gas", mt.Tx.Gas)
	continue
}
rlpTx, sender, isLocal, err := p.getRlpLocked(tx, mt.Tx.IDHash[:])
if err != nil {
	p.Trace("Error getting RLP of transaction", "txID", mt.Tx.IDHash, "error", err)
	return false, count, err
}
if len(rlpTx) == 0 {
	toRemove = append(toRemove, mt)
	p.Trace("Removing transaction with empty RLP", "txID", mt.Tx.IDHash)
	continue
}
// Skip transactions that require more blob gas than is available
blobCount := uint64(len(mt.Tx.BlobHashes))
if blobCount*fixedgas.BlobGasPerBlob > availableBlobGas {
	p.Trace("Skipping transaction due to insufficient blob gas", "txID", mt.Tx.IDHash, "requiredBlobGas", blobCount*fixedgas.BlobGasPerBlob, "availableBlobGas", availableBlobGas)
	continue
}
availableBlobGas -= blobCount * fixedgas.BlobGasPerBlob
// make sure we have enough gas in the caller to add this transaction.
// not an exact science using intrinsic gas but as close as we could hope for at
// this stage
intrinsicGas, _ := CalcIntrinsicGas(uint64(mt.Tx.DataLen), uint64(mt.Tx.DataNonZeroLen), nil, mt.Tx.Creation, true, true, isShanghai)
if intrinsicGas > availableGas {
	// we might find another TX with a low enough intrinsic gas to include so carry on
	p.Trace("Skipping transaction due to insufficient gas", "txID", mt.Tx.IDHash, "intrinsicGas", intrinsicGas, "availableGas", availableGas)
	continue
}
if intrinsicGas <= availableGas { // check for potential underflow
	availableGas -= intrinsicGas
}

Dynamic Yield Size: Adjust the yieldSize dynamically based on the observed block capacity and overflow conditions to reduce unnecessary processing.

This can fix both hot spots: sender recovery and transaction hash computation.

Efficient Sender Caching: Improve the caching mechanism for sender addresses to reduce the computational cost of repeated sender recovery.

This only accelerates the sender recovery hot spot:

  • I tried a simple caching method and it works well, improving sequencer TPS by 50%.
  • This is a naive cache implementation, intended only as a PoC (a rough sketch of the idea is below).
  • Code: okx@f748409
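
For illustration, a naive cache along those lines could look like the sketch below; the hash-keyed map, the types and the lack of eviction are assumptions of the sketch, and the linked commit is the actual PoC.

import "sync"

// senderCache is a naive, PoC-style cache of recovered senders keyed by tx hash.
// No eviction and a plain RWMutex: fine for a proof of concept, but a bounded
// LRU would be needed before using it for real.
type senderCache struct {
	mu      sync.RWMutex
	senders map[[32]byte][20]byte // tx hash -> sender address
}

func newSenderCache() *senderCache {
	return &senderCache{senders: make(map[[32]byte][20]byte)}
}

func (c *senderCache) get(txHash [32]byte) ([20]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	sender, ok := c.senders[txHash]
	return sender, ok
}

func (c *senderCache) put(txHash [32]byte, sender [20]byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.senders[txHash] = sender
}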

@hexoscott
Collaborator Author

Thanks @doutv for the detailed feedback there. A couple of things from my side:

  • estimating zk counters isn't possible, as we need to execute the TX to know them; a very simple TX in terms of byte length could overflow the counters, for example, while a seemingly complex TX could use very little. There is an issue open for attempting this ahead of time, but it won't be trivial and will likely be handled at the point of adding the TX to the pool.
  • Excellent point about the sender recovery being wasteful. I think we can move this up the stack so we only perform it just before attempting to add the TX to a block. So, in your example of yielding 1000 transactions and only mining 2, we would perform at most 3 sender recoveries: 2 for the mined txs and a 3rd just before attempting to add tx 3 and finding the overflow (sketched below).
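
A rough sketch of that ordering, recovering lazily per attempted transaction; recoverSender and tryAddToBlock are illustrative names, not the actual sequencer API.

// Deferred sender recovery: the expensive ECRecover runs per attempted
// transaction rather than per yielded transaction, so yielding 1000 txs and
// mining 2 costs at most 3 recoveries instead of 1000.
for _, tx := range yielded {
	sender, err := recoverSender(tx) // expensive, done only when we actually try the tx
	if err != nil {
		continue // skip the bad tx and move on
	}
	if overflowed := tryAddToBlock(tx, sender); overflowed {
		break // zk counters (or gas) exhausted: stop trying, remaining txs stay in the pool
	}
}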

@doutv
Collaborator

doutv commented Jan 13, 2025

batchState.blockState.transactionHashesToSlots[tx.Hash()] = newIds[idx]

Before adding a TX to a block, computing the hash for 1000 transactions is also time-consuming.

@doutv
Collaborator

doutv commented Jan 13, 2025

Another simple method: change the yieldSize config from 1000 to 10.

  • For huge TXs like ERC-4337 in my setting, a block can only include 2 TXs. Yielding 10 TXs at a time reduces the overhead a lot.
  • For small TXs like plain ETH transfers, a block can include 500+ TXs. Yielding 10 TXs at a time introduces some overhead, because OuterLoopTransactions is executed many more times.

@doutv
Collaborator

doutv commented Jan 13, 2025

Should I create a PR to fix it? Or do you guys want to work on this?

@IvanBelyakoff
Collaborator

Should I create a PR to fix it? Or do you guys want to work on this?

I would like to work on that. Thanx a lot for all the details, appreciate it!
