solBnkTxSched panic on canary node #34221

Closed
steviez opened this issue Nov 27, 2023 · 2 comments · Fixed by #34229

Comments


steviez commented Nov 27, 2023

Problem

One of the canary nodes panicked running f36ab08f:

thread 'solBnkTxSched' panicked at /home/sol/.cargo/registry/src/index.crates.io-6f17d22bba15001f/prio-graph-0.1.0/src/prio_graph.rs:105:17:
blocking node must exist
stack backtrace:
   0: rust_begin_unwind
             at ./rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:597:5
   1: core::panicking::panic_fmt
             at ./rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/panicking.rs:72:14
   2: prio_graph::prio_graph::PrioGraph<Id,Rk,Tl,Pfn>::insert_transaction::{{closure}}
   3: prio_graph::prio_graph::PrioGraph<Id,Rk,Tl,Pfn>::insert_transaction
   4: solana_core::banking_stage::transaction_scheduler::prio_graph_scheduler::PrioGraphScheduler::schedule
   5: solana_core::banking_stage::transaction_scheduler::scheduler_controller::SchedulerController::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

[2023-11-25T19:31:49.802987422Z ERROR solana_metrics::metrics] datapoint: panic program="validator" thread="solBnkTxSched" one=1i message="panicked at /home/sol/.cargo/registry/src/index.crates.io-6f17d22bba15001f/prio-graph-0.1.0/src/prio_graph.rs:105:17:
    blocking node must exist" location="/home/sol/.cargo/registry/src/index.crates.io-6f17d22bba15001f/prio-graph-0.1.0/src/prio_graph.rs:105:17" version="1.18.0 (src:f36ab08f; feat:1429699964, client:SolanaLabs)"

The node that panicked was sce3, and the log has been set aside at
/home/sol/logs/2023.11.25_panic.log

Proposed Solution

Debug why the node panicked. Notably, the panic occurred in a dependency (prio-graph), which points to either a bug in the dependency or a violation of some assumption of its API on our side.


steviez commented Nov 27, 2023

CC @apfitzge @taozhu-chicago

apfitzge commented

The bug is a violated assumption in prio-graph: it assumes that nodes do not block themselves, i.e. that a transaction does not contain duplicate account locks.
That is a constraint on transactions, but it is not verified before insertion into the prio-graph.
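
To illustrate one plausible way this assumption can fail, here is a toy model. It is not prio-graph's actual code: `ToyGraph`, `last_locker`, and `insert` are hypothetical names, and the model assumes the graph records the last transaction to lock each account and only adds a node after all of that transaction's locks are processed. Under those assumptions, a second lock on the same account within one transaction finds the transaction itself as the blocker before its node exists.

```rust
use std::collections::{HashMap, HashSet};

/// Toy graph for illustration only; NOT prio-graph's real data structure.
#[derive(Default)]
struct ToyGraph {
    /// Last transaction id to lock each account (accounts are plain u8 ids here).
    last_locker: HashMap<u8, u32>,
    /// Transactions that have already been inserted as nodes.
    nodes: HashSet<u32>,
}

impl ToyGraph {
    /// Inserts `tx_id` with its account locks. Returns `Err` at the point
    /// where an implementation relying on `expect` would panic instead.
    fn insert(&mut self, tx_id: u32, accounts: &[u8]) -> Result<(), String> {
        for &account in accounts {
            // Record this transaction as the latest locker and fetch the previous one.
            if let Some(blocker) = self.last_locker.insert(account, tx_id) {
                // Assumption: the blocker is an earlier, already-inserted node.
                if !self.nodes.contains(&blocker) {
                    return Err(format!("blocking node {} must exist", blocker));
                }
            }
        }
        // The node is only added after all of its locks have been processed.
        self.nodes.insert(tx_id);
        Ok(())
    }
}
```

In this sketch, inserting transaction 1 with accounts `[10, 11]` and then transaction 2 with `[11, 12]` succeeds, while a transaction with accounts `[20, 20]` hits the error path because it blocks itself.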

There are two obvious options:

  1. verify that locks are non-duplicate earlier on, during sanitization
  2. make prio-graph handle duplicate account locks

Both should probably be done, but option 1 is simpler and gets a fix out sooner.
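
As a rough sketch of option 1, the check could look something like the following. This is not the fix that landed in #34229: `AccountKey`, `has_duplicate_locks`, and `filter_schedulable` are hypothetical names, and account addresses are modeled as plain 32-byte arrays rather than the validator's real types.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a 32-byte account address.
type AccountKey = [u8; 32];

/// Returns true if any account key appears more than once in `keys`.
fn has_duplicate_locks(keys: &[AccountKey]) -> bool {
    let mut seen = HashSet::with_capacity(keys.len());
    // `insert` returns false when the key was already present.
    keys.iter().any(|key| !seen.insert(key))
}

/// Hypothetical pre-insertion filter: keeps only transactions whose
/// account lock lists are free of duplicates.
fn filter_schedulable(txs: Vec<Vec<AccountKey>>) -> Vec<Vec<AccountKey>> {
    txs.into_iter()
        .filter(|keys| !has_duplicate_locks(keys))
        .collect()
}
```

Running a check like this before transactions reach the scheduler means the graph never sees a self-blocking node, at the cost of one hash-set pass per transaction.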
