
DB corrupted after graceful shutdown #626

Closed
raphjaph opened this issue Jun 28, 2023 · 10 comments · Fixed by #627

Comments

@raphjaph

raphjaph commented Jun 28, 2023

I'm upgrading redb from 0.13.0 to 1.0.1 because we wanted to pull in this fix. If I now run this as a service on a Debian machine until around block height 777000 and do `systemctl stop ord` followed by `systemctl start ord`, I get the following:

Stopping Ord server...
ord.service: Succeeded.
Stopped Ord server.
ord.service: Consumed 1h 9min 23.138s CPU time.
Started Ord server.
[2023-06-28T13:02:32Z INFO  ord::options] Connecting to Bitcoin Core at 127.0.0.1:8332/wallet/ord
[2023-06-28T13:02:32Z INFO  ord::options] Using credentials from cookie file at `/var/lib/bitcoind/.cookie`
error: DB corrupted: Failed to repair database. All roots are corrupted
   0: ord::index::Index::open
   1: ord::subcommand::Subcommand::run
   2: ord::main
   3: std::sys_common::backtrace::__rust_begin_short_backtrace
   4: std::rt::lang_start::{{closure}}
   5: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:287:13
      std::panicking::try::do_call
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:485:40
      std::panicking::try
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:449:19
      std::panic::catch_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panic.rs:140:14
      std::rt::lang_start_internal::{{closure}}
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/rt.rs:148:48
      std::panicking::try::do_call
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:485:40
      std::panicking::try
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:449:19
      std::panic::catch_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panic.rs:140:14
      std::rt::lang_start_internal
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/rt.rs:148:20
   6: main
   7: __libc_start_main
   8: _start
ord.service: Main process exited, code=exited, status=1/FAILURE
ord.service: Failed with result 'exit-code'.

I believe I'm shutting it down gracefully; I also tested this on macOS, where the same thing happens. We're also using multimap tables now, in case that helps with debugging. Let me know how I can provide more information.

@raphjaph
Author

raphjaph commented Jun 28, 2023

If you want to reproduce this, I recommend building from this branch and running with `--db-cache-size 2147483648` (2 GiB) and without `--index-sats`. It should take about half an hour to get to height 777000.

@cberner
Owner

cberner commented Jun 28, 2023

Uh oh :/ I'll take a look

@cberner
Owner

cberner commented Jun 28, 2023

@raphjaph I think I'm not doing it right. Here are the steps I followed:

  1. launch bitcoind
  2. run `cargo run --release -- --data-dir=./junk --db-cache-size 2147483648 --height-limit=777000 index run`
  3. press ctrl-c after it reaches block ~765k
  4. repeat step (2)

Indexing seems to continue just fine from there. Did I miss a step?

@raphjaph
Author

raphjaph commented Jun 28, 2023

@cberner maybe run it without `--height-limit`, use `server` instead of `index run`, and let it go above 777000. Then ctrl-c once and repeat step 2.

This is what I did:
`./target/release/ord --index update-redb.redb --db-cache-size 2147483648 server`

@cberner
Owner

cberner commented Jun 28, 2023

I was able to reproduce it using `timeout 2000 cargo run --release -- --data-dir=./junk --db-cache-size 2147483648 index run`. Digging into what's wrong now.
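For anyone else reproducing this: `timeout` kills its child with SIGTERM when the duration elapses, the same signal `systemctl stop` sends, so it exercises the graceful-shutdown path unattended instead of requiring a manual ctrl-c. A quick way to confirm the signal behavior (the `sleep` is just a placeholder process):

```shell
# GNU timeout exits with status 124 when it had to terminate the child
# itself, which tells you the SIGTERM path (not a normal exit) was taken.
timeout 1 sleep 10
echo "exit: $?"
# → exit: 124
```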

@cberner
Owner

cberner commented Jun 29, 2023

Ok, I think I found the issue. Can you try opening that database with master?

@veryordinally

@cberner This looks good! When do you plan to make a release? We'd like to ship a new ord version as quickly as possible and would prefer to base it on a redb release.

@cberner
Owner

cberner commented Jun 29, 2023

I'll make a new release today. Just letting the fuzzer run for a few hours first.

@raphjaph
Author

Awesome, thanks for fixing this so quickly!

@cberner
Owner

cberner commented Jun 29, 2023

For sure! Thanks for finding this!
