Improve online_delete configuration and DB tuning: #3429
Changes from 7 commits
063c3b8
6e9051e
4b40877
e42c1ee
f10c335
088d0af
6b96876
18f5d07
9173a6a
10e02e6
e060249
c1fff2e
2b01158
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,7 +36,7 @@ | |
# For more information on where the rippled server instance searches for the | ||
# file, visit: | ||
# | ||
# https://developers.ripple.com/commandline-usage.html#generic-options | ||
# https://xrpl.org/commandline-usage.html#generic-options | ||
# | ||
# This file should be named rippled.cfg. This file is UTF-8 with DOS, UNIX, | ||
# or Mac style end of lines. Blank lines and lines beginning with '#' are | ||
|
@@ -869,18 +869,65 @@ | |
# | ||
# These keys are possible for any type of backend: | ||
# | ||
# earliest_seq The default is 32570 to match the XRP ledger | ||
# network's earliest allowed sequence. Alternate | ||
# networks may set this value. Minimum value of 1. | ||
# If a [shard_db] section is defined, and this | ||
# value is present either [node_db] or [shard_db], | ||
# it must be defined with the same value in both | ||
# sections. | ||
# | ||
# online_delete Minimum value of 256. Enable automatic purging | ||
# of older ledger information. Maintain at least this | ||
# number of ledger records online. Must be greater | ||
# than or equal to ledger_history. | ||
# | ||
# advisory_delete 0 for disabled, 1 for enabled. If set, then | ||
# require administrative RPC call "can_delete" | ||
# to enable online deletion of ledger records. | ||
# These keys modify the behavior of online_delete, and thus are only | ||
# relevant if online_delete is defined and non-zero: | ||
# | ||
# earliest_seq The default is 32570 to match the XRP ledger | ||
# network's earliest allowed sequence. Alternate | ||
# networks may set this value. Minimum value of 1. | ||
# advisory_delete 0 for disabled, 1 for enabled. If set, the | ||
# administrative RPC call "can_delete" is required | ||
# to enable online deletion of ledger records. | ||
# Online deletion does not run automatically if | ||
# non-zero and the last deletion was on a ledger | ||
# greater than the current "can_delete" setting. | ||
# Default is 0. | ||
# | ||
# delete_batch When automatically purging, SQLite database | ||
# records are deleted in batches. This value | ||
# controls the maximum size of each batch. Larger | ||
# batches keep the databases locked for more time, | ||
# which may cause other functions to fall behind, | ||
# and thus cause the node to lose sync. | ||
# Default is 100. | ||
# | ||
# back_off_milliseconds | ||
# Number of milliseconds to wait between | ||
# online_delete batches to allow other functions | ||
# to catch up. | ||
# Default is 100. | ||
# | ||
# age_threshold_seconds | ||
# The online delete process will only run if the | ||
# latest validated ledger is younger than this | ||
# number of seconds. | ||
# Default is 60. | ||
# | ||
# recovery_buffer_seconds | ||
# The online delete process checks periodically | ||
# that rippled is still in sync with the network, | ||
# and that the validated ledger is less than | ||
# 'age_threshold_seconds' old. By default, if it | ||
# is not the online delete process aborts and | ||
# tries again later. If 'recovery_buffer_seconds' | ||
# is set and rippled is out of sync, but likely to | ||
# recover quickly, then online delete will wait | ||
# this number of seconds for rippled to get back | ||
# into sync before it aborts. | ||
# Set this value if the node is otherwise staying | ||
# in sync, or recovering quickly, but the online | ||
# delete process is unable to finish. | ||
# Default is unset. | ||
# | ||
# Notes: | ||
# The 'node_db' entry configures the primary, persistent storage. | ||
|
@@ -892,6 +939,12 @@ | |
# [import_db] Settings for performing a one-time import (optional) | ||
# [database_path] Path to the book-keeping databases. | ||
# | ||
# There are 4 or 5 bookkeeping SQLite databases that the server creates and | ||
# maintains. If you omit this configuration setting, it will default to | ||
# creating a directory called "db" located in the same place as your | ||
# rippled.cfg file. Partial pathnames will be considered relative to | ||
# the location of the rippled executable. | ||
nit: instead of "it will default to creating a directory", just say, "the server creates a directory." Similarly, instead of "will be considered relative" just say "are relative".
Changed and reworded to be less passive tone.
||
# | ||
# [shard_db] Settings for the Shard Database (optional) | ||
# | ||
# Format (without spaces): | ||
|
@@ -907,12 +960,68 @@ | |
# | ||
# max_size_gb Maximum disk space the database will utilize (in gigabytes) | ||
# | ||
# [sqlite] Tuning settings for the SQLite databases (optional) | ||
# | ||
# There are 4 bookkeeping SQLite database that the server creates and | ||
# maintains. If you omit this configuration setting, it will default to | ||
# creating a directory called "db" located in the same place as your | ||
# rippled.cfg file. Partial pathnames will be considered relative to | ||
# the location of the rippled executable. | ||
# Format (without spaces): | ||
# One or more lines of case-insensitive key / value pairs: | ||
# <key> '=' <value> | ||
# ... | ||
# | ||
# Example: | ||
# sync_level=low | ||
# journal_mode=off | ||
# | ||
# WARNING: These settings can have significant effects on data integrity, | ||
# particularly in failure scenarios. It is strongly recommended that they | ||
# be left at their defaults unless the server is having performance issues | ||
# during normal operation or during automatic purging (online_delete) | ||
# operations. A warning will be logged on startup if 'ledger_history' | ||
# is configured to store more than 10,000,000 ledgers and any of these | ||
# settings are less safe than the default. This is due to the inordinate | ||
# amount of time and bandwidth it will take to safely rebuild a corrupted | ||
# database from other peers. | ||
# | ||
# Optional keys: | ||
# | ||
# safety_level Valid values: high, low | ||
# The default is "high", and tunes the SQLite | ||
# databases in the most reliable mode. "low" | ||
# is equivalent to | ||
# journal_mode=memory | ||
# synchronous=off | ||
# temp_store=memory | ||
# These settings trade speed and reduced I/O | ||
# for a higher risk of data loss. See the | ||
# individual settings below for more information. | ||
# | ||
# journal_mode Valid values: delete, truncate, persist, memory, wal, off | ||
# The default is "wal", which uses a write-ahead | ||
# log to implement database transactions. | ||
# Alternately, "memory" saves disk I/O, but if | ||
# rippled crashes during a transaction, the | ||
# database is likely to be corrupted. | ||
# See https://www.sqlite.org/pragma.html#pragma_journal_mode | ||
# for more details about the available options. | ||
# | ||
# synchronous Valid values: off, normal, full, extra | ||
# The default is "normal", which works well with | ||
# the "wal" journal mode. Alternatively, "off" | ||
# allows rippled to continue as soon as data is | ||
# passed to the OS, which can significantly | ||
# increase speed, but risks data corruption if | ||
# the host computer crashes before writing that | ||
# data to disk. | ||
# See https://www.sqlite.org/pragma.html#pragma_synchronous | ||
# for more details about the available options. | ||
# | ||
# temp_store Valid values: default, file, memory | ||
# The default is "file", which will use files | ||
# for temporary database tables and indices. | ||
# Alternatively, "memory" may save I/O, but | ||
# rippled does not currently use many, if any, | ||
# of these temporary objects. | ||
# See https://www.sqlite.org/pragma.html#pragma_temp_store | ||
# for more details about the available options. | ||
# | ||
# | ||
# | ||
|
@@ -1212,24 +1321,25 @@ medium | |
|
||
# This is primary persistent datastore for rippled. This includes transaction | ||
# metadata, account states, and ledger headers. Helpful information can be | ||
# found here: https://ripple.com/wiki/NodeBackEnd | ||
# delete old ledgers while maintaining at least 2000. Do not require an | ||
# external administrative command to initiate deletion. | ||
# found at https://xrpl.org/capacity-planning.html#node-db-type | ||
# type=NuDB is recommended for non-validators with fast SSDs. Validators or | ||
I would use a
The linked capacity planning page recommends RocksDB, but I can definitely see your point about using
Anecdotally, my full history node takes about half an hour to start with Bootstrapping can be done safely with
@MarkusTeufelberger: re: slow start times with
Of course. Over a dozen TB of 'em.
I think that
@nbougalis I think making
||
# slow / spinning disks should use RocksDB. | ||
Consider adding a note like this:
|
||
# online_delete=512 is recommended to delete old ledgers while maintaining at | ||
# least 512. | ||
# advisory_delete=0 allows the online delete process to run automatically | ||
# when the node has approximately two times the "online_delete" value of | ||
# ledgers. No external administrative command is required to initiate | ||
# deletion. | ||
[node_db] | ||
type=RocksDB | ||
path=/var/lib/rippled/db/rocksdb | ||
open_files=2000 | ||
filter_bits=12 | ||
cache_mb=256 | ||
file_size_mb=8 | ||
file_size_mult=2 | ||
online_delete=2000 | ||
type=NuDB | ||
path=/var/lib/rippled/db/nudb | ||
online_delete=512 | ||
If such a small value is the default, please consider also adding a shard_db section with a limit of a few dozen/hundred GiB by default.
@MarkusTeufelberger Thanks for that feedback. Those default settings are definitely not yet set in stone. My understanding is that the suggestion for the smaller default was to make it easier for more people to participate without requiring huge resources. (The original ticket didn't have a lot of detail.) Of course, there's a balance to be found, because we already have pretty high hardware requirements, and we do want as many nodes as possible to contribute to history storage as a way to help the network. I'm not sure if shards are ready to be set by default. I'll defer to @miguelportilla on that question.
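For reference, the interaction between online_delete, advisory_delete and age_threshold_seconds described in the config comments can be sketched roughly as below. This is only an illustration of the documented rules, not rippled's actual implementation: the struct and function names are hypothetical, and the "run once validatedSeq reaches lastDeletedSeq + online_delete" condition is an assumption derived from the comment that deletion runs when the node holds approximately two times the online_delete value of ledgers.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical names throughout; a sketch of the documented rules only.
struct OnlineDeleteSettings
{
    std::uint32_t onlineDelete = 512;       // online_delete (0 = disabled)
    bool advisoryDelete = false;            // advisory_delete
    std::chrono::seconds ageThreshold{60};  // age_threshold_seconds
};

struct LedgerStatus
{
    std::uint32_t validatedSeq;         // latest validated ledger sequence
    std::uint32_t lastDeletedSeq;       // ledger on which the last deletion ran
    std::uint32_t canDeleteSeq;         // value set through the "can_delete" RPC
    std::chrono::seconds validatedAge;  // age of the latest validated ledger
};

bool
shouldRunOnlineDelete(OnlineDeleteSettings const& cfg, LedgerStatus const& st)
{
    // The config comments give a minimum of 256 for online_delete.
    assert(cfg.onlineDelete == 0 || cfg.onlineDelete >= 256);

    if (cfg.onlineDelete == 0)
        return false;  // automatic purging is not enabled

    // Only run while the latest validated ledger is younger than the threshold.
    if (st.validatedAge >= cfg.ageThreshold)
        return false;

    // With advisory_delete set, do not run if the last deletion was on a
    // ledger greater than the current "can_delete" setting.
    if (cfg.advisoryDelete && st.lastDeletedSeq > st.canDeleteSeq)
        return false;

    // Assumption: run once another online_delete ledgers have accumulated
    // since the last deletion. Widen to 64 bits to avoid overflow.
    return std::uint64_t{st.validatedSeq} >=
        std::uint64_t{st.lastDeletedSeq} + cfg.onlineDelete;
}
```

With the proposed defaults (online_delete=512, advisory_delete=0), a node whose last deletion ran on ledger 512 would, under these assumptions, become eligible again at ledger 1024.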
||
advisory_delete=0 | ||
|
||
# This is the persistent datastore for shards. It is important for the health | ||
# of the ripple network that rippled operators shard as much as practical. | ||
# NuDB requires SSD storage. Helpful information can be found here | ||
# https://ripple.com/build/history-sharding | ||
# NuDB requires SSD storage. Helpful information can be found at | ||
# https://xrpl.org/history-sharding.html | ||
#[shard_db] | ||
#path=/var/lib/rippled/db/shards/nudb | ||
#max_size_gb=500 | ||
|
@@ -1248,7 +1358,8 @@ time.apple.com | |
time.nist.gov | ||
pool.ntp.org | ||
|
||
# To use the XRP test network (see https://ripple.com/build/xrp-test-net/), | ||
# To use the XRP test network | ||
# (see https://xrpl.org/connect-your-rippled-to-the-xrp-test-net.html), | ||
# use the following [ips] section: | ||
# [ips] | ||
# r.altnet.rippletest.net 51235 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -228,14 +228,14 @@ Ledger::Ledger( | |
!txMap_->fetchRoot(SHAMapHash{info_.txHash}, nullptr)) | ||
{ | ||
loaded = false; | ||
JLOG(j.warn()) << "Don't have TX root for ledger"; | ||
JLOG(j.warn()) << "Don't have TX root for ledger" << info_.seq; | ||
Nit: as long as we're changing these messages, we could make them clearer by removing the abbreviations "TX" (here) and "AS" (below, line 238). Instead, just say "transaction" and "state data".
||
} | ||
|
||
if (info_.accountHash.isNonZero() && | ||
!stateMap_->fetchRoot(SHAMapHash{info_.accountHash}, nullptr)) | ||
{ | ||
loaded = false; | ||
JLOG(j.warn()) << "Don't have AS root for ledger"; | ||
JLOG(j.warn()) << "Don't have AS root for ledger" << info_.seq; | ||
} | ||
|
||
txMap_->setImmutable(); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -541,6 +541,7 @@ run(int argc, char** argv) | |
return -1; | ||
} | ||
|
||
dbSetup.noPragma(); | ||
auto txnDB = std::make_unique<DatabaseCon>( | ||
dbSetup, TxDBName, TxDBPragma, TxDBInit); | ||
auto& session = txnDB->getSession(); | ||
|
@@ -555,7 +556,9 @@ run(int argc, char** argv) | |
session << "PRAGMA temp_store_directory=\"" << tmpPath.string() | ||
The purpose of the temp_store_directory is to make sure there's enough space to perform a VACUUM, which essentially requires as much as the entire database being VACUUMed. The temp_store in memory will likely cause a system to run out of RAM. Imagine transaction DB with 2TB+. (https://sqlite.org/tempfiles.html)
Good catch! I'll change that.
||
<< "\";"; | ||
session << "VACUUM;"; | ||
session << "PRAGMA journal_mode=WAL;"; | ||
The existing behavior is that the journal_mode=OFF during the VACUUM activity introduces risk of corruption. Instead, I think the behavior of VACUUM should reflect the new config options and defaults for the txdb. Namely, use dbSetup.usePragma() (not noPragma()) and set the configs, then add the "PRAGMA temp_store_directory=" line and then execute VACUUM. Basically, treat the vacuum with the same "safety_mode" as normal operation.
Turns out that It looks like you're right about the
So I'll update that.
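To make the suggested ordering concrete, here is a minimal sketch of mapping the documented safety_level values to PRAGMA statements and applying them before VACUUM. Everything named here (pragmasForSafetyLevel, RecordingSession, vacuumWithConfiguredPragmas) is hypothetical and stands in for the real DatabaseCon and soci session code; the "high" values are taken from the documented per-setting defaults (wal, normal, file) and the "low" values from the safety_level description. It only accepts lowercase values and does not address the temp_store=memory concern raised above.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: translate a [sqlite] safety_level value into the
// PRAGMA statements the config comments describe.
std::vector<std::string>
pragmasForSafetyLevel(std::string const& level)
{
    if (level == "high")
        return {
            "PRAGMA journal_mode=wal;",
            "PRAGMA synchronous=normal;",
            "PRAGMA temp_store=file;"};
    if (level == "low")
        return {
            "PRAGMA journal_mode=memory;",
            "PRAGMA synchronous=off;",
            "PRAGMA temp_store=memory;"};
    throw std::invalid_argument("safety_level must be \"high\" or \"low\"");
}

// Hypothetical stand-in for a database session: it only records the
// statements it is asked to execute.
struct RecordingSession
{
    std::vector<std::string> executed;

    void
    execute(std::string const& sql)
    {
        executed.push_back(sql);
    }
};

// Order suggested in review: configured pragmas first, then the temporary
// directory, then VACUUM.
void
vacuumWithConfiguredPragmas(
    RecordingSession& session,
    std::vector<std::string> const* pragmas,
    std::string const& tmpPath)
{
    assert(pragmas != nullptr);
    for (auto const& p : *pragmas)
        session.execute(p);
    session.execute("PRAGMA temp_store_directory=\"" + tmpPath + "\";");
    session.execute("VACUUM;");
}
```

Recording the statements instead of executing them keeps the sketch free of any SQLite dependency; a real change would issue them through the existing session object.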
||
assert(dbSetup.globalPragma); | ||
for (auto const& p : *dbSetup.globalPragma) | ||
session << p; | ||
session << "PRAGMA page_size;", soci::into(pageSize); | ||
|
||
std::cout << "VACUUM finished. page_size: " << pageSize | ||
|
Why is it "4 or 5"? Is one of them only added with certain configurations? Without more context, the phrase "4 or 5" feels uncertain and vague. Maybe "4 to 5" would carry the right implication.
I think state.db is created if you're using online_delete, so that would be the 5th one. I'll make the change.
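Since online_delete also purges the SQLite bookkeeping databases, the delete_batch and back_off_milliseconds settings documented in this PR can be pictured with the sketch below. It is an assumption-laden illustration, not the server's code: LedgerTable is a hypothetical in-memory stand-in for a SQLite table keyed by ledger sequence, and purgeBelow is a made-up name.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a SQLite table; seqs is kept sorted ascending.
struct LedgerTable
{
    std::vector<std::uint32_t> seqs;
};

// Delete every row with seq < cutoff, at most deleteBatch rows at a time,
// pausing for backOff between batches so other work can catch up.
// Returns the number of batches that were needed.
std::size_t
purgeBelow(
    LedgerTable& table,
    std::uint32_t cutoff,
    std::size_t deleteBatch,
    std::chrono::milliseconds backOff)
{
    assert(deleteBatch > 0);
    std::size_t batches = 0;
    for (;;)
    {
        // Because seqs is sorted, the rows to delete form a prefix.
        auto const end =
            std::lower_bound(table.seqs.begin(), table.seqs.end(), cutoff);
        auto const eligible =
            static_cast<std::size_t>(end - table.seqs.begin());
        if (eligible == 0)
            break;

        auto const n = std::min(eligible, deleteBatch);
        table.seqs.erase(
            table.seqs.begin(),
            table.seqs.begin() + static_cast<std::ptrdiff_t>(n));
        ++batches;

        // Back off only when another batch is still pending.
        if (n < eligible)
            std::this_thread::sleep_for(backOff);
    }
    return batches;
}
```

Smaller batches hold the (simulated) lock for less time per step, at the cost of more batches and more total back-off time, which is the trade-off the delete_batch comment describes.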