Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve online_delete configuration and DB tuning: #3429

Closed
wants to merge 13 commits into from

Conversation

ximinez
Copy link
Collaborator

@ximinez ximinez commented Jun 2, 2020

  • Document delete_batch, back_off_milliseconds, age_threshold_seconds.
  • Convert those time values to chrono types.
  • Fix bug that ignored age_threshold_seconds.
  • Add a "recovery buffer" to the config that gives the node a chance to
    recover before aborting online delete.
  • Add begin/end log messages around the SQL queries.
  • Add a new configuration section: [sqlite] to allow tuning the sqlite
    database operations. Ignored on full/large history servers.
  • Update documentation of [node_db] and [sqlite] in the
    rippled-example.cfg file.
  • Resolves ripple#3321

I added the "recovery buffer" function that wasn't described in the original issue to scratch my own itch, because my local machine was losing sync for a fraction of a second while the ledger was being copied over. This resolved that issue. I defaulted it to not run, so it doesn't cause unexpected behavior.

Note this includes two commits, one from @mtrippled . Both need to be reviewed.


This change is Reviewable

mtrippled and others added 2 commits June 2, 2020 18:33
evaluations for validated ledger age.
* Document delete_batch, back_off_milliseconds, age_threshold_seconds.
* Convert those time values to chrono types.
* Fix bug that ignored age_threshold_seconds.
* Add a "recovery buffer" to the config that gives the node a chance to
  recover before aborting online delete.
* Add begin/end log messages around the SQL queries.
* Add a new configuration section: [sqlite] to allow tuning the sqlite
  database operations. Ignored on full/large history servers.
* Update documentation of [node_db] and [sqlite] in the
  rippled-example.cfg file.
* Resolves XRPLF#3321
@ximinez ximinez added the Documentation README changes, code comments, etc. label Jun 2, 2020
online_delete=2000
type=NuDB
path=/var/lib/rippled/db/nudb
online_delete=512
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If such a small value is the default, please consider also adding a shard_db section with a limit of a few dozen/hundred GiB by default.

Copy link
Collaborator Author

@ximinez ximinez Jun 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarkusTeufelberger Thanks for that feedback. Those default settings are definitely not yet set in stone. My understanding is that the suggestion for the smaller default was to make it easier for more people to participate without requiring huge resources. (The original ticket didn't have a lot of detail.) Of course, there's a balance to be found, because we already have pretty high hardware requirements, and we do want as many nodes as possible to contribute to history storage as a way to help the network.

I'm not sure if shards are ready to be set by default. I'll defer to @miguelportilla on that question.

# delete old ledgers while maintaining at least 2000. Do not require an
# external administrative command to initiate deletion.
# found at https://xrpl.org/capacity-planning.html#node-db-type
# type=NuDB is recommended for non-validators with fast SSDs. Validators or
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use a none database backend on a validator. They don't need to concern themselves with historic data at all.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linked capacity planning page recommends RocksDB, but I can definitely see your point about using none (or memory). My biggest concern with using those options on the individual node level, is that it means it will take much longer to start and restart, because the node will have to download the entire ledger every time. On the global level, I'm thinking about a failsafe / worst case situation. For example, if every validator shuts down simultaneously 😱, they will need to have some data on disk so they can bootstrap restarting the network.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anecdotally, my full history node takes about half an hour to start with --load vs. a few minutes at most to get synced with --net.

Bootstrapping can be done safely with --ledgerfile (https://xrpl.org/commandline-usage.html#initial-ledger-options) and that's probably better anyways a better way forward than trusting that data was stored to disk correctly in such a catastrophic scenario. The file itself could be hashed and communicated out-of-band in a lot of different ways compared to a node database that's not even a full shard in most cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarkusTeufelberger: re: slow start times with --load: are you using SSDs?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course. Over a dozen TB of 'em.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that --net should become the default now, unless a different command line option (for which I don't have a good name) is specified.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nbougalis I think making --net the default is out of scope for this PR. Do you agree?

cfg/rippled-example.cfg Outdated Show resolved Hide resolved
src/ripple/core/impl/Config.cpp Outdated Show resolved Hide resolved
cfg/rippled-example.cfg Outdated Show resolved Hide resolved
ximinez added 2 commits June 4, 2020 13:53
(Include these items in the main commit when squashing.)
* Update rippled URLs in documentation.
* Full history and fetch depth use max() value instead of an arbitrary
  value of 1 billion.
Copy link
Contributor

@nbougalis nbougalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. Still thinking about this.

};

template <std::size_t N, std::size_t M>
DatabaseCon(
Setup const& setup,
std::string const& DBName,
bool useCommonPragma,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if I like this; I'm generally allergic to functions that accept a bool and use it to do one or the other thing. Why can't we set the right values in setup directly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be pretty simple to set the values in the Setup object, now that you mention it. I might need to define a member function or two, just to make the internals easier, and to make it easier switch between the options for Setup objects that are used to create multiple DatabaseCons

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS. I went down this path because I initially thought the other ctor override would need to use the "common pragma" but without a Setup object being passed in. I later realized none of them do, but didn't backtrack to see if the bool was still a good idea. So thanks for pointing this out!

static std::unique_ptr<std::vector<std::string> const> CommonPragma;
/// Shortcut used by the database connections that ignore the common
/// pragma strings
static const std::vector<std::string> NoCommonPragma;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

μNit: weird case for the name(s) here. Also confused as to why two separate things are needed.

Copy link
Collaborator Author

@ximinez ximinez Jun 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of other static class variables have capitalized names, but I don't necessarily have to keep doing it.

Edit: I looked again, and didn't find any non-const examples. So 🤷‍♂️ I'll make it more sensible.

// full history nodes. This is because such a large amount of data will
// be more difficult to recover if a rare failure occurs, which are
// more likely with some of the other available tuning settings.
inline constexpr std::uint32_t SQLITE_TUNING_CUTOFF = 100'000'000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the rationale behind this, but I'm not sure if it's a good idea. I think that it's reasonable to warn in the configuration file and even on every startup about these options, but making the options entirely unavailable is not. Ironically, the existing logic that enforces this will prevent choosing options that are safer than the default even if the server operator really wants them.

I'll give this some more thought, but right now, it seems like a bad idea.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it to warning log message(s) pretty easily, and update the config file.

ximinez added 2 commits June 5, 2020 19:02
* changed the sqlite ledger_history cutoff to log a warning instead of
  ignoring sqlite settings. updated documentation.
* integrated the commonpragma back in to the setup struct and restored
  the interface of the databasecon ctors.
* involved moving some test logging subclasses around to be more easily
  accessible, and some minor refactoring of Env.
@ximinez ximinez requested a review from nbougalis June 8, 2020 20:45
src/ripple/core/impl/DatabaseCon.cpp Outdated Show resolved Hide resolved
@@ -555,7 +556,9 @@ run(int argc, char** argv)
session << "PRAGMA temp_store_directory=\"" << tmpPath.string()
<< "\";";
session << "VACUUM;";
session << "PRAGMA journal_mode=WAL;";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing behavior is that the journal_mode=OFF during the VACUUM activity introduces risk of corruption. Instead, I think the behavior of VACUUM should reflect the new config options and defaults for the txdb. Namely, use dbSetup.usePragma() (not noPragma()) and set the configs, then add the "PRAGMA temp_store_directory=" line and then execute VACUUM. Basically, treat the vacuum with the same "safety_mode" as normal operation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out that temp_store_directory is deprecated: https://www.sqlite.org/pragma.html#pragma_temp_store_directory.

It looks like you're right about the journal_mode: https://www.sqlite.org/lang_vacuum.html

The VACUUM command works by copying the contents of the database into a temporary database file and then overwriting the original with the contents of the temporary file. When overwriting the original, a rollback journal or write-ahead log WAL file is used just as it would be for any other database transaction.

So I'll update that.

@@ -348,23 +359,14 @@ SHAMapStoreImp::run()

// will delete up to (not including) lastRotated
if (validatedSeq >= lastRotated + deleteInterval_ &&
canDelete_ >= lastRotated - 1)
canDelete_ >= lastRotated - 1 && !health())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The health() check is also performed a few lines later, during clearPrior(). Why introduce it here as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Primarily, it's to prevent writing the log message if no actual work is going to get done. Compare to what it was doing before: log the "rotating" message, then do a health() check, then clearPrior().

If any change should be made, I'd think it would be to take the first health() check out of clearPrior(). What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, removing the first health() from clearPrior() makes sense.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -378,14 +380,13 @@ SHAMapStoreImp::run()
default:;
}

JLOG(journal_.trace()) << "copying ledger " << validatedSeq;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May as well make this debug(), so we can see how long it takes to copy ledgers without setting the level to trace(). This is a very low-frequency message.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense.


JLOG(journal_.trace()) << "freshening caches";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may as well be debug() for the same reason as above.

assert(false);
return false;
}
JLOG(journal_.trace()) << "End: Look up lowest value of: " << minQuery;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have this check? Does it catch a non-existent table or some other error that makes checking if m is unset insufficient?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't remember for sure why I added the if (auto db = database.checkoutDb()) check... I think it was so that it would go out of scope and release the lock sooner, and then I added the else because it "wouldn't hurt". It would make more sense to rewrite it as a simple local block.

else
{
assert(false);
return false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No caller of clearSql() checks the return value. This function can return void instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Will fix.

src/ripple/app/misc/SHAMapStoreImp.cpp Show resolved Hide resolved
* Make `safety_level` and other tuning options mutually exclusive
* Clean up VACUUM functionality and tuning to use settings.
* Fix bugs in the config setup where it still ignored tuning settings if
  full history. Add tests to catch that mistake.
* Fix some bugs in the unit tests, and make them more robustly check
  error messages.
@carlhua carlhua assigned gregtatcam and unassigned thejohnfreeman Jun 15, 2020
ximinez added 2 commits June 16, 2020 14:50
* Change some log message levels.
* clearSql no longer returns a bool that every caller ignores.
* Restructure a few unnecessary validity checks
@carlhua carlhua assigned seelabs and unassigned gregtatcam Jun 17, 2020
Copy link
Collaborator

@seelabs seelabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor comments, but overall this looks good.

src/ripple/app/misc/NetworkOPs.cpp Outdated Show resolved Hide resolved
cfg/rippled-example.cfg Outdated Show resolved Hide resolved
cfg/rippled-example.cfg Show resolved Hide resolved
src/ripple/core/DatabaseCon.h Outdated Show resolved Hide resolved
src/ripple/core/DatabaseCon.h Outdated Show resolved Hide resolved
src/ripple/app/misc/SHAMapStoreImp.cpp Outdated Show resolved Hide resolved
src/ripple/app/misc/SHAMapStoreImp.cpp Outdated Show resolved Hide resolved
src/ripple/core/impl/DatabaseCon.cpp Outdated Show resolved Hide resolved
src/ripple/nodestore/impl/Shard.cpp Outdated Show resolved Hide resolved
src/test/jtx/CheckMessageLogs.h Outdated Show resolved Hide resolved
@@ -552,9 +549,6 @@ run(int argc, char** argv)
std::cout << "VACUUM beginning. page_size: " << pageSize
<< std::endl;

session << "PRAGMA journal_mode=OFF;";
session << "PRAGMA temp_store_directory=\"" << tmpPath.string()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of the temp_store_directory is to make sure there's enough space to perform a VACUUM, which essentially requires as much as the entire database being VACUUMed. The temp_store in memory will likely cause a system to run out of RAM. Imagine transaction DB with 2TB+. (https://sqlite.org/tempfiles.html)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I'll change that.

Copy link
Collaborator

@mDuo13 mDuo13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked over the documentation changes. Definitely a big improvement over the previous version! I left a few optional suggestions to tune up the language in a few places, but 👍 with or without those changes.

@@ -892,6 +939,12 @@
# [import_db] Settings for performing a one-time import (optional)
# [database_path] Path to the book-keeping databases.
#
# There are 4 or 5 bookkeeping SQLite databases that the server creates and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it "4 or 5"? Is one of them only added with certain configurations? Without more context, the phrase "4 or 5" feels uncertain and vague. Maybe "4 to 5" would carry the right implication.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think state.db is created if you're using online_delete, so that would be the 5th one. I'll make the change.

# maintains. If you omit this configuration setting, it will default to
# creating a directory called "db" located in the same place as your
# rippled.cfg file. Partial pathnames will be considered relative to
# the location of the rippled executable.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: instead of "it will default to creating a directory", just say, "the server creates a directory." Similarly, instead of "will be considered relative" just say "are relative".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed and reworded to be less passive tone.

# external administrative command to initiate deletion.
# found at https://xrpl.org/capacity-planning.html#node-db-type
# type=NuDB is recommended for non-validators with fast SSDs. Validators or
# slow / spinning disks should use RocksDB.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a note like this:

Caution: Spinning disks are not recommended. They do not perform well enough to consistently remain synced to the network.

@@ -228,14 +228,14 @@ Ledger::Ledger(
!txMap_->fetchRoot(SHAMapHash{info_.txHash}, nullptr))
{
loaded = false;
JLOG(j.warn()) << "Don't have TX root for ledger";
JLOG(j.warn()) << "Don't have TX root for ledger" << info_.seq;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: as long as we're changing these messages, we could make them clearer by removing the abbreviations "TX" (here) and "AS" (below, line 238). Instead, just say "transaction" and "state data".

* Rename retry_buffer_seconds to retry_wait_seconds.
* Improve documentation and examples in example.cfg file.
* Simplify the DatabaseCon::Setup pragma handling.
* Other misc cleanups
Copy link
Collaborator

@seelabs seelabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I did leave one nit that I think we should resolve (the Base parameter on the DatabaseCon constructor). However since that's so minor I'll give a thumbs up now.

src/ripple/core/DatabaseCon.h Outdated Show resolved Hide resolved
src/ripple/core/DatabaseCon.h Outdated Show resolved Hide resolved
@carlhua carlhua requested a review from mtrippled June 24, 2020 12:24
// Only the most trivial databases will fit in memory on typical
// (recommended) software. Force temp files to be written to disk
// regardless of the config settings.
session << boost::format(CommonDBPragmaTemp) % "file";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original behavior needs to come back--namely, that the actual temp directory be specified. Otherwise, SQLIte will by default use the directory in which the database is.

Alternately, I think that getting rid of this feature altogether makes a lot of sense. If somebody really wants to VACUUM a database, it's easier to use the command line tool.

Copy link
Collaborator

@mtrippled mtrippled left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
👍

* Improved some of the documentation and log messages
* Removed the unnecessary `DatabaseCon::Base` ctor helper class
@carlhua carlhua added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Jun 24, 2020
@mDuo13
Copy link
Collaborator

mDuo13 commented Jun 24, 2020

New changes 👍 from me.

@ximinez ximinez deleted the onlinedeleteconfig branch June 30, 2020 22:26
intelliot referenced this pull request Sep 15, 2024
* Document delete_batch, back_off_milliseconds, age_threshold_seconds.
* Convert those time values to chrono types.
* Fix bug that ignored age_threshold_seconds.
* Add a "recovery buffer" to the config that gives the node a chance to
  recover before aborting online delete.
* Add begin/end log messages around the SQL queries.
* Add a new configuration section: [sqlite] to allow tuning the sqlite
  database operations. Ignored on full/large history servers.
* Update documentation of [node_db] and [sqlite] in the
  rippled-example.cfg file.

Resolves #3321
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation README changes, code comments, etc. Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issues with online_delete configuration (Version: 1.5.0-rc3)
9 participants