Issues with online_delete configuration (Version: 1.5.0-rc3) #3321
Comments
Also consider an optional parameter to set the SQLite database to not be synchronous. This should allow delete operations to return quickly, releasing the lock. The actual data would be deleted by the operating system asynchronously. This creates a risk of data problems if the server suddenly crashes, but it would likely reduce their symptoms. It should not be done on full history servers or on systems where we care about keeping that data intact. I think these 2 fixes would go a long way towards making things better for bitso the next time they run online delete. Basically, PRAGMA SYNCHRONOUS=OFF
Also, journal_mode=MEMORY (equally risky for data corruption) looks like it would reduce IO usage, and complement turning synchronous off: https://www.sqlite.org/pragma.html#pragma_journal_mode
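To make the two pragmas above concrete, here is a minimal Python sketch using the standard-library sqlite3 module. It is not rippled code: the helper names and the temporary database file are made up for illustration, and it only shows how the settings are applied and read back. The durability risk described above applies to any database opened this way.

```python
import os
import sqlite3
import tempfile


def open_relaxed(path):
    """Open a SQLite database with the two durability trade-offs discussed above."""
    conn = sqlite3.connect(path)
    # Do not wait for the OS to flush writes to disk before returning.
    conn.execute("PRAGMA synchronous=OFF")
    # Keep the rollback journal in RAM instead of in a file on disk.
    conn.execute("PRAGMA journal_mode=MEMORY").fetchone()
    return conn


def current_settings(conn):
    """Read both settings back so the change can be verified."""
    synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    return synchronous, journal_mode


# A throwaway on-disk database (hypothetical path, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    conn = open_relaxed(os.path.join(tmp, "example.db"))
    settings = current_settings(conn)
    conn.close()
```

SQLite reports synchronous as an integer (0 means OFF) and journal_mode as a lowercase string, so reading them back is a quick way to confirm the pragmas took effect.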
The current plan is to add config support to tweak some of the SQLite
@ximinez One issue is that server_info doesn't currently reflect the validated server age. Instead, it reflects the last closed ledger. This is in contrast to internal rippled evaluations, such as the one used to abend online_delete, which use validated age. Because of this, it's not possible to tell from the diagnostic output whether these evaluations are correct. So I created a patch to correct this going forward. Do you mind incorporating it into your set of fixes for this, please?
Some more changes that are tangentially related, but close enough to be added to this issue. (Text imported from internal issue RIPD-1590.) --
Specifically, the https://ripple.com/wiki/NodeBackEnd link is out of date. We should consider changing the defaults to use nudb. It is currently RocksDB. We should consider using the smallest allowed online_delete as the default (256, currently 2000). The [node_db] settings are among the most frequently changed during troubleshooting, so our documentation and defaults should be made as clear as possible. We should explain that the size of ledger history must be adjusted for both the size of disk drive and system RAM (when using RocksDB). And make it clear that online_delete does not run automatically when advisory_delete is not 0.
* Document delete_batch, back_off_milliseconds, age_threshold_seconds.
* Convert those time values to chrono types.
* Fix bug that ignored age_threshold_seconds.
* Add a "recovery buffer" to the config that gives the node a chance to recover before aborting online delete.
* Add begin/end log messages around the SQL queries.
* Add a new configuration section: [sqlite] to allow tuning the sqlite database operations. Ignored on full/large history servers.
* Update documentation of [node_db] and [sqlite] in the rippled-example.cfg file.
* Resolves XRPLF#3321
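The delete_batch and back-off settings named in this change list can be pictured with a small Python sketch: delete a limited number of rows per query, log before and after each query, and sleep between batches. This is an illustration under assumptions, not rippled's implementation. The table and column names, the function name, and the logger name are all hypothetical, and the back-off is passed as a timedelta to mirror the move to typed durations.

```python
import logging
import sqlite3
import time
from datetime import timedelta

log = logging.getLogger("online_delete_sketch")  # hypothetical logger name


def delete_in_batches(conn, last_rotated, delete_batch, back_off):
    """Delete rows below last_rotated, at most delete_batch rows per query.

    Table and column names are hypothetical, not rippled's schema.
    """
    total = 0
    while True:
        log.debug("begin delete batch (limit %d)", delete_batch)
        cur = conn.execute(
            "DELETE FROM Ledgers WHERE LedgerSeq IN ("
            " SELECT LedgerSeq FROM Ledgers WHERE LedgerSeq < ? LIMIT ?)",
            (last_rotated, delete_batch),
        )
        conn.commit()  # Commit each batch so the write lock is released.
        log.debug("end delete batch (%d rows)", cur.rowcount)
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
        # Sleep between deletes so other work can use the database.
        time.sleep(back_off.total_seconds())


# Usage: ten rows, delete everything below sequence 8, three rows at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ledgers (LedgerSeq INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Ledgers VALUES (?)", [(i,) for i in range(1, 11)])
conn.commit()

deleted = delete_in_batches(
    conn, last_rotated=8, delete_batch=3, back_off=timedelta(milliseconds=1)
)
remaining = [
    row[0] for row in conn.execute("SELECT LedgerSeq FROM Ledgers ORDER BY LedgerSeq")
]
```

Smaller batches and longer back-off hold the database lock for shorter stretches at the cost of a slower overall delete, which is the trade-off these options expose.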
Several issues with the configuration options for online_delete:
* The [node_db] config section has options which can be used to tune online_delete performance. These need to be documented at least in rippled-example.cfg.
  * delete_batch - number of records to delete per query
  * backOff - milliseconds to sleep between deletes
  * age_threshold - maximum age of the latest validated ledger before the online_delete process abends.
* The age_threshold option is ignored. Instead, SHAMapStore::health() uses a constexpr value.
* Rename the option to age_threshold_seconds, and change the SHAMapStore::ageThreshold_ variable to a chrono::seconds.
* backOff is a ms value, but is functional, so add a preferred back_off_milliseconds config option, and only document that one. Also change SHAMapStore::backOff_ to be a chrono::milliseconds. Leave backOff available for backward compatibility for anyone using the undocumented feature.
* Log the DELETE SQL query at trace level to allow for more detailed analysis if desired.
Steps to Reproduce
1. Set age_threshold to a value different than the default 60. Something really low, like "1" would be good for this test.
2. Wait until online_delete runs.
Expected Result
The online_delete process abends after age_threshold seconds.
Actual Result
Environment
n/a
Supporting Files
n/a
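The option handling proposed in the issue description (a preferred back_off_milliseconds with backOff kept as a fallback, and an age check that honors the configured threshold instead of a fixed constant) might look roughly like the following. This is a Python sketch rather than rippled's C++ code. The function names and the dict-based config are hypothetical, and the 100 defaults for delete_batch and the back-off are placeholders; only the age default of 60 comes from the issue text.

```python
from datetime import timedelta


def parse_tuning(node_db):
    """Read online_delete tuning options from a dict of [node_db] settings."""
    # Prefer back_off_milliseconds; fall back to the legacy backOff key.
    # The default of 100 is a placeholder, not taken from the issue.
    back_off_ms = node_db.get("back_off_milliseconds", node_db.get("backOff", 100))
    return {
        "delete_batch": int(node_db.get("delete_batch", 100)),  # placeholder default
        "back_off": timedelta(milliseconds=int(back_off_ms)),
        # The issue states a default of 60 for the age threshold.
        "age_threshold": timedelta(
            seconds=int(node_db.get("age_threshold_seconds", 60))
        ),
    }


def should_abort(validated_age, age_threshold):
    """True when the latest validated ledger is older than the configured threshold."""
    return validated_age > age_threshold


# Usage: the legacy key still works, and the configured threshold is honored.
tuning = parse_tuning({"backOff": "250", "age_threshold_seconds": "1"})
abort_configured = should_abort(timedelta(seconds=5), tuning["age_threshold"])
abort_default = should_abort(timedelta(seconds=5), parse_tuning({})["age_threshold"])
preferred = parse_tuning({"backOff": "250", "back_off_milliseconds": "50"})
```

With the threshold set to 1 second, a validated ledger that is 5 seconds old triggers an abort, while the default of 60 does not. That is the behavior the reproduction steps expect to observe.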