Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLite3 transaction.db out of Space on Full History Servers (Version: all) #5102

Closed
jscottbranson opened this issue Aug 23, 2024 · 38 comments · Fixed by #5114
Closed

SQLite3 transaction.db out of Space on Full History Servers (Version: all) #5102

jscottbranson opened this issue Aug 23, 2024 · 38 comments · Fixed by #5114

Comments

@jscottbranson
Copy link
Contributor

Issue Description

rippled relies on SQLite3 for the transaction.db.

SQLite3 databases have limitations (max_page_count and page_size) to prevent databases from growing excessively.

page_size defaults to 4096 bytes across all recent versions of SQLite3, and rippled passes a "PRAGMA" statement to SQLite3 when the connection is opened reinforcing this page size. The page_size is stored in the database itself, and changing the page_size requires the database to be recreated. Recreating the database takes significant time as well as storage space, and I have not successfully created a transaction.db with larger page sizes, despite multiple attempts. There are also potential performance implications associated with changing page_size, and SQLite documentation suggests most applications will benefit from using the default 4096 byte setting.

In contrast, max_page_count seems to vary significantly across SQLite3 versions, and it is not stored in the database itself. Rather, max_page_count is set via a PRAGMA statement when the SQLite3 connection is opened, or the default value for a given SQLite3 version is used. rippled passes a PRAGMA statement to SQLite3 upon opening the database, defining max_page_count = 2147483646 pages. With page_size of 4096 bytes * max_page_count of 2147483646, the maximum size for the transaction.db file is 8.796093 TB. Currently, my full history servers' databases are 8.3 TB. Thus, it is possible that all full history servers are approaching the point where the transaction.db file will be full. It seems urgent to remedy this, or risk all full history servers crashing over the next few months.

The most obvious solution seems to be increasing the value passed by rippled for max_page_count to the current (2024) default of 4294967294 pages (the SQLite change log indicates a prior default of 1073741824 pages). However, defining max_page_count via a PRAGMA statement is not always successful.

For example, I have two full history servers running on Rocky Linux 8.10, which ships with SQLite version 3.26.0 2018-12-01. rippled has crashed on both of these servers with an error message that the transaction.db is full. Querying the page_count for the transaction.db files on both servers indicates that 2147354788 pages have been used. This is less than the value passed by rippled to SQLite3 (2147483646 - 2147354788 = 128858 pages that should still be available). However, 2147354788 seems to be the default value for the version of SQLite3 that is running on the servers (this is different from the default value described in the SQLite3 change log). This is verified by sqlite3 transaction.db 'PRAGMA max_page_count;' returning 2147354788. Thus, it might be necessary for full history operators to also update the version of SQLite3 they are running. Full history servers running on Ubuntu 22.04, for example, seem to respect the rippled PRAGMA statement and can successfully work with transaction.db files that have more pages than the SQLite3 default.

Steps to Reproduce

sqlite3 transaction.db 'PRAGMA page_count;' returns the current number of pages used in the transaction.db. rippled will likely crash once this integer exceeds value provided by rippled of 2147483646.

Expected Result

Full history servers should happily record and serve full transaction history.

Actual Result

rippled crashes with SQLiteDatabaseImp:FTL Free SQLite space for transaction db is less than 512MB. To fix this, rippled must be executed with the vacuum parameter before restarting. Note that this activity can take multiple days, depending on database size.

I have not successfully vacuumed the database since doing so takes an enormous amount of disk space. Also, I don't believe vacuuming will be useful in this instance, since it only frees space if data has been deleted from the database, and it is unlikely full history servers would remove transaction data. The servers that produce the above error message have >5TB free disk space, however, the transaction.db files are out of free pages.

Environment

This impacts all operating environments, as the limit defined in rippled itself is being approached. Users running older operating systems and versions of SQLite could be impacted more quickly, such as my Rocky Linux 8.10 systems.

@WietseWind
Copy link
Member

We're VERY close indeed, to all FHs crashing. Most FHs will be 26 million pages from disaster.

Default value sqlite3 value is 1073741823 btw. Much higher.

This needs a fix and to be installed on FH ASAP.

Related:
The question is why you have more pages than the rest of us ;)

You wrote:

Currently, my full history servers' databases are 8.3 TB

Interestingly all our FHs are:

rippled rippled 7.9T Aug 23 22:58 transaction.db

... So that would explain why you reached the page limit and I didn't: we don't have the same transactions!?

@bachase
Copy link
Collaborator

bachase commented Aug 29, 2024

Thanks for the level of detail in this report. @vvysokikh1 will be looking into changes to the page size/count settings and sqlite version changes.

Given you've tried changing the page size, can you share more details on what you've tried and how it has failed?

Depending on your use case, https://github.com/XRPLF/clio might be an alternative that is more scalable. Please consider leaving any feedback in an issue there if relevant.

@vvysokikh1
Copy link
Collaborator

Hi @jscottbranson

I'm currently looking at this request and changing the line in the code to be consistent with current default max_page_count doesn't seem to be an issue.

My main concern here would be about your situation with Rocky Linux. We use Conan to fetch SQLite and it should NOT be dependent on what your distro has in its repositories. Your assumption about Rocky Linux SQLite max_page_count = 2147354788 doesn't make much sense to me (I expect it to be either 2^30 - 2 or 2^32 - 2 or other power of 2 variation). My guess here would be that for some reason the connection is initialized with lower max_page_count than your already existing db file. What version of rippled are you using on those systems?

@XRPLF XRPLF deleted a comment from alloynetworks Aug 29, 2024
@alloynetworks
Copy link
Contributor

Depending on your use case, https://github.com/XRPLF/clio might be an alternative that is more scalable. Please consider leaving any feedback in an issue there if relevant.

And where would the Clio sync from if FH servers are not unavailable because of the SQLite issue?

@bachase
Copy link
Collaborator

bachase commented Aug 29, 2024

Depending on your use case, https://github.com/XRPLF/clio might be an alternative that is more scalable. Please consider leaving any feedback in an issue there if relevant.

And where would the Clio sync from if FH servers are not unavailable because of the SQLite issue?

I agree that this issue needs addressing for FH servers.

The suggestion for Clio was if there are other infrastructure reasons or reporting use cases that are better served by Clio, folks can consider it.

@jscottbranson
Copy link
Contributor Author

@vvysokikh1, I'm running the latest stable rippled (2.2.1).

I'm pretty confused how I ended up stuck with that specific page size.

@jscottbranson
Copy link
Contributor Author

@bachase I used "vacuum" and "vacuum into" pragma statements in sqlite3 running on a Rocky Linux 9.4 box (a different machine from my FH servers). Both statements failed with an out of disk space error, despite approximately 24 TB of free space.

@vvysokikh1
Copy link
Collaborator

vvysokikh1 commented Aug 30, 2024

Hi @jscottbranson

I've given it another thought in the morning. I have a couple of assumptions and questions:

Rippled connects to your db file and actually fails as expected. Your file has 2147354788 pages, max pages in current setup 2^31 - 2 = 2147483646. This leaves 128858 free pages of size 4KB, and that is less than 512MB. When you connect directly to db file to check max_page_count, you probably don't set this value for this new connection and it defaults to 2^30 - 2 and since it's less than current number of pages it automatically increases it accordingly. In this particular situation I believe that increasing the max_page_count in rippled codebase would fix this.

My question here is actually the same that @WietseWind brought. Why is your transactions.db larger than his? Also, assuming your ubuntu server works, it would have to also fail with the same issue, given exact same history file. Could you actually compare if your transaction.db files are of the same size on both your ubuntu and rocky linux servers?

@jscottbranson
Copy link
Contributor Author

jscottbranson commented Aug 31, 2024

@vvysokikh1, that makes good sense - I don't know how I overlooked that working out to <512MB.

I'm only running RHEL variants, so I don't have an Ubuntu machine to compare.

@interc0der
Copy link
Collaborator

@bachase I used "vacuum" and "vacuum into" pragma statements in sqlite3 running on a Rocky Linux 9.4 box (a different machine from my FH servers). Both statements failed with an out of disk space error, despite approximately 24 TB of free space.

I tried vacuuming the same file on Ubuntu 22.04 (VACUUM INTO) with roughly 40T of available space and a defined SQLITE_TMPDIR environment path.

Error: stepping, database or disk is full (13)

@WietseWind
Copy link
Member

We're seconds away from being very f*cked, one by one the FHs are now failing,

2024-Sep-14 10:33:26.819499226 UTC SQLiteDatabaseImp:FTL Free SQLite space for transaction db is less than 512MB. To fix this, rippled must be executed with the vacuum parameter before restarting. Note that this activity can take multiple days, depending on database size.
2024-Sep-14 10:33:28.057461805 UTC JobQueue:ERR JobQueue waiting for ClosureCounter::join().

@sublimator
Copy link
Contributor

sublimator commented Sep 14, 2024 via email

@alloynetworks
Copy link
Contributor

We're seconds away from being very f*cked, one by one the FHs are now failing,

Two of my full history went down too around the same time.

@sublimator
Copy link
Contributor

sublimator commented Sep 14, 2024 via email

@WietseWind
Copy link
Member

WietseWind commented Sep 14, 2024

sqlite> PRAGMA max_page_count;
2147356198
sqlite> PRAGMA page_count;
2147356198

Well that explains that.

Max can be set to:
4294967294

But rippled sets it on every connection:

"PRAGMA max_page_count=2147483646;",

This needs a binary update rn.

@RichardAH can you do a custom build based on 2.2.2?

@WietseWind
Copy link
Member

I tried vacuuming the same file on Ubuntu 22.04 (VACUUM INTO) with roughly 40T of available space and a defined SQLITE_TMPDIR environment path.

Error: stepping, database or disk is full (13)

Tried vacuuming the database as well but didn't work

image

@sublimator
Copy link
Contributor

Hahaha

@rippleitinnz
Copy link

😂

@sublimator
Copy link
Contributor

Many lols make light work

@danielwwf
Copy link

I see the issue, thats a tiny vacuum. To weak for

I tried vacuuming the same file on Ubuntu 22.04 (VACUUM INTO) with roughly 40T of available space and a defined SQLITE_TMPDIR environment path.
Error: stepping, database or disk is full (13)

Tried vacuuming the database as well but didn't work

image

Obviously, the wrong vacuum for the job.

@jscottbranson
Copy link
Contributor Author

Daniel, you underestimate Dyson!

@bachase
Copy link
Collaborator

bachase commented Sep 14, 2024

😂 for that vacuum attempt @WietseWind !

But for some added context:

  1. @vvysokikh1 was working to have the fix from @jscottbranson as part of the next major release, where we had assumed based on the earlier discussion in this issue that there was more runway until it became a problem.
  2. To be clear, this does not impact ALL full history servers; only those that hadn't followed the steps to changing the page size. Ripple operated FH servers had vaccuumd to the new page size in 2018, when the change was originally recommended (Add tutorial for increasing transaction db page size. xrpl-dev-portal#443). Apparently the Dyson model was working then! Based on sqlite change log shared in the original issue, the option to change the max page count was not an option back then.
  3. We should continue to understand why vacuum is failing for folks (what version of sqlite or rippled are you using?). But totally understand bumping the page count is now the easier option.
  4. I very much appreciate that @WietseWind and @RichardAH are working on a patch fix to immediately address the issue on their servers.
  5. We will consider accelerating a hotfix for this more broadly as it appears many (most?) FH operators were unable to follow the page size change steps from 2018.

@bachase
Copy link
Collaborator

bachase commented Sep 14, 2024

Thanks @jscottbranson for this fix and @intelliot preparing 2.2.3 hotfix elease that will include it.

@alloynetworks
Copy link
Contributor

Whoever is leading the team at Ripple for merging should explain why core fixes don’t go to the top of the heap. If this continues, I don’t think Ripple is an appropriate package distributor. Spending time on the core is more important than pushing out features to feed marketing spins.

@danielwwf
Copy link

Whoever is leading the team at Ripple for merging should explain why core fixes don’t go to the top of the heap. If this continues, I don’t think Ripple is an appropriate package distributor. Spending time on the core is more important than pushing out features to feed marketing spins.

This. This is by far the biggest misconduct by Ripple and, yet again, leads to a ton of questions about how committed Ripple is in regards to the XRPL. Why wasn't this addressed straight up?

@interc0der
Copy link
Collaborator

Yet again, usual suspects @alloynetworks @danielwwf taking this opportunity to muddy the water.

Please, stop the nonsense and let's get back to business. It is the weekend afterall. Go outside and enjoy the sun!

@WietseWind
Copy link
Member

WietseWind commented Sep 14, 2024

@bachase Your point 2 isn't entirely fair: that advisory/doc is about changing the old page size from 1024 to 4096.

Please note this issue is at the new page size of 4096 which is now full!

@interc0der Some of us wanted to be outside in the sun today, but because we weren't non-ripple operated full history is still alive now.

@bachase
Copy link
Collaborator

bachase commented Sep 14, 2024

I take responsibility for Ripple not having released a hotfix.

Based on the original discussion in the reported issue:

  • It seems urgent to remedy this, or risk all full history servers crashing over the next few months.

  • We're VERY close indeed, to all FHs crashing. Most FHs will be 26 million pages from disaster.

we agreed this was important, but there was sufficient time to include this fix in the next major release. There is also the existing method of changing the page size. Again, Ripple operated full-history servers had done this change to an even larger size in 2018 when we published the method and have more than sufficient headroom (see below). But as @WietseWind notes, that doc was not updated to reflect that larger page size.

sqlite3 ./transaction.db 'PRAGMA page_count;'
117866371
sqlite3 ./transaction.db 'PRAGMA page_size;'
65536

All that being said, I agree -- this was not a good way for things to end up!! We should've asked for more details on the page count/growth for folks in this thread to better understand the urgency and acted accordingly.

@danielwwf
Copy link

Yet again, usual suspects @alloynetworks @danielwwf taking this opportunity to muddy the water.

Please, stop the nonsense and let's get back to business. It is the weekend afterall. Go outside and enjoy the sun!

There would have been no chance for me to "muddy the water" if others would have done their job 20 days ago.

But sure, blame everyone else.

@dangell7
Copy link
Collaborator

dangell7 commented Sep 15, 2024

Below are some timing tests for sqlite3 using different page_size and max_page_count. As we can see the same settings should not be applied to validators as FH nodes. If nodes are not storing TB's of data then the page_size and max_page_count should better reflect the size of the db that the node is operating under.

A db with page_size 1024 and max_page_count of 2147483646 is 2TB. The difference between these settings and 4096/4294967294 is ~39% on writes.

It is my recommendation that we DO NOT make the default settings for FH nodes but for nodes/validators and then make the settings configurable for FH nodes.

page_size: 1024
max_page_count: 2147483646
Write duration: 0.151919 seconds
Read duration: 0.006567 seconds
page_size: 1024
max_page_count: 4294967294
Write duration: 0.15229 seconds
Read duration: 0.00763183 seconds
page_size: 4096
max_page_count: 2147483646
Write duration: 0.209158 seconds
Read duration: 0.0065475 seconds
page_size: 4096
max_page_count: 4294967294
Write duration: 0.22716 seconds
Read duration: 0.00774271 seconds

@briannezhad
Copy link

To solve this issue, several approaches can be considered:

1. Upgrade SQLite Version

  • Ensure that the SQLite version on your system is up to date, especially if you're running an older version like 3.26.0. Newer versions (3.38.5 or later) are known to handle the increased max_page_count values. Use the latest SQLite version on your Rocky Linux system, as the issue seems to not occur on newer SQLite versions such as those on Ubuntu 22.04.

  • To upgrade SQLite on Rocky Linux:

    sudo dnf install sqlite

    If the default repositories don’t have the latest version, you may need to manually install it from source or find an updated package from a reliable repository.

2. Change max_page_count PRAGMA

  • If you're already on the latest SQLite version, you could try manually setting the max_page_count to the maximum possible value (4294967294) through a PRAGMA statement, which is larger than the default limit. Use the following SQL command:

    sqlite3 transaction.db 'PRAGMA max_page_count = 4294967294;'
  • Check if the transaction.db respects the new page count. You can do this by querying it with:

    sqlite3 transaction.db 'PRAGMA max_page_count;'

3. Recreate the Database with a New page_size

  • If increasing the max_page_count does not work, you may need to recreate the database with a larger page_size. This is not ideal due to the time and space requirements, but if it's necessary, the steps are as follows:
    1. Export the database to SQL:
      sqlite3 transaction.db .dump > transaction.sql
    2. Create a new database with the desired page_size:
      sqlite3 new_transaction.db 'PRAGMA page_size = 8192;'
    3. Import the dumped SQL into the new database:
      sqlite3 new_transaction.db < transaction.sql

4. VACUUM Command

  • Although vacuuming the database may not seem useful for a full history server, it might be worth attempting to see if it can help consolidate free space.
    sqlite3 transaction.db 'VACUUM;'
  • This can free space by reorganizing data, though it won’t reduce page usage if there’s no deleted data.

5. Run rippled with Optimized Settings

  • Ensure that rippled is configured properly to handle the growing database size. Monitor rippled log files and adjust parameters related to database management to minimize disk space issues.

6. Consider Storage Solutions

  • If the database is growing too large for the current disk space, you might need to consider alternative storage solutions (like increasing the server’s disk size) or archiving older data.

By combining these solutions, you should be able to resolve the issue with the transaction.db file growing too large. Upgrading SQLite and increasing the max_page_count should be prioritized, as they directly address the root of the issue.

@WietseWind
Copy link
Member

WietseWind commented Sep 16, 2024

@briannezhad: @bachase informed me Ripple is running at page_size 65536, any specific reason for Ripple to run at a different value than the 8192 proposed above?

Also: no need for an intermediate dump file, which would result in requiring three times the space of the original database (now 8TB, so: 8 (original) + dump + import = roughly 24 TB required, probably even more because the dump will most likely be larger than the database, as blob info is smaller in db than in ASCII). You can pipe straight into the new DB resulting in requiring only two times the space (16 TB)

sqlite3 new_transaction.db 'PRAGMA page_size = 8192;'
sqlite3 transaction.db .dump | sqlite3 new_transaction.db

@briannezhad
Copy link

@WietseWind, Ripple uses a page_size of 65536 to handle large databases more efficiently most likely. The larger page size helps reduce I/O operations, which is important for the 8TB database. While 8192 is standard, the larger size offers better performance for Ripple.

You're right about piping directly into the new DB—it saves space and avoids the need for an intermediate dump file.

@bachase
Copy link
Collaborator

bachase commented Sep 16, 2024

I don't (yet) have any more details on how Ripple selected the 64k size versus 8k, other than I suspect at time we decided to go for larger size and didn't see any impact to performance overall.

@dangell7 appreciate your tests and also the guidance that FH server configs would differ than other use cases. Can you share what type of workloads you used in that performance test?

@jscottbranson
Copy link
Contributor Author

@WietseWind have you successfully used the vacuum into statement to increase page size? Even without the intermediate DB, I had a vacuum into operation fail with 24 TB of disk space.

@WietseWind
Copy link
Member

@jscottbranson I did not get vacuum to work, which in hindsight may be related to:
#5095

I created a new SQLite3 db and set page size to 65536 and then piped the .dump on the old DB into the new one but on the fastest machine with crazy amounts of IO and bandwidth it caps out at ~20MB/s, so it's going to last weeks for the new DB to be populated. We didn't have weeks, do the source change was the only workaround for now.

@MarkusTeufelberger
Copy link
Collaborator

Should have binary history packs (delta logs) and postgresql by now??

Shards could have sold that issue. They were just recently removed alltogether as a feature.

@bachase
Copy link
Collaborator

bachase commented Sep 17, 2024

I suggest we open a new issue (happy to do so a little later) for folks to discuss improving support for FH operators, including this cold start problem for acquiring data or dealing with resizing. Clearly there are plenty of improvements to be had here, and it would be good to understand (myself included) the gaps in shards/clio as options for meeting these needs so we can find the right approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.