Compaction
You can read more on Compactions here: Multi-threaded compactions
Here we give an overview of the options that affect the behavior of compactions:
- Options::compaction_style - RocksDB currently supports two compaction algorithms: Universal style and Level style. This option switches between the two. It can be kCompactionStyleUniversal or kCompactionStyleLevel. If it is kCompactionStyleUniversal, you can configure the universal style parameters with Options::compaction_options_universal.
- Options::disable_auto_compactions - Disables automatic compactions. Manual compactions can still be issued on this database.
- Options::compaction_filter - Allows an application to modify or delete a key-value pair during background compaction. The client must provide compaction_filter_factory if it requires a new compaction filter to be used for different compaction processes. The client should specify only one of the filter or the factory.
- Options::compaction_filter_factory - A factory that provides compaction filter objects, which allow an application to modify or delete a key-value pair during background compaction.
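As a rough illustration of how the options above fit together, the fragment below selects universal compaction on an Options object. It is a configuration sketch, not a recommended setup: the function name MakeCompactionOptions is hypothetical, the values are arbitrary, and it assumes the RocksDB headers are available on the include path.

```cpp
#include "rocksdb/options.h"

// Hypothetical helper (not part of RocksDB): returns Options configured for
// universal style compaction. The values are illustrative only.
rocksdb::Options MakeCompactionOptions() {
  rocksdb::Options options;

  // Switch from level style to universal style compaction.
  options.compaction_style = rocksdb::kCompactionStyleUniversal;

  // Universal style parameters are set through compaction_options_universal.
  options.compaction_options_universal.size_ratio = 1;

  // Leave automatic compactions enabled. Setting this to true would disable
  // them, while manual compactions could still be issued.
  options.disable_auto_compactions = false;

  // Set at most one of compaction_filter and compaction_filter_factory.
  // Neither is set in this sketch.
  return options;
}
```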
Other options impacting performance of compactions and when they get triggered are:
- Options::access_hint_on_compaction_start - Specifies the file access pattern once a compaction is started. It is applied to all input files of a compaction. Default: NORMAL
- Options::level0_file_num_compaction_trigger - Number of files that triggers a level-0 compaction. A negative value means that level-0 compaction is not triggered by the number of files at all.
- Options::target_file_size_base and Options::target_file_size_multiplier - Target file size for compaction. target_file_size_base is the per-file size for level-1. The target file size for level L can be calculated as target_file_size_base * (target_file_size_multiplier ^ (L-1)). For example, if target_file_size_base is 2MB and target_file_size_multiplier is 10, then each file on level-1 will be 2MB, each file on level-2 will be 20MB, and each file on level-3 will be 200MB. The default target_file_size_base is 64MB and the default target_file_size_multiplier is 1.
- Options::max_compaction_bytes - Maximum number of bytes in all compacted files. We avoid expanding the lower level file set of a compaction if it would make the total compaction cover more than this amount.
- Options::max_background_compactions - Maximum number of concurrent background compaction jobs, submitted to the default LOW priority thread pool.
- Options::compaction_readahead_size - If non-zero, we perform bigger reads when doing compaction. If you're running RocksDB on spinning disks, you should set this to at least 2MB. With direct I/O, it is enforced to be 2MB if you don't set it.
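The target file size formula given for target_file_size_base and target_file_size_multiplier can be checked with a few lines of arithmetic. The sketch below is only an illustration of that formula; TargetFileSizeForLevel is a hypothetical helper, not a RocksDB function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of RocksDB): target file size in bytes for
// level `level` (level >= 1), computed as
//   target_file_size_base * target_file_size_multiplier ^ (level - 1).
std::uint64_t TargetFileSizeForLevel(std::uint64_t target_file_size_base,
                                     int target_file_size_multiplier,
                                     int level) {
  assert(level >= 1);
  assert(target_file_size_multiplier >= 1);
  std::uint64_t size = target_file_size_base;
  // Multiply once for every level below `level`, starting from level-1.
  for (int l = 1; l < level; ++l) {
    size *= static_cast<std::uint64_t>(target_file_size_multiplier);
  }
  return size;
}
```

With a base of 2MB and a multiplier of 10, this yields 2MB, 20MB, and 200MB for levels 1 through 3, matching the example in the list. With the default multiplier of 1, every level gets the base size.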
Compaction can also be manually triggered. See Manual Compaction.
You can learn more about all of those options in rocksdb/options.h.
For level style compaction, see Leveled Compaction.
For a description of universal style compaction, see Universal compaction style.
If you're using Universal style compaction, there is an object CompactionOptionsUniversal that holds all the different options for that compaction. The exact definition is in rocksdb/universal_compaction.h, and you can set it in Options::compaction_options_universal. Here we give a short overview of the options in CompactionOptionsUniversal:
- CompactionOptionsUniversal::size_ratio - Percentage flexibility while comparing file sizes. If the size of the candidate file(s) is 1% smaller than the next file's size, then the next file is included in the candidate set. Default: 1
- CompactionOptionsUniversal::min_merge_width - The minimum number of files in a single compaction run. Default: 2
- CompactionOptionsUniversal::max_merge_width - The maximum number of files in a single compaction run. Default: UINT_MAX
- CompactionOptionsUniversal::max_size_amplification_percent - The size amplification is defined as the amount (in percent) of additional storage needed to store a single byte of data in the database. For example, a size amplification of 2% means that a database that contains 100 bytes of user data may occupy up to 102 bytes of physical storage. By this definition, a fully compacted database has a size amplification of 0%. RocksDB uses the following heuristic to calculate size amplification: it assumes that all files excluding the earliest file contribute to the size amplification. Default: 200, which means that a 100 byte database could require up to 300 bytes of storage.
- CompactionOptionsUniversal::compression_size_percent - If this option is set to -1 (the default value), all output files follow the specified compression type. If this option is not negative, we try to make sure the compressed size is just above this value. In normal cases, at least this percentage of data will be compressed. When we are compacting to a new file, the criterion for whether it needs to be compressed is as follows: assume the list of files sorted by generation time is [ A1...An B1...Bm C1...Ct ], where A1 is the newest and Ct is the oldest, and we are going to compact B1...Bm. We calculate the total size of all the files as total_size, and the total size of C1...Ct as total_C. The compaction output file will be compressed iff total_C / total_size < this percentage.
- CompactionOptionsUniversal::stop_style - The algorithm used to stop picking files into a single compaction run. Can be kCompactionStopStyleSimilarSize (pick files of similar size) or kCompactionStopStyleTotalSize (total size of picked files > next file). Default: kCompactionStopStyleTotalSize
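The arithmetic behind max_size_amplification_percent and compression_size_percent can be sketched as follows. This is an illustration of the definitions in the list above, not RocksDB's implementation. All three function names are hypothetical, and measuring the newer files relative to the size of the earliest file is an assumption about how the stated heuristic turns into a percentage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// All functions below are hypothetical helpers, not RocksDB APIs.

// Upper bound on physical storage for `user_bytes` of user data when size
// amplification is capped at `max_size_amplification_percent`.
std::uint64_t MaxPhysicalBytes(std::uint64_t user_bytes,
                               unsigned max_size_amplification_percent) {
  return user_bytes + user_bytes * max_size_amplification_percent / 100;
}

// Estimated size amplification in percent for files ordered newest first.
// Assumption: every file except the earliest counts as overhead, and the
// overhead is measured relative to the size of the earliest file.
std::uint64_t EstimatedSizeAmplificationPercent(
    const std::vector<std::uint64_t>& file_sizes_newest_first) {
  assert(!file_sizes_newest_first.empty());
  const std::uint64_t earliest = file_sizes_newest_first.back();
  assert(earliest > 0);
  std::uint64_t newer_total = 0;
  for (std::size_t i = 0; i + 1 < file_sizes_newest_first.size(); ++i) {
    newer_total += file_sizes_newest_first[i];
  }
  return newer_total * 100 / earliest;
}

// Compression criterion for a compaction output file. `total_c` is the total
// size of the files older than the ones being compacted (C1...Ct) and
// `total_size` is the total size of all files. Returning true means the
// output follows the configured compression type.
bool ShouldCompressOutput(int compression_size_percent,
                          std::uint64_t total_c,
                          std::uint64_t total_size) {
  if (compression_size_percent < 0) {
    return true;  // -1 (the default): always follow the compression type.
  }
  assert(total_c <= total_size);
  // total_C / total_size < percent / 100, rearranged to avoid division.
  return total_c * 100 <
         static_cast<std::uint64_t>(compression_size_percent) * total_size;
}
```

For example, MaxPhysicalBytes(100, 200) gives the 300 bytes mentioned for the default setting, and with compression_size_percent set to 70 an output file is compressed only while the older files C1...Ct make up less than 70% of the total size.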
Compactions are executed in thread pools. See Thread Pool.