Avoid storage caching in benchmarks #1469

Merged · 2 commits · Nov 2, 2023
16 changes: 6 additions & 10 deletions benches/benches/vm.rs
@@ -13,7 +13,6 @@ use criterion::{

use contract::*;
use fuel_core_benches::*;
use fuel_core_storage::transactional::Transaction;
use fuel_core_types::fuel_asm::Instruction;
use vm_set::*;

@@ -33,14 +32,12 @@ where
instruction,
diff,
} = &mut i;
let original_db = vm.as_mut().database_mut().clone();
let mut db_txn = {
let db = vm.as_mut().database_mut();
let db_txn = db.transaction();
// update vm database in-place to use transaction
*db = db_txn.as_ref().clone();
db_txn
};
let checkpoint = vm
.as_mut()
.database_mut()
.checkpoint()
.expect("Should be able to create a checkpoint");
let original_db = core::mem::replace(vm.as_mut().database_mut(), checkpoint);

Member commented:
Is this really representative of real-world conditions, though? When going through the block executor, we'll pretty much always be executing this opcode from within a db transaction.


@xgreenx (Collaborator, Author) replied on Nov 2, 2023:
[image]

The idea is to avoid the optimizations MemoryTransactionView applies on non-first interactions with a storage key. Using a checkpoint instead of MemoryTransactionView, we go directly to the self.data_source.get(key, column) branch.

But yeah, it is not real-world execution, because inside of a block we have RocksDB -> MemoryTransactionView -> MemoryTransactionView. We could create a special BenchmarkMemoryTransactionView for benchmarks that does all the same things, except it always goes to data_source. But I think the overhead from MemoryTransactionView is much smaller than direct access to the checkpoint database. The proof is the significant increase in the execution time of the opcode for the case with 1.

[image]
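
For context, a minimal, hypothetical sketch (not the actual fuel-core types or API) of the effect being discussed: a transaction view answers repeated reads of the same key from its in-memory overlay, while a fresh checkpoint forces every read down to the real data source.

```rust
// Hypothetical sketch, assuming a simplified storage trait: only the *first*
// access to a key pays the real storage cost when an in-memory overlay sits
// in front of the backing store.
use std::collections::HashMap;

trait Storage {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
}

struct MemoryTransactionView<S> {
    changes: HashMap<Vec<u8>, Vec<u8>>, // in-memory overlay of reads/writes
    data_source: S,                     // e.g. RocksDB underneath
}

impl<S: Storage> MemoryTransactionView<S> {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        // Non-first interactions with a key are served from the overlay: cheap.
        if let Some(value) = self.changes.get(key) {
            return Some(value.clone());
        }
        // Only a miss falls through to the real data source: expensive.
        self.data_source.get(key)
    }
}

// Benchmarking against a fresh checkpoint database instead means every `get`
// takes the `data_source` path, so the opcode's storage cost is not hidden.
```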


let final_time;
loop {
@@ -80,7 +77,6 @@ where
}
}

db_txn.commit().unwrap();
// restore original db
*vm.as_mut().database_mut() = original_db;
final_time
39 changes: 7 additions & 32 deletions benches/benches/vm_set/blockchain.rs
@@ -1,7 +1,5 @@
use std::{
env,
iter::successors,
path::PathBuf,
sync::Arc,
};

@@ -13,7 +11,10 @@ use criterion::{
};
use fuel_core::{
database::vm_database::VmDatabase,
state::rocks_db::RocksDb,
state::rocks_db::{
RocksDb,
ShallowTempDir,
},
};
use fuel_core_benches::*;
use fuel_core_types::{
@@ -37,27 +38,6 @@ use rand::{
SeedableRng,
};

/// Reimplementation of `tempdir::TempDir` that allows creating a new
/// instance without actually creating a new directory on the filesystem.
/// This is needed since rocksdb requires empty directory for checkpoints.
pub struct ShallowTempDir {
path: PathBuf,
}
impl ShallowTempDir {
pub fn new() -> Self {
let mut rng = rand::thread_rng();
let mut path = env::temp_dir();
path.push(format!("fuel-core-bench-rocksdb-{}", rng.next_u64()));
Self { path }
}
}
impl Drop for ShallowTempDir {
fn drop(&mut self) {
// Ignore errors
let _ = std::fs::remove_dir_all(&self.path);
}
}

pub struct BenchDb {
db: RocksDb,
/// Used for RAII cleanup. Contents of this directory are deleted on drop.
@@ -70,7 +50,7 @@ impl BenchDb {
fn new(contract: &ContractId) -> anyhow::Result<Self> {
let tmp_dir = ShallowTempDir::new();

let db = Arc::new(RocksDb::default_open(&tmp_dir.path, None).unwrap());
let db = Arc::new(RocksDb::default_open(tmp_dir.path(), None).unwrap());

let mut database = Database::new(db.clone());
database.init_contract_state(
@@ -106,14 +86,9 @@ impl BenchDb {

/// Create a new separate database instance using a rocksdb checkpoint
fn checkpoint(&self) -> VmDatabase {
let tmp_dir = ShallowTempDir::new();
self.db
.checkpoint(&tmp_dir.path)
use fuel_core::state::TransactableStorage;
let database = TransactableStorage::checkpoint(&self.db)
.expect("Unable to create checkpoint");
let db = RocksDb::default_open(&tmp_dir.path, None).unwrap();
let database = Database::new(Arc::new(db)).with_drop(Box::new(move || {
drop(tmp_dir);
}));
VmDatabase::default_from_database(database)
}
}
24 changes: 11 additions & 13 deletions benches/src/bin/collect.rs
@@ -633,6 +633,7 @@ fn linear_regression(x_y: Vec<(u64, u64)>) -> f64 {
fn dependent_cost(name: &String, x_y: Vec<(u64, u64)>) -> (u64, u64) {

Member commented:
x_y isn't a very helpful name. Would points work here? I know we already have a Point type below, but I think it would still be an improvement.

If the types are getting confusing, you could even make the parameter impl Into<Point> and create a From<(u64, u64)> impl :P. That's less important than the name.
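
A minimal sketch of that suggestion, assuming a Point with u64 x/y fields as in the surrounding diff (the explicit generic parameter below is the equivalent of taking impl Into<Point>):

```rust
#[derive(Clone, Copy, Debug)]
struct Point {
    x: u64,
    y: u64,
}

impl From<(u64, u64)> for Point {
    fn from((x, y): (u64, u64)) -> Self {
        Point { x, y }
    }
}

// The collector could then accept anything convertible into a point:
fn dependent_cost<P: Into<Point>>(name: &str, points: Vec<P>) -> (u64, u64) {
    let points: Vec<Point> = points.into_iter().map(Into::into).collect();
    println!("fitting {} with {} samples", name, points.len());
    // ... fit the linear/logarithmic model over `points` as before ...
    let last = points.last().copied().unwrap_or(Point { x: 0, y: 0 });
    (last.y, 0)
}
```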

const NEAR_LINEAR: f64 = 0.1;

#[derive(PartialEq, Eq)]
enum Type {
/// The points have a linear property. The first point
/// and the last point are almost the same (the difference is < 0.1).
@@ -671,7 +672,7 @@ fn dependent_cost(name: &String, x_y: Vec<(u64, u64)>) -> (u64, u64) {

let linear_regression = linear_regression(x_y.clone());

let mut x_y = x_y
let x_y = x_y
.into_iter()
.map(|(x, y)| Point { x, y })
.collect::<Vec<_>>();
@@ -704,7 +705,15 @@ fn dependent_cost(name: &String, x_y: Vec<(u64, u64)>) -> (u64, u64) {
.unwrap();
(base, amount as u64)
}
Type::Logarithm => {
Type::Logarithm | Type::Exp => {
if expected_type == Type::Exp {
println!(
"The {} is exponential. We don't support exponential chart. \
The opcode should be limited with upper bound. \n {:?}",
name, x_y
);
}

// The logarithm function slows down fast, and the point where it becomes more
// linear is the base point. After this point we use linear strategy.
let last = x_y.last().unwrap().amount();
Expand All @@ -726,17 +735,6 @@ fn dependent_cost(name: &String, x_y: Vec<(u64, u64)>) -> (u64, u64) {
.unwrap_or(last);
(base.y, amount as u64)
}
Type::Exp => {
println!(
"The {} is exponential. We don't support exponential chart. \
The opcode should be limited with upper bound. \n {:?}",
name, x_y
);

x_y.sort_unstable_by_key(|a| a.amount() as u64);
let base = x_y.last().unwrap();
(base.y, 0)
}
}
}

4 changes: 4 additions & 0 deletions crates/fuel-core/src/database.rs
@@ -233,6 +233,10 @@ impl Database {
self.into()
}

pub fn checkpoint(&self) -> DatabaseResult<Self> {
self.data.checkpoint()
}

pub fn flush(self) -> DatabaseResult<()> {
self.data.flush()
}
8 changes: 8 additions & 0 deletions crates/fuel-core/src/state.rs
@@ -1,5 +1,7 @@
use crate::database::{
Column,
Database,
Error as DatabaseError,
Result as DatabaseResult,
};
use fuel_core_storage::iter::{
@@ -87,6 +89,12 @@ pub enum WriteOperation {
}

pub trait TransactableStorage: BatchOperations + Debug + Send + Sync {
fn checkpoint(&self) -> DatabaseResult<Database> {
Err(DatabaseError::Other(anyhow::anyhow!(

Member commented:
I think it would be better to add an error variant for Unsupported or whatever. I think we should get out of the habit of using anyhow.

Member replied:
Depends on the situation; in the case of checkpoints, it's only used in benchmarking/test contexts, and we don't do any special logic based on the error variant other than bailing from the benchmark.

https://www.lpalmieri.com/posts/error-handling-rust/#anyhow-or-thiserror
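
For reference, a minimal sketch of what a dedicated variant could look like (the enum name, variant name, and thiserror usage are assumptions for illustration, not the actual fuel-core error type):

```rust
// Hypothetical sketch of the suggested alternative to an anyhow string:
// a typed variant that callers could match on.
#[derive(Debug, thiserror::Error)]
pub enum StorageError {
    #[error("checkpoints are not supported by this storage backend")]
    CheckpointUnsupported,
    #[error(transparent)]
    Other(#[from] anyhow::Error),
}

// The trait's default method would then return the typed variant:
// fn checkpoint(&self) -> Result<Database, StorageError> {
//     Err(StorageError::CheckpointUnsupported)
// }
```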

"Checkpoint is not supported"
)))
}

fn flush(&self) -> DatabaseResult<()>;
}

116 changes: 103 additions & 13 deletions crates/fuel-core/src/state/rocks_db.rs
@@ -2,6 +2,7 @@ use crate::{
database::{
convert_to_rocksdb_direction,
Column,
Database,
Error as DatabaseError,
Result as DatabaseResult,
},
@@ -20,8 +21,10 @@ use fuel_core_storage::iter::{
BoxedIter,
IntoBoxedIter,
};
use rand::RngCore;
use rocksdb::{
checkpoint::Checkpoint,
BlockBasedOptions,
BoundColumnFamily,
Cache,
ColumnFamilyDescriptor,
@@ -35,15 +38,56 @@ use rocksdb::{
WriteBatch,
};
use std::{
env,
iter,
path::Path,
path::{
Path,
PathBuf,
},
sync::Arc,
};

type DB = DBWithThreadMode<MultiThreaded>;

/// Reimplementation of `tempdir::TempDir` that allows creating a new
/// instance without actually creating a new directory on the filesystem.
/// This is needed since rocksdb requires an empty directory for checkpoints.
pub struct ShallowTempDir {
path: PathBuf,
}

impl Default for ShallowTempDir {
fn default() -> Self {
Self::new()
}
}

impl ShallowTempDir {
/// Creates a random directory.
pub fn new() -> Self {
let mut rng = rand::thread_rng();
let mut path = env::temp_dir();
path.push(format!("fuel-core-shallow-{}", rng.next_u64()));
Self { path }
}

/// Returns the path of the directory.
pub fn path(&self) -> &PathBuf {
&self.path
}
}

impl Drop for ShallowTempDir {
fn drop(&mut self) {
// Ignore errors
let _ = std::fs::remove_dir_all(&self.path);
}
}

#[derive(Debug)]
pub struct RocksDb {
db: DB,
capacity: Option<usize>,
}

impl RocksDb {
@@ -63,16 +107,43 @@
columns: Vec<Column>,
capacity: Option<usize>,
) -> DatabaseResult<RocksDb> {
let cf_descriptors = columns
.clone()
.into_iter()
.map(|i| ColumnFamilyDescriptor::new(RocksDb::col_name(i), Self::cf_opts(i)));
let mut block_opts = BlockBasedOptions::default();
// See https://github.com/facebook/rocksdb/blob/a1523efcdf2f0e8133b9a9f6e170a0dad49f928f/include/rocksdb/table.h#L246-L271 for details on what the format versions are/do.
block_opts.set_format_version(5);

if let Some(capacity) = capacity {
// Set cache size 1/3 of the capacity as recommended by
// https://github.com/facebook/rocksdb/wiki/Setup-Options-and-Basic-Tuning#block-cache-size
let block_cache_size = capacity / 3;
let cache = Cache::new_lru_cache(block_cache_size);
block_opts.set_block_cache(&cache);
// "index and filter blocks will be stored in block cache, together with all other data blocks."
// See: https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#indexes-and-filter-blocks
block_opts.set_cache_index_and_filter_blocks(true);
// Don't evict L0 filter/index blocks from the cache
block_opts.set_pin_l0_filter_and_index_blocks_in_cache(true);
} else {
block_opts.disable_cache();
}
block_opts.set_bloom_filter(10.0, true);

let cf_descriptors = columns.clone().into_iter().map(|i| {
ColumnFamilyDescriptor::new(
RocksDb::col_name(i),
Self::cf_opts(i, &block_opts),
)
});

let mut opts = Options::default();
opts.create_if_missing(true);
opts.set_compression_type(DBCompressionType::Lz4);
if let Some(capacity) = capacity {
let cache = Cache::new_lru_cache(capacity);
// Set cache size 1/3 of the capacity. Another 1/3 is
// used by block cache and the last 1 / 3 remains for other purposes:
//
// https://github.com/facebook/rocksdb/wiki/Setup-Options-and-Basic-Tuning#block-cache-size
let row_cache_size = capacity / 3;
let cache = Cache::new_lru_cache(row_cache_size);
opts.set_row_cache(&cache);
}

Expand All @@ -82,7 +153,7 @@ impl RocksDb {
match DB::open_cf(&opts, &path, &[] as &[&str]) {
Ok(db) => {
for i in columns {
let opts = Self::cf_opts(i);
let opts = Self::cf_opts(i, &block_opts);
db.create_cf(RocksDb::col_name(i), &opts)
.map_err(|e| DatabaseError::Other(e.into()))?;
}
@@ -96,7 +167,7 @@
let cf_descriptors = columns.clone().into_iter().map(|i| {
ColumnFamilyDescriptor::new(
RocksDb::col_name(i),
Self::cf_opts(i),
Self::cf_opts(i, &block_opts),
)
});
DB::open_cf_descriptors(&opts, &path, cf_descriptors)
@@ -106,12 +177,19 @@
ok => ok,
}
.map_err(|e| DatabaseError::Other(e.into()))?;
let rocks_db = RocksDb { db };
let rocks_db = RocksDb { db, capacity };
Ok(rocks_db)
}

pub fn checkpoint<P: AsRef<Path>>(&self, path: P) -> Result<(), rocksdb::Error> {
Checkpoint::new(&self.db)?.create_checkpoint(path)
pub fn checkpoint<P: AsRef<Path>>(&self, path: P) -> DatabaseResult<()> {
Checkpoint::new(&self.db)
.and_then(|checkpoint| checkpoint.create_checkpoint(path))
.map_err(|e| {
DatabaseError::Other(anyhow::anyhow!(
"Failed to create a checkpoint: {}",
e
))
})
}

fn cf(&self, column: Column) -> Arc<BoundColumnFamily> {
@@ -121,13 +199,14 @@
}

fn col_name(column: Column) -> String {
format!("column-{}", column.as_usize())
format!("col-{}", column.as_usize())
}

fn cf_opts(column: Column) -> Options {
fn cf_opts(column: Column, block_opts: &BlockBasedOptions) -> Options {
let mut opts = Options::default();
opts.create_if_missing(true);
opts.set_compression_type(DBCompressionType::Lz4);
opts.set_block_based_table_factory(block_opts);

// All double-keys should be configured here
match column {
@@ -402,6 +481,17 @@
}

impl TransactableStorage for RocksDb {
fn checkpoint(&self) -> DatabaseResult<Database> {
let tmp_dir = ShallowTempDir::new();
self.checkpoint(&tmp_dir.path)?;
let db = RocksDb::default_open(&tmp_dir.path, self.capacity)?;
let database = Database::new(Arc::new(db)).with_drop(Box::new(move || {
drop(tmp_dir);
}));

Ok(database)
}

fn flush(&self) -> DatabaseResult<()> {
self.db
.flush_wal(true)