Skip to content

Latest commit

 

History

History
158 lines (135 loc) · 7.61 KB

README.md

File metadata and controls

158 lines (135 loc) · 7.61 KB

Mini DBMS in Rust

Rust

An attempt to write database management system (DBMS) in Rust to learn about database internals.

We started by implementing it following the Let's Build a Simple Database tutorial (which is based on SQLite). However, as we continue the work after the tutorial, we kind of diverted from following the SQLite implementation and specification. Hence, it's not entirely correct to call this SQLite in Rust anymore.

The rest of the work is continue based on multiple references. One of the most important one is CMU Intro to Database Systems projects (including previous years archive, since each year they have students implement different structure).

While our B+ tree now support concurrent operations, it's still a single threaded database system, as our frontend (network/cli layer) doesn't support handling concurrent requests yet.

This is by no mean an idiomatic Rust implementation as I'm learning Rust along the way.


Update (25/09/2022): The project is kind of on hold as I just started a new job and still adapting to the new schedule I have. Hence, the progress will be slower. Hopefully, I can regain my momentum after a couple of weeks.


End Goal

The main focus to write a storage engine from scratch. This project now includes it's own B+ Tree data structure, buffer pool, LRU replacement policy, transaction manager, and lock manager.

What's next is to implement the recovery system by implement a log manager for our Write Ahead Log (WAL) and ARIES protocol.

Non Goals

We won't go deep into the query engine. Currently, we have two variance of the query engine, one is written during the early phase where it just parse string and execute functions directly. The newer implementation includes a simplify query engine with query plan and query executor.

We won't write our SQL query parser, query rewriter and optimizer for the time being.

We would just write a simple enough query engine so we can test out our implementation.

Progress

Since we have completed through the tutorial of Let's Build a Simple Database, we will have to come up with our own checklist for this project. Here's a quick breakdown of my journey and what I'm going to implement next:

  • Add test case for splitting node and updating parent, where the new node is not the most right child.
  • Implement split on internal node.
  • Extend test cases to make sure insertion is working as intended and fix bugs along the way...
    • Add property-based testing for insertion test.
    • Fix all the bugs found through property based test.
  • Implement find by id operation. This would be helpful when testing deletion.
  • Implement delete operation for sqlite.
  • Implement deletion for B+ Tree.
    • Implement deletion on leaf node.
    • Implement deletion on internal node.
    • Implement merging after the neighbouring nodes pointers/key-values < N.
    • Implement mergeing for internal nodes.
    • Implement rebalancing to reduce the number of merges need.
      • Implement steal from siblings if can't merge both nodes together.
      • Check if there's still other rebalancing optimisation technique available...
    • Replace hardcoded max internal node count of 3 with the actual internal node count supported by our data format.
      • This require us to generate a larger datasets to tests the behaviour.
  • Implement buffer pool for our database. (Reference)
    • Implement least recently used (LRU) replacement policies.
    • Implement Buffer Pool Manager.
    • Integrated Buffer Pool manager into the rest of the system.
    • Make buffer pool page size configurable.
  • Multi threaded index concurrency control. (Reference, Reference, see Task 4)
    • Support concurrent insert to B+ Tree.
    • Support concurrent select to B+ Tree.
    • Support concurrent delete to B+ Tree.
    • Test concurrent insert + select;
    • Test concurrent delete + select;
    • Test concurrent insert + delete;
    • Test concurrent insert + select + delete;
    • Optimize latch crabbing by holding read lock and only swap to write lock when there's a split/merge.
    • Refactor tree operation out of Pager.
  • Implement concurrency control at row/tuple level. (Reference)
    • Implement a transaction manager first.
    • Implement lock manager.
      • It currently doesn't support different locking behaviour based on different isolation levels.
      • While is completed, I can't really guarantee the correctness of the implementation for the time being. See the comments in code for more.
      • Testing for read and write anomalies are not included yet. This will require implementation of update operation and integration of lock manager into the query/table code, which is what I plan to do next.
    • Implement a query executor.
      • Implemented both sequence scan and delete executor and plan node.
      • It is not integrated into the other part of the systems yet.
    • Support update operation. This is important as it allow us to produce test case that can lead to read/write anomalies.
      • Implement update plan node.
      • Implement update executor.
      • Support usage of index scan in update plan node.
    • Write test to ensure that two phase locking works on all read and write anomolies.
      • Update query executor, table to ensure lock is acquired correctly so the tests for read/write anomolies passed.
      • Currently, the test is only available for index scan executor. To ensure, we can test the read/write anomolies with sequence scan, we need to first support where expression evaluation in our query engine. Which would be a task for another day.
    • Implement concurrent query execution.
    • Implement dead lock detection.
    • Implement dead lock prevention. (Wound Wait algorithm) (Reference)
  • Implement recovery mechanism for our database. (Reference)
    • Implement basic log manager and log record.
    • Update page struct to contain additional information required for recovery. E.g. LSN.
    • Implement WAL in other subsystems, such as buffer pool (pager) and transaction manager.
    • Implement ARIES.
    • Implement a stop the world checkpointing.
  • Update our query parser to integrate with our new query executor. This will allow us to easily test things by using SQL statement instead of manually writing our query plan.
    • Implement insert executor.
    • Parse query into query plan.
    • Replace the query engine with the new onw in main.rs.

(subject to changes as we progress)

References