Skip to content

Latest commit

 

History

History

cassandra

Apache Cassandra

Table of contents


Case Study and Justification

A NoSQL database is ideal for this application for the following reasons:

  • Eventual consistency is adequate for everything except the User table.
  • Reads are more frequent than writes.
  • Data access patterns do not require table joins.
  • Data de-normalization is minimal or non-existent.
  • Foreign key integrity in the tables is maintained by the service logic.
  • The Quiz table updates are only conducted by a single user. Once a record is marked as published it becomes read-only with updates to mark it as deleted.
  • The Responses table is read and write only with no updates permitted.
  • The User table is read and write only with the only updates allowed to delete an account or update fields that are not the primary key (username).

As such, Apache Cassandra fits these requirements perfectly:

  • Supports querying a table data through the CQL language.
  • Consistency is tunable at the query level. A Local Quorum based consistency is adequate for most tables whereas the User table can be set to Quorum in the queries that access the table.
  • Multi-master architecture that provides high availability with minimal downtime.
  • Architecture also provides excellent performance for both read and write operations as it can coordinate multiple writes at a time.

Users Table Schema

User

This struct embeds the UserAccount struct to create a representation of the user account table.

Name (Struct) Data Type (Struct) Column Name Column Type Description
Account_ID string account_id text The account id ia a unique identifier generated by the Blake2b algorithm and is a 256bit hash of the username. Partition Key and Clustering Key.
Username string username text The username unique identifier. Partition Key and Clustering Key.
Password string password text User's hashed password.
FirstName string first_name text User's first name.
LastName string last_name text User's last name.
Email string email text Email address.
IsDeleted bool is_deleted boolean Indicator as to whether the account is deleted. Prevents username reassignment.

The Primary/Partition Key (username) might result in a hot partition. To potentially alleviate this issue the column account_id has been created and is using the Blake2b algorithm to generate a 256bit hash of username. This will allow the creation of a Compound Primary/Partition Key (username, account_id). The username will be provided as input to the service logic and will generate the consistent hash to use in the queries.

User Account

This struct is created to be exposed for use with the HTTP handlers. This ensures consistency with the User struct. It contains the Username, Password, FirstName, LastName, and Email fields.

CQL Query

The query to generate the user table can be found here.


Quizzes Table Schema

User Defined Type(s)

Question

Describes a single question with all it's answer options as well as the answer key for the question.

Name (Struct) Data Type (Struct) Column Name Column Type Description
Description string description text The description that contains the text of the question.
Asset string asset text URI of an asset to be displayed with question.
Options [ ] string options list The available options for the question.
Answers [ ] int answers list The indices of the options that are correct answers in the question.

Quizzes

This struct embeds the QuizCore struct to create a representation of the quizzes table.

Name (Struct) Data Type (Struct) Column Name Column Type Description
Quiz_ID gocql.UUID quiz_id uuid Account id unique identifier. Partition Key.
Author string author text Username of the quiz creator.
Title string title text Description of the quiz.
Marking Type string marking_type text The marking scheme type: [N\n]one [N\n]egative [N\n]on-negative [B\b]inary
Questions [ ] Question questions frozen<list<frozen>> A list of question UDTs in the quiz.
IsPublished bool is_published boolean Status indicating whether the quiz can be viewed or taken by other users.
IsDeleted bool is_deleted boolean Status indicating whether the quiz has been deleted.

Since the Primary/Partition Key (quiz_id) is a UUID, it should help distribute the records evenly across the cluster nodes. Quizzes are requested by their unique quiz_id's.

Quiz Core

This struct is created to be exposed for use with the HTTP handlers. This ensures consistency with the Quiz struct. It contains the Title, MarkingType, and Question fields and is the actual data used to create a Quiz as well as what is presented when viewing a quiz.

CQL Query

The query to generate the user table can be found here.


Responses Table Schema

Responses

This struct creates a representation of the responses table.

Name (Struct) Data Type (Struct) Column Name Column Type Description
Username string username text Username of the test taker. Compound Partition Key.
QuizID gocql.UUID quiz_id uuid Taken quiz's id. Compound Partition Key.
Author string author text Taken quiz author's username.
Score float64 score double Score for this submission. Clustering Key.
Responses QuizResponse responses frozen<list<list>>, Recorded responses for the submission.

It would not be an arbitrary assumption that some quizzes will be more popular than others, leading to a hot partition. The Compound Primary/Partition Key (username, quiz_id) should be unique enough to help distribute the records evenly across the cluster nodes. The rationale behind the selection of this key is there will be more users taking the quizzes than users who are also authors and requesting statistics of their quizzes. A Clustering Index of score will sort the records on each node by score.

A secondary index will be constructed on just the quiz_id column. This is an ideal candidate for a secondary index because of its perceived high cardinality. When a quiz author requests statistics for their published quiz by its quiz_id, this index will be used to retrieve all the required records.

The quiz author's username has been added to the table to facilitate permission checking for statistics retrieval via an HTTP request.

CQL Query

The query to generate the responses table can be found here.


Schema Migration and Setup

For security reasons, there are no database schema migration tools provided through the binary. This is to avoid deploying a payload in a production container that could potentially modify the databases' schema. As an alternative, there are three CQL files provided that will need to be deployed either manually or through database migration tooling. The files will need to be deployed in the following order:

  1. user.cql
  2. quiz.cql
  3. responses.cql

This can be achieved manually or through tooling such as Liquibase which is the industry standard for database migrations.

Liquibase Migration

For convenience, a Liquibase database schema migration change set has been provided with stages for rollbacks.

Please ensure you have the following installed in the lib directory of your Liquibase installation:

The properties file with the login credentials for the Cassandra cluster will need to be updated appropriately.

changeLogFile: cassandra_migration.sql
url: jdbc:cassandra://localhost:9042/mcq_platform;DefaultKeyspace=mcq_platform;AuthMech=1
username: admin
password: root
driver: com.simba.cassandra.jdbc42.Driver
defaultSchemaName: mcq_platform

The following key space will need to be manually created in your Cassandra cluster before the migration can be executed:

CREATE KEYSPACE IF NOT EXISTS mcq_platform WITH replication = {'class' : 'SimpleStrategy', 'replication_factor' : 3};

If you are running a single-node cluster for testing you will need to set the replication_factor to 1.

At this point a test should be run to ensure Liquibase is able to connect to the database:

liquibase status

If a connection could be established, you should be ready to execute a migration:

liquibase update

Verification of the changes can be achieved through:

liquibase history