- Case Study and Justification
- Users Table Schema
- Quizzes Table Schema
- Responses Table Schema
- Schema Migration and Setup
A NoSQL database is ideal for this application for the following reasons:
- Eventual consistency is adequate for everything except the
User
table. - Reads are more frequent than writes.
- Data access patterns do not require table joins.
- Data de-normalization is minimal or non-existent.
- Foreign key integrity in the tables is maintained by the service logic.
- The
Quiz
table updates are only conducted by a single user. Once a record is marked as published it becomes read-only with updates to mark it as deleted. - The
Responses
table is read and write only with no updates permitted. - The
User
table is read and write only with the only updates allowed to delete an account or update fields that are not the primary key (username
).
As such, Apache Cassandra fits these requirements perfectly:
- Supports querying a table data through the
CQL
language. - Consistency is tunable at the query level. A
Local Quorum
based consistency is adequate for most tables whereas theUser
table can be set toQuorum
in the queries that access the table. - Multi-master architecture that provides high availability with minimal downtime.
- Architecture also provides excellent performance for both read and write operations as it can coordinate multiple writes at a time.
This struct
embeds the UserAccount
struct
to create a representation of the user account table.
Name (Struct) | Data Type (Struct) | Column Name | Column Type | Description |
---|---|---|---|---|
Account_ID | string | account_id | text | The account id ia a unique identifier generated by the Blake2b algorithm and is a 256bit hash of the username . Partition Key and Clustering Key. |
Username | string | username | text | The username unique identifier. Partition Key and Clustering Key. |
Password | string | password | text | User's hashed password. |
FirstName | string | first_name | text | User's first name. |
LastName | string | last_name | text | User's last name. |
string | text | Email address. | ||
IsDeleted | bool | is_deleted | boolean | Indicator as to whether the account is deleted. Prevents username reassignment. |
The Primary/Partition Key (username
) might result in a hot partition. To potentially alleviate this issue the column
account_id
has been created and is using the Blake2b algorithm to generate a 256bit hash of username
.
This will allow the creation of a Compound Primary/Partition Key (username
, account_id
). The username
will be provided
as input to the service logic and will generate the consistent hash to use in the queries.
This struct is created to be exposed for use with the HTTP handlers. This ensures consistency with the User
struct
.
It contains the Username
, Password
, FirstName
, LastName
, and Email
fields.
The query to generate the user table can be found here.
Describes a single question with all it's answer options as well as the answer key for the question.
Name (Struct) | Data Type (Struct) | Column Name | Column Type | Description |
---|---|---|---|---|
Description | string | description | text | The description that contains the text of the question. |
Asset | string | asset | text | URI of an asset to be displayed with question. |
Options | [ ] string | options | list | The available options for the question. |
Answers | [ ] int | answers | list | The indices of the options that are correct answers in the question. |
This struct
embeds the QuizCore
struct
to create a representation of the quizzes table.
Name (Struct) | Data Type (Struct) | Column Name | Column Type | Description |
---|---|---|---|---|
Quiz_ID | gocql.UUID | quiz_id | uuid | Account id unique identifier. Partition Key. |
Author | string | author | text | Username of the quiz creator. |
Title | string | title | text | Description of the quiz. |
Marking Type | string | marking_type | text | The marking scheme type: [N\n]one [N\n]egative [N\n]on-negative [B\b]inary |
Questions | [ ] Question | questions | frozen<list<frozen>> | A list of question UDTs in the quiz. |
IsPublished | bool | is_published | boolean | Status indicating whether the quiz can be viewed or taken by other users. |
IsDeleted | bool | is_deleted | boolean | Status indicating whether the quiz has been deleted. |
Since the Primary/Partition Key (quiz_id
) is a UUID
, it should help distribute the records evenly across the cluster
nodes. Quizzes are requested by their unique quiz_id
's.
This struct is created to be exposed for use with the HTTP handlers. This ensures consistency with the Quiz
struct
.
It contains the Title
, MarkingType
, and Question
fields and is the actual data used to create a Quiz
as well as
what is presented when viewing a quiz.
The query to generate the user table can be found here.
This struct
creates a representation of the responses table.
Name (Struct) | Data Type (Struct) | Column Name | Column Type | Description |
---|---|---|---|---|
Username | string | username | text | Username of the test taker. Compound Partition Key. |
QuizID | gocql.UUID | quiz_id | uuid | Taken quiz's id. Compound Partition Key. |
Author | string | author | text | Taken quiz author's username. |
Score | float64 | score | double | Score for this submission. Clustering Key. |
Responses | QuizResponse | responses | frozen<list<list>>, | Recorded responses for the submission. |
It would not be an arbitrary assumption that some quizzes will be more popular than others, leading to a hot partition. The
Compound Primary/Partition Key (username
, quiz_id
) should be unique enough to help distribute the records evenly
across the cluster nodes. The rationale behind the selection of this key is there will be more users taking the
quizzes than users who are also authors and requesting statistics of their quizzes. A Clustering Index of score
will
sort the records on each node by score
.
A secondary index will be constructed on just the quiz_id
column. This is an ideal candidate for a secondary index because
of its perceived high cardinality. When a quiz author requests statistics for their published quiz by its quiz_id
,
this index will be used to retrieve all the required records.
The quiz author
's username has been added to the table to facilitate permission checking for statistics retrieval via an
HTTP request.
The query to generate the responses table can be found here.
For security reasons, there are no database schema migration tools provided through the binary. This is to avoid deploying a payload in a production container that could potentially modify the databases' schema. As an alternative, there are three CQL files provided that will need to be deployed either manually or through database migration tooling. The files will need to be deployed in the following order:
This can be achieved manually or through tooling such as Liquibase which is the industry standard for database migrations.
For convenience, a Liquibase database schema migration change set has been provided with stages for rollbacks.
Please ensure you have the following installed in the lib
directory of your Liquibase installation:
The properties file with the login credentials for the Cassandra cluster will need to be updated appropriately.
changeLogFile: cassandra_migration.sql
url: jdbc:cassandra://localhost:9042/mcq_platform;DefaultKeyspace=mcq_platform;AuthMech=1
username: admin
password: root
driver: com.simba.cassandra.jdbc42.Driver
defaultSchemaName: mcq_platform
The following key space will need to be manually created in your Cassandra cluster before the migration can be executed:
CREATE KEYSPACE IF NOT EXISTS mcq_platform WITH replication = {'class' : 'SimpleStrategy', 'replication_factor' : 3};
If you are running a single-node cluster for testing you will need to set the replication_factor
to 1
.
At this point a test should be run to ensure Liquibase is able to connect to the database:
liquibase status
If a connection could be established, you should be ready to execute a migration:
liquibase update
Verification of the changes can be achieved through:
liquibase history