Welcome to our tutorial "Implementing a Graph database for a Scooters Business on AWS". Throughout this session, we'll explore graph databases and generative AI applied to graphs, comparing these technologies with traditional relational database management systems (RDBMS) along the way. Because RDBMS are so widely used and understood, we believe the comparison gives a clearer perspective to anyone trying to grasp the concepts of graph databases.
By the end of this step-by-step tutorial, published on our AWS Developers YouTube channel, you will:
- Understand the fundamentals of graph databases, i.e. the main differences between graph and relational databases.
- Gain insights into the unique advantages and challenges of graph databases.
- Learn about Amazon Neptune, a service tailored for graph database deployments.
- Learn how to use generative AI to help you write Gremlin queries and to abstract the Gremlin query language behind natural language.
- Have your own customisable Graph Data Generator.
- Appreciate scenarios where graph databases outshine their relational counterparts.
- Get hands-on experience with setting up, loading and querying a graph database on AWS.
- Build most of the tutorial with Infrastructure-as-Code (IaC), using the AWS CDK.
For those with relational database experience, this exploration will illuminate new possibilities and data solutions. For newcomers, you're about to dive into a dynamic way of visualizing and interpreting data.
Important: in the project_blog folder, you can also find more detailed, step-by-step instructions for running this project.
- Comparison of technology applicability: "use the right tool for the right job".
- Analysis of performance and total cost of ownership (TCO), i.e. relational database vs. graph database.
- Deploy a Graph Data Generator that is completely customizable for any use case you want to build.
- Understand how to use a Large Language Model to query a graph database.
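To make the relational vs. graph comparison concrete, here is a minimal sketch in plain Python. It is toy data only, not Neptune, Gremlin or a SQL engine, and every name in it (`scooters`, `scooter_parts`, `has_part`, and so on) is an assumption made for illustration. It shows the same question, "which parts belong to this scooter?", answered once by matching keys across tables and once by following edges from a vertex.

```python
# Relational view: two tables plus a join table, resolved by matching keys.
scooters = [{"id": 1, "model": "S1"}]
parts = [{"id": 10, "name": "battery"}, {"id": 11, "name": "wheel"}]
scooter_parts = [
    {"scooter_id": 1, "part_id": 10},
    {"scooter_id": 1, "part_id": 11},
]


def parts_relational(scooter_id):
    """Emulate a two-step join: filter the join table, then look up parts."""
    part_ids = {
        row["part_id"] for row in scooter_parts if row["scooter_id"] == scooter_id
    }
    return sorted(p["name"] for p in parts if p["id"] in part_ids)


# Graph view: each vertex stores its outgoing edges as (label, target) pairs.
graph = {
    "scooter:1": [("has_part", "part:battery"), ("has_part", "part:wheel")],
    "part:battery": [],
    "part:wheel": [],
}


def parts_graph(vertex):
    """Follow the 'has_part' edges that leave the given vertex."""
    return sorted(
        target.split(":", 1)[1]
        for label, target in graph[vertex]
        if label == "has_part"
    )


# Both approaches answer the same question with the same result.
print(parts_relational(1))
print(parts_graph("scooter:1"))
```

In the graph view the relationship is stored with the vertex itself, so the lookup starts at the scooter and walks outward instead of scanning a separate join table.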
- Docker: install and run Docker locally. This project uses Docker to build images and run containers.
- A minimum of 2 GB of free disk space for building and deploying the Docker image.
- Install AWS CLI
- Install Python 3.9+
- Install Node.js
- After installing Node.js (with npm in your path), install the AWS CDK; e.g. npm install -g aws-cdk
- Install Visual Studio Code with the Amazon CodeWhisperer plugin.
This project is set up like a standard Python project. The initialization
process also creates a virtualenv within this project, stored under the .venv
directory. To create the virtualenv, it assumes that there is a python3
(or python for Windows) executable in your path with access to the venv
package. If for any reason the automatic creation of the virtualenv fails,
you can create the virtualenv manually.
To manually create a virtualenv on macOS and Linux:
$ python3 -m venv .venv
After the init process completes and the virtualenv is created, you can use the following step to activate your virtualenv.
$ source .venv/bin/activate
If you are on a Windows platform, activate the virtualenv like this:
% .venv\Scripts\activate.bat
Once the virtualenv is activated, you can install the required dependencies. Optionally, add --upgrade to update packages that are already installed.
$ pip install -r requirements.txt
Add or change your own environment in the cdk.json file, under the 'context' key. For example, to add a 'Production' environment:
"context": {
  "env-production": {
    "vpc_neptune": "",
    "s3_prefix_scooters_data_loc": "scooters-graph-demo/neptune/data",
    "lambda_datagen_num_vehicles": "1000",
    "lambda_datagen_num_parts": "10",
    "api_gtw_ip_addr_whitelist_list": ""
  },
Optional CLI: cdk deploy --context <<env-production>>
- vpc_neptune [optional]: set this to deploy all the assets into your existing VPC instead of creating a new one.
- api_gtw_ip_addr_whitelist_list [optional]: list of IP addresses or CIDR ranges to whitelist in the API Gateway, e.g. to query the Neptune graph from a corporate VPN CIDR.
- s3_prefix_scooters_data_loc: the path (S3 key prefix) used inside the new S3 bucket.
- lambda_datagen_num_vehicles: number of scooters (graph nodes) to create in the dataset.
- lambda_datagen_num_parts: number of parts (graph nodes) to add per scooter.
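Before synthesizing, it can help to sanity-check the values you put under the 'context' key. The sketch below is not part of the project: the helper name `check_environment` is hypothetical, and it assumes that the IP whitelist is a comma-separated string and that the two generator sizes must be positive integers stored as strings, as in the example above.

```python
import ipaddress
import json  # used in the usage example at the bottom

# Keys taken from the context example above. Treating these three as
# mandatory (and the other two as optional) is an assumption of this sketch.
REQUIRED_KEYS = (
    "s3_prefix_scooters_data_loc",
    "lambda_datagen_num_vehicles",
    "lambda_datagen_num_parts",
)


def check_environment(context: dict, env_name: str) -> dict:
    """Return the settings for env_name, raising ValueError on bad values."""
    env = context[env_name]
    for key in REQUIRED_KEYS:
        if not env.get(key):
            raise ValueError(f"{env_name}: '{key}' must not be empty")
    # The generator sizes are strings in cdk.json; int() rejects non-numbers.
    for key in ("lambda_datagen_num_vehicles", "lambda_datagen_num_parts"):
        if int(env[key]) <= 0:
            raise ValueError(f"{env_name}: '{key}' must be a positive integer")
    # Assumption: the whitelist is a comma-separated string of IPs or CIDRs.
    whitelist = env.get("api_gtw_ip_addr_whitelist_list", "")
    for entry in filter(None, (item.strip() for item in whitelist.split(","))):
        ipaddress.ip_network(entry, strict=False)  # ValueError if malformed
    return env


# Usage, from the directory that contains cdk.json:
#   with open("cdk.json") as fh:
#       settings = check_environment(json.load(fh)["context"], "env-production")
```

A failed check raises ValueError with the offending key, which is quicker to diagnose than a deployment that fails part-way through.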
Tip: You can move these context options to the Parameter Store in AWS Systems Manager. This service lets you overwrite parameter values while keeping an internal version history.
You can now synthesize your CDK stacks:
Tips:
- Remember to have Docker running at this point!
- If you receive an error like "--app is required...", it's probably because you are running the command from a subdirectory. Navigate to the main app directory and try again.
If this is your first time using the AWS CDK in this account, you need to bootstrap it first:
$ cdk bootstrap --profile profile-aws-dev-sandbox
Synthesize all your stacks:
$ cdk synth --all --profile profile-aws-dev-sandbox
If the previous steps succeeded, then we can deploy our entire project:
Tip: this deployment can take more than 15 minutes, especially the first time.
$ cdk deploy --all --profile profile-aws-dev-sandbox
If you don't want CDK to ask for approval:
$ cdk deploy --profile profile-aws-dev-sandbox --require-approval never --all
Once you have deployed the AWS CDK project successfully, you can carry on with the steps provided in the blog post and YouTube video series.
- cdk ls: list all stacks in the app
- cdk synth: emits the synthesized CloudFormation template
- cdk deploy: deploy this stack to your default AWS account/region
- cdk diff: compare deployed stack with current state
- cdk docs: open CDK documentation
The graph data model uses Any Python Tree Data (the anytree package) to build the Vehicle hierarchy. You can modify this hierarchy graph model in the Lambda function within the Data Generation stack.
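As a rough illustration of how a tree hierarchy turns into graph data, here is a minimal sketch. It does not reproduce the project's Lambda code: it uses a small standard-library dataclass in place of the tree library, and the class `Node`, the helpers `build_scooter`, `walk` and `to_graph`, the "scooter" and "part" labels and the `has_part` edge label are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """A tree node; a stand-in for the node class of a tree library."""
    node_id: str
    label: str
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child


def build_scooter(index: int, num_parts: int) -> Node:
    """One scooter root with num_parts direct children (hypothetical shape)."""
    scooter = Node(f"scooter-{index}", "scooter")
    for part in range(num_parts):
        scooter.add(Node(f"scooter-{index}-part-{part}", "part"))
    return scooter


def walk(root: Node) -> Iterator[Tuple[Node, Node]]:
    """Yield (parent, child) pairs depth-first; each pair becomes one edge."""
    for child in root.children:
        yield root, child
        yield from walk(child)


def to_graph(roots: List[Node]):
    """Flatten the trees into vertex and edge tuples ready for export."""
    vertices, edges = [], []
    for root in roots:
        vertices.append((root.node_id, root.label))
        for parent, child in walk(root):
            vertices.append((child.node_id, child.label))
            edges.append((parent.node_id, "has_part", child.node_id))
    return vertices, edges


fleet = [build_scooter(i, num_parts=3) for i in range(2)]
vertices, edges = to_graph(fleet)
print(len(vertices), "vertices,", len(edges), "edges")  # 8 vertices, 6 edges
```

In this sketch the two arguments play the role of lambda_datagen_num_vehicles and lambda_datagen_num_parts: every scooter contributes one vertex plus one vertex and one edge per part.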
- Using the AWS CLI or the AWS console, empty the S3 bucket created by our CDK stack; e.g. s3://scooterss3stack-scootersdemoXXXX/. Otherwise, our CDK Removal Policy will not be able to delete the bucket.
- Run the command below to delete all resources deployed by our CDK project (architecture image above). CDK will ask whether you want to delete those stacks; enter Y.
$ cdk destroy --all --profile profile-aws-dev-sandbox
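If you prefer to script the first cleanup step (emptying the bucket) instead of using the CLI or console, the following is a minimal sketch. The helper `empty_bucket` is hypothetical and expects an object that behaves like a boto3 S3 client. It removes current objects only, so a bucket with versioning enabled would also need its object versions deleted, and it does not inspect the errors that delete_objects may report.

```python
def empty_bucket(s3_client, bucket_name: str) -> int:
    """Delete every current object in bucket_name; return how many were sent
    for deletion.

    s3_client is expected to behave like boto3.client("s3").
    """
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # A list_objects_v2 page holds at most 1000 keys, which is also
            # the per-request limit of delete_objects.
            s3_client.delete_objects(Bucket=bucket_name, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted


# Usage (requires boto3 and AWS credentials; the bucket name is a placeholder
# that you must replace with the bucket created by the CDK stack):
#   import boto3
#   session = boto3.Session(profile_name="profile-aws-dev-sandbox")
#   empty_bucket(session.client("s3"), "<your-bucket-name>")
```

Passing the client in as an argument keeps the function easy to try against a stub before pointing it at a real account.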