Lab 2: Simulate Taxi Booking, Billing and Payments using Amazon DynamoDB, DynamoDB Streams, AWS Lambda and Amazon Aurora PostgreSQL
- Overview
- Prerequisites
- Setup the AWS Cloud 9 Environment
- Enable Amazon DynamoDB Streams
- Deploying AWS Lambda Function
- Taxi Ride Workflow
In this lab, you will simulate the booking of a taxi trip by a rider and its acceptance by a driver, followed by billing and payment, using Python scripts and SQL commands. You will use DynamoDB Streams and an AWS Lambda function to insert completed trip data from DynamoDB into Aurora PostgreSQL.
- You should have completed Lab 1 from the GitHub repository.
- Open the AWS Management Console for AWS Cloud9. You will use the AWS Cloud9 IDE throughout this lab for running scripts, deploying AWS SAM (Serverless Application Model) templates, executing SQL queries, and more.
- Click on Open IDE for the AWS Cloud9 IDE that was created as part of the AWS CloudFormation template that was deployed.
- Open a terminal window in the AWS Cloud9 IDE by clicking Window in the menu bar at the top and selecting New Terminal.
- Update AWS SAM CLI to the latest version by running the following commands in the Cloud9 terminal window.
cd ~/environment
pip install --user --upgrade awscli aws-sam-cli
sam --version
Note: Ensure that the SAM CLI version is 0.21.0 or above.
- To install Boto3 (AWS SDK for Python), copy and paste the following commands in the terminal window in the AWS Cloud9 IDE.
cd ~/environment
curl -O https://bootstrap.pypa.io/get-pip.py # Get the install script.
sudo python3 get-pip.py # Install pip.
pip3 install boto3 --user
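To quickly verify the Boto3 installation, you can optionally run a short Python snippet like the one below (this check is not part of the original lab steps; save it as a file and run it with python3). It lists the DynamoDB tables in the Region and should include the aws-db-workshop-trips table created in Lab 1.

import boto3

# List DynamoDB tables in the workshop Region; expect 'aws-db-workshop-trips' in the output.
client = boto3.client('dynamodb', region_name='us-east-1')
print(client.list_tables()['TableNames'])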
- Open the AWS Management Console for AWS CloudFormation.
- In the upper-right corner of the AWS Management Console, confirm you are in the US East (N. Virginia) Region.
- Click on Stacks in the left navigation pane.
- Under Stacks copy the parent CloudFormation stack name starting with mod- (e.g. mod-aa8afde9acf04c7f).
- Substitute the string (substitute-name-of-copied-cf-stack-name) in the command below with the name of the CloudFormation stack and run it in the Cloud9 terminal window.
AWSDBWORKSHOP_CFSTACK_NAME="substitute-name-of-copied-cf-stack-name"
- Copy and paste the following commands in the Cloud9 terminal window to set the environment variable $AWSDBWORKSHOP_CFSTACK_NAME.
echo "export AWSDBWORKSHOP_CFSTACK_NAME=${AWSDBWORKSHOP_CFSTACK_NAME}" >> ~/.bash_profile
. ~/.bash_profile
echo $AWSDBWORKSHOP_CFSTACK_NAME
Note: Ensure that the name of the CloudFormation stack printed in the terminal window (e.g. mod-aa8afde9acf04c7f) matches the name of your parent CloudFormation stack.
In this section, you will enable an Amazon DynamoDB stream for the DynamoDB table named 'aws-db-workshop-trips' that was created as part of the CloudFormation stack.
- Copy and paste the commands below in the Cloud9 terminal window to enable a stream for the Amazon DynamoDB table named 'aws-db-workshop-trips'.
STREAM_ID=$(aws dynamodb update-table --table-name aws-db-workshop-trips --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES | jq '.TableDescription.LatestStreamArn' | cut -d'/' -f4) # capture the timestamp portion of the stream ARN
STREAM_NAME=stream/${STREAM_ID::-1} # trim the trailing double quote left over from jq's quoted output
echo "export AWSDBWORKSHOP_DDB_STREAM_NAME=${STREAM_NAME}" >> ~/.bash_profile
. ~/.bash_profile
echo $AWSDBWORKSHOP_DDB_STREAM_NAME
Note: The output should be similar to the following:
stream/2019-09-18T20:18:33.343
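Optionally, you can confirm the stream configuration with a short boto3 snippet such as the following (not part of the original lab); it prints the table's stream settings and the full stream ARN whose suffix you exported above.

import boto3

# Describe the table and print its stream settings.
client = boto3.client('dynamodb', region_name='us-east-1')
table = client.describe_table(TableName='aws-db-workshop-trips')['Table']
print(table['StreamSpecification'])  # expect StreamEnabled=True and NEW_AND_OLD_IMAGES
print(table['LatestStreamArn'])      # the ARN ends with the stream/<timestamp> suffix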
Now that you have enabled the Amazon DynamoDB stream, the next step is to deploy the AWS Lambda function that will process records from the stream.
In this section, you will use the AWS Serverless Application Model (SAM) CLI to deploy a Lambda function within the same Amazon Virtual Private Cloud (VPC). The SAM deployment also includes a Python interface to the PostgreSQL database engine as an AWS Lambda Layer. This Lambda function reads the taxi trip information from the DynamoDB stream as items are inserted/updated in the DynamoDB table aws-db-workshop-trips. Only when a trip is complete (denoted by the STATUS attribute in the trip item/record) does the Lambda function insert the information into the trips table in Aurora PostgreSQL.
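Conceptually, the handler deployed by the SAM template behaves like the following simplified sketch. This is only an illustration, not the actual workshop code (which lives in src/ddb-stream-processor): the environment variable names, the DynamoDB attribute names, and the trimmed-down INSERT statement are assumptions made for readability.

import os
import pg8000

def lambda_handler(event, context):
    # Connect to Aurora PostgreSQL; pg8000 is provided by the Lambda Layer.
    # The environment variable names are assumed to mirror the SAM parameters.
    conn = pg8000.connect(user=os.environ['DatabaseUserName'],
                          host=os.environ['DatabaseHostName'],
                          database=os.environ['DatabaseName'],
                          password=os.environ['DatabasePassword'])
    cur = conn.cursor()
    for record in event['Records']:
        # Stream records carry DynamoDB-typed values, e.g. {'S': 'Completed'}.
        image = record['dynamodb'].get('NewImage', {})
        if image.get('STATUS', {}).get('S') == 'Completed':
            # Only completed trips are written to the relational trips table.
            # (The real function maps the full set of trip attributes; the
            # attribute names and types below are illustrative assumptions.)
            cur.execute("insert into trips (rider_id, driver_id, trip_info, status) values (%s, %s, %s, %s)",
                        (image['riderid']['N'], image['driverid']['N'],
                         image['tripinfo']['S'], 'Completed'))
    conn.commit()
    conn.close()
    return {'status': 'OK'}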
In this section, you will download and package pg8000, a Python interface to PostgreSQL. The package will be deployed as an AWS Lambda Layer.
- Copy and paste the commands below in the Cloud9 terminal window.
cd ~/environment
mkdir pglayer
virtualenv -p python3 pglayer
cd pglayer
source bin/activate
mkdir -p pg8000-layer/python
pip install pg8000 -t pg8000-layer/python
cd pg8000-layer
zip -r pg8000-layer.zip python
mkdir ~/environment/amazon-rds-purpose-built-workshop/src/ddb-stream-processor/dependencies/
cp ~/environment/pglayer/pg8000-layer/pg8000-layer.zip ~/environment/amazon-rds-purpose-built-workshop/src/ddb-stream-processor/dependencies/
In this section, you will validate the SAM template that contains the configuration for the Lambda function and the Lambda Layer.
- To validate the SAM template, copy and paste the commands below in the terminal window
cd ~/environment/amazon-rds-purpose-built-workshop/src/ddb-stream-processor
sam validate
Note: The terminal window output should display the following:
/home/ec2-user/environment/amazon-rds-purpose-built-workshop/src/ddb-stream-processor/template.yaml is a valid SAM Template
- To package the AWS SAM application, copy and paste the commands below in the Cloud9 terminal window. This will create a template-out.yaml file in the same folder and upload the packaged artifacts to the specified Amazon S3 bucket.
cd ~/environment/amazon-rds-purpose-built-workshop/src/ddb-stream-processor
S3_BUCKETNAME=$(aws cloudformation describe-stacks --stack-name $AWSDBWORKSHOP_CFSTACK_NAME | jq -r '.Stacks[].Outputs[] | select(.OutputKey=="S3bucketName") | .OutputValue')
echo $S3_BUCKETNAME
sam package --output-template-file template-out.yaml --s3-bucket $S3_BUCKETNAME
- [Optional] Please take some time to review template-out.yaml.
- Copy and paste the command below to ensure that the packages have been uploaded successfully to the Amazon S3 bucket.
aws s3 ls s3://$S3_BUCKETNAME
Sample Output:
2019-09-15 16:39:56      70451 14b63970e9437bf82ea16664d46a929e
2019-09-15 16:39:56      71954 d3eec91527b02d78de30ae42198cd0c0
- Set the variables from the outputs of the AWS CloudFormation template that was deployed.
Note: You can copy the entire script block below and paste in the Cloud9 terminal window. Press Enter for the final command to execute.
cd ~/environment/amazon-rds-purpose-built-workshop/src/ddb-stream-processor
AURORADB_NAME=$(aws cloudformation describe-stacks --stack-name $AWSDBWORKSHOP_CFSTACK_NAME | jq -r '.Stacks[].Outputs[] | select(.OutputKey=="AuroraDBName") | .OutputValue')
echo $AURORADB_NAME
AURORACLUSTERENDPOINT_NAME=$(aws cloudformation describe-stacks --stack-name $AWSDBWORKSHOP_CFSTACK_NAME | jq -r '.Stacks[].Outputs[] | select(.OutputKey=="AuroraClusterEndpointName") | .OutputValue')
echo $AURORACLUSTERENDPOINT_NAME
AURORADBMASTERUSER_NAME=$(aws cloudformation describe-stacks --stack-name $AWSDBWORKSHOP_CFSTACK_NAME | jq -r '.Stacks[].Outputs[] | select(.OutputKey=="AuroraDBMasterUser") | .OutputValue')
echo $AURORADBMASTERUSER_NAME
LAMBDASECURITYGROUP_ID=$(aws cloudformation describe-stacks --stack-name $AWSDBWORKSHOP_CFSTACK_NAME | jq -r '.Stacks[].Outputs[] | select(.OutputKey=="LambdaSecurityGroupId") | .OutputValue')
echo $LAMBDASECURITYGROUP_ID
LAMBDASUBNET1_ID=$(aws cloudformation describe-stacks --stack-name $AWSDBWORKSHOP_CFSTACK_NAME | jq -r '.Stacks[].Outputs[] | select(.OutputKey=="LambdaSubnet1") | .OutputValue')
LAMBDASUBNET2_ID=$(aws cloudformation describe-stacks --stack-name $AWSDBWORKSHOP_CFSTACK_NAME | jq -r '.Stacks[].Outputs[] | select(.OutputKey=="LambdaSubnet2") | .OutputValue')
echo $LAMBDASUBNET1_ID,$LAMBDASUBNET2_ID
- Copy and paste the following command to deploy the Lambda Function along with the Lambda Layer.
sam deploy --template-file template-out.yaml --capabilities CAPABILITY_IAM --stack-name SAM-AWSDBWorkshop2019 --parameter-overrides LambdaLayerNameParameter=aws-db-workshop-pg8000-layer DDBStreamName=$AWSDBWORKSHOP_DDB_STREAM_NAME SecurityGroupIds=$LAMBDASECURITYGROUP_ID VpcSubnetIds=$LAMBDASUBNET1_ID,$LAMBDASUBNET2_ID DatabaseName=$AURORADB_NAME DatabaseHostName=$AURORACLUSTERENDPOINT_NAME DatabaseUserName=$AURORADBMASTERUSER_NAME DatabasePassword=auradmin123
Note: This will take a few minutes. Ensure that the SAM template was successfully deployed by looking for the following line in the terminal output:
Successfully created/updated stack - SAM-AWSDBWorkshop2019 in None
- Finally, deactivate the Python virtual environment.
deactivate
Now you have successfully deployed the Lambda function.
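As an optional sanity check (not part of the lab), the snippet below lists the deployed functions; SAM prefixes generated function names with the stack name, so filtering on SAM-AWSDBWorkshop2019 is a reasonable, though assumed, way to find it.

import boto3

# Print any Lambda functions created by the SAM stack.
client = boto3.client('lambda', region_name='us-east-1')
for fn in client.list_functions()['Functions']:
    if 'SAM-AWSDBWorkshop2019' in fn['FunctionName']:
        print(fn['FunctionName'], fn['Runtime'])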
In this section, you will run Python scripts to simulate booking of a taxi trip by a rider followed by acceptance and completion of the trip by a driver. After the trip is complete, you will run back-end SQL queries to process billing and driver payments.
- Copy and paste the following commands to book a new trip as a rider.
cd ~/environment/amazon-rds-purpose-built-workshop/src/ddb-python-script/
python3 rider-book-trip.py
From the output of the script, make a note of the tripinfo value; you will enter it when prompted by the subsequent scripts. It is a randomly generated string that uniquely identifies a trip and will look similar to the following:
"tripinfo": "2020-04-05T18:52:42Z,8987397"
- Copy and paste the following command as a driver to accept the trip. The script will prompt for the 'tripinfo' value. Enter the value (without double quotes, e.g. 2020-04-05T18:52:42Z,8987397) from the output of the previous Python script you just ran as a rider to book a new trip.
python3 driver-accept-trip.py
- Copy and paste the following command as a driver to complete the trip. The script will prompt for the 'tripinfo' value. Enter the same 'tripinfo' value (without double quotes, e.g. 2020-04-05T18:52:42Z,8987397) that you provided as input to the previous Python script you just ran as a driver to accept the trip.
python3 driver-complete-trip.py
Note: The trip is now complete, so the Lambda function should have picked up the record from the DynamoDB stream and inserted it into Aurora PostgreSQL.
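Before querying Aurora, you can optionally confirm the status change on the DynamoDB side with a quick boto3 scan like the one below (not part of the lab). STATUS is a reserved word in DynamoDB expressions, hence the attribute-name alias; the value 'Completed' matches the sample trips output later in this lab.

import boto3

# Count trip items whose STATUS attribute is 'Completed'.
client = boto3.client('dynamodb', region_name='us-east-1')
resp = client.scan(
    TableName='aws-db-workshop-trips',
    FilterExpression='#s = :done',
    ExpressionAttributeNames={'#s': 'STATUS'},
    ExpressionAttributeValues={':done': {'S': 'Completed'}})
print(resp['Count'])  # should be at least 1 after you complete a trip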
- Copy and paste the commands below to connect to the Aurora PostgreSQL database. Enter the password string for the Aurora database when prompted.
AURORADB_NAME=$(aws cloudformation describe-stacks --stack-name $AWSDBWORKSHOP_CFSTACK_NAME | jq -r '.Stacks[].Outputs[] | select(.OutputKey=="AuroraDBName") | .OutputValue')
echo $AURORADB_NAME
AURORACLUSTERENDPOINT_NAME=$(aws cloudformation describe-stacks --stack-name $AWSDBWORKSHOP_CFSTACK_NAME | jq -r '.Stacks[].Outputs[] | select(.OutputKey=="AuroraClusterEndpointName") | .OutputValue')
echo $AURORACLUSTERENDPOINT_NAME
AURORADBMASTERUSER_NAME=$(aws cloudformation describe-stacks --stack-name $AWSDBWORKSHOP_CFSTACK_NAME | jq -r '.Stacks[].Outputs[] | select(.OutputKey=="AuroraDBMasterUser") | .OutputValue')
echo $AURORADBMASTERUSER_NAME
sudo psql -h $AURORACLUSTERENDPOINT_NAME -U $AURORADBMASTERUSER_NAME -d $AURORADB_NAME
- Execute the query below to review the information in the trips table for the trip that you just completed in the previous section.
select * from trips;
The output of the query should include at least 1 row in the trips table, similar to the output below.
id | rider_id | driver_id | rider_name | rider_mobile | rider_email | trip_info | driver_name | driver_email | driver_mobile | vehicle_id | cab_type_id | vendor_id | pickup_datetime | dropoff_datetime | store_and_fwd_flag | rate_code_id | pickup_longitude | pickup_latitude | dropoff_longitude | dropoff_latitude | passenger_count | trip_distance | fare_amount | extra | mta_tax | tip_amount | tolls_amount | ehail_fee | improvement_surcharge | total_amount | payment_type | trip_type | pickup_location_id | dropoff_location_id | status
2000001 | 69257 | 528204 | person69257 | +11609467790 | [email protected] | 2019-12-18T05:15:33.640038Z,3219350 | driver528204 | [email protected] | +11185992795 | PXX248130 | 2 | 2 | 2019-12-18 05:15:33 | 2019-12-18 05:19:10 | N | 4 | -73.496113 | 40.664146 | -73.527485 | 40.665024 | 3 | 32 | 142.96 | 0.3 | 0.4 | 4.92 | 4.4 | 0 | 0.3 | 14.18 | 3 | 2 | 0 | 0 | Completed
2000002 | 69257 | 507977 | person69257 | +11609467790 | [email protected] | 2019-12-18T05:31:13.478619Z,1747531 | driver507977 | [email protected] | +11088418780 | XVJ356159 | 2 | 2 | 2019-12-18 05:31:13 | 2019-12-18 05:31:57 | N | 3 | -73.401165 | 40.866392 | -73.065379 | 40.96106 | 4 | 8 | 55.39 | 1.0 | 0.4 | 8.57 | 2.25 | 0 | 0.8 | 127.75 | 3 | 2 | 0 | 0 | Completed
(2 rows)
- Execute the query below to insert driver billing information for all the drivers, for the current daily billing cycle based on the trip information in the trips table.
insert into billing (driver_id, billing_cycle, billing_start, billing_end, billing_amount, commissions, rides_total, description, billing_status)
select driver_id, 2, current_date, current_date+1, sum(total_amount), 0.8, count(*), 'billing cycle 2', 'completed' from trips
where dropoff_datetime < current_date+1 and dropoff_datetime > current_date
group by driver_id;
The query should insert at least 1 row into the billing table and the output should be similar to the following.
INSERT 0 1
- Execute the query below to review the billing information that you just inserted.
select * from billing where billing_cycle=2;
The output of the query should retrieve at least 1 row from the billing table, similar to the output below.
id | driver_id | billing_cycle | billing_start | billing_end | billing_date | billing_amount | commissions | description | rides_total | billing_status
200001 | 510909 | 2 | 2019-09-15 00:00:00 | 2019-09-16 00:00:00 | 2019-09-16 01:59:05.634323 | 42.26 | 0.8 | billing cycle 2 | 1 | completed
(1 row)
- Execute the query below to insert driver payment information for all the drivers for the current billing cycle based on the billing information in the billing table.
insert into payment (billing_id, driver_id, billing_cycle, payment_amount, payment_date, payment_id, payment_status, description)
select a.id, a.driver_id, a.billing_cycle, sum(a.billing_amount*a.commissions), a.billing_date, b.payment_id, 'completed', 'Payment cycle Jan 2020'
from billing a, drivers b
where a.driver_id=b.driver_id and a.billing_cycle = 2 and a.billing_status = 'completed'
group by a.id, a.driver_id, b.payment_id, a.billing_cycle, a.billing_date;
The query should insert at least 1 row into the payment table and the output should be similar to the output below.
INSERT 0 1
- Execute the query below to review the payment information that you just inserted.
select * from payment where description='Payment cycle Jan 2020';
The output of the query should retrieve at least 1 row from the payment table, similar to the output below. Note that the payment amount is the billing amount multiplied by the commission rate (for example, 42.26 * 0.8 = 33.808).
id | billing_id | driver_id | billing_cycle | payment_amount | payment_date | payment_id | payment_status | description
200001 | 200001 | 510909 | 2 | 33.808 | 2019-09-16 01:59:05.634323 | 7 | completed | Payment cycle Jan 2020
(1 row)
- Execute the following command to close the database connection.
\q
Congratulations! You have successfully completed Lab 2. You can now proceed to Lab 3.