Skip to content

Commit

Permalink
Merge pull request #177 from awslabs/sql-join-tables
Browse files Browse the repository at this point in the history
Sql Example - Join tables in Amazon Timestream SQL query
  • Loading branch information
nfunke authored Feb 8, 2024
2 parents 8599b3c + e7f767f commit 6697e52
Show file tree
Hide file tree
Showing 8 changed files with 30,764 additions and 5 deletions.
9 changes: 5 additions & 4 deletions sample_apps/sql/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,9 @@ Each example contains

## Folder content

Folder | Description
-------|-------------
[utils](utils) | Utility python script that performs <br /> 1. Upload example data to S3 <br /> 2. Creates Batch Load task to load example data into Timestream Table
[last_value_fill_forward](last_value_fill_forward) | Create time series and fill gaps with LAST_VALUE() function |
Folder | Description
-------|------------------------------------------------------------------------------------------------------------------------------------------------------
[utils](utils) | Utility python script that performs <br /> 1. Upload example data to S3 <br /> 2. Creates Batch Load task to load example data into Timestream Table
[join_tables](join_tables) | Demostrate how to JOIN tables in SQL |
[last_value_fill_forward](last_value_fill_forward) | Create time series and fill gaps with LAST_VALUE() function |

90 changes: 90 additions & 0 deletions sample_apps/sql/join_tables/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
# Join tables in Amazon Timestream SQL query

The goal for this join is that there are 2 tables
1. Sensor events for sensor measurements
2. Sensor status events and other details

The JOIN examples enriches the sensor events with data from the status event. Common use is to only filter sensor events for sensors that are active. The active/stopped informaiton is here stored in the second table of sensor details.

### 1. Sensor Event table
Contains measure events (temperature, humidity)

| Column | Type | Description |
|--------|------|-------------|
|gpio|Dimension| Sensor channel identifier |
|time|Timestamp| Time measure event was taken|
|measure_name|measure_name|Measure type|
|temperature|DOUBLE|Temperature in F|
|humidity|DOUBLE|Humidity in % |

### 2. Sensor Details table
Additional details about the sensor. These are time based change events

| Column | Type | Description |
|--------|------|-----------------------------------------|
|gpio|Dimension| Sensor channel identifier |
|time|Timestamp| Time details were created/changed |
|measure_name|measure_name| event type (initial_event/change_event) |
|name|VARCHAR|readable name of sensor|
|status|VARCHAR|active/stopped status of sensor|

## SQL query

Two SQL example statements are included in the file ```query_example_join_tables.sql```
1. Standard SQL JOIN to illustrate SQL compatibility
2. Performance optimized CTE JOIN

## Loading example data

The following commands is using the ```create_batch_load_task.py``` utility to load the example csv files used in this
query example. Please adjust as needed

### 1. Load sensor events data

```shell
cd <github-clone-root>/amazon-timestream-tools/sample_apps/sql/utils
```

```shell
python3 ./create_batch_load_task.py \
region=<your_region> \
mapping=../join_tables/datamodel_events.json \
input_bucket=<s3-bucket-name> \
object_key_prefix=<upload_folder> \
data_file=../join_tables/sensor_events.csv \
database=amazon-timestream-tools \
table=sensor_events \
partition_key=gpio
```

### 2. Load sensor details data

```shell
python3 ./create_batch_load_task.py \
region=<your_region> \
mapping=../join_tables/datamodel_details.json \
input_bucket=<s3-bucket-name> \
object_key_prefix=<upload_folder> \
data_file=../join_tables/sensor_details.csv \
database=amazon-timestream-tools \
table=sensor_details \
partition_key=gpio

```

| **⚠ Note**: |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| It is recommended to use the database name `amazon-timestream-tools` and table names `sensor_events` and `sensor_details` as described below. The SQL statement uses `"amazon-timestream-tools"."sensorddata"` in WHERE clause and would not need to be modified. If you use a different database name and table combination, please adjust the SQL query as needed. |

Parameters used:

Parameter | Description | Recommended Value
-----------------|----------------------------------------------------------------------------------------------------------------------------------------|-------------------
**region** | Region where database is deployed | any region where Timestream is available
**mapping** | File used to map CSV data columns to Timestream attributes | `../join_tables/datamodel_events.json` <br/> `../join_tables/datamodel_details.json`
**input_bucket** | S3 bucket used to upload data file and report Batch Load Status | existing S3 bucket
**object_key** | Folder where data file will be uploaded | any string, should not be root folder
**data_file** | CSV file for this example | `../join_tables/sensor_events.csv ` <br/> `../join_tables/sensor_details.csv`
**database** | Database in region. Database will be created if not exists. | `amazon-timestream-tools`
**table** | Table where data is loaded. If this table does not exist, the table will be created | `sensor_events` <br/>`sensor_details`
**partition_key**| Custome defined partition key (CDPK) used for this data example | `gpio`
25 changes: 25 additions & 0 deletions sample_apps/sql/join_tables/datamodel_details.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
{
"TimeUnit": "MILLISECONDS",
"TimeColumn": "time",
"DimensionMappings": [
{
"SourceColumn": "gpio",
"DestinationColumn": "gpio"
}
],
"MultiMeasureMappings": {
"MultiMeasureAttributeMappings": [
{
"SourceColumn": "name",
"TargetMultiMeasureAttributeName": "name",
"MeasureValueType": "VARCHAR"
},
{
"SourceColumn": "status",
"TargetMultiMeasureAttributeName": "status",
"MeasureValueType": "VARCHAR"
}
]
},
"MeasureNameColumn": "measure_name"
}
25 changes: 25 additions & 0 deletions sample_apps/sql/join_tables/datamodel_events.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
{
"TimeUnit": "MILLISECONDS",
"TimeColumn": "time",
"DimensionMappings": [
{
"SourceColumn": "gpio",
"DestinationColumn": "gpio"
}
],
"MultiMeasureMappings": {
"MultiMeasureAttributeMappings": [
{
"SourceColumn": "temperature",
"TargetMultiMeasureAttributeName": "temperature",
"MeasureValueType": "DOUBLE"
},
{
"SourceColumn": "humidity",
"TargetMultiMeasureAttributeName": "humidity",
"MeasureValueType": "DOUBLE"
}
]
},
"MeasureNameColumn": "measure_name"
}
25 changes: 25 additions & 0 deletions sample_apps/sql/join_tables/query_example_join_tables.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
-- Example 1 - Standard JOIN
SELECT events.time, events.gpio, temperature, humidity, name, status
FROM "amazon-timestream-tools"."sensor_events" as events
JOIN "amazon-timestream-tools"."sensor_details" as details
ON events.gpio = details.gpio
WHERE details.status = 'active'
AND events.time BETWEEN '2023-09-19' AND '2023-11-01'
AND details.time BETWEEN '2023-09-19' AND '2023-11-01'

-- Example 2 - CTO JOIN (preferred)
with events as (
SELECT time, gpio, temperature, humidity
FROM "amazon-timestream-tools"."sensor_events"
WHERE time BETWEEN '2023-09-19' AND '2023-11-01'
),
details as (
SELECT time, gpio, name, status
FROM "amazon-timestream-tools"."sensor_details"
WHERE time BETWEEN '2023-09-19' AND '2023-11-01'
AND status = 'active'
)
SELECT events.time, events.gpio, temperature, humidity, name, status
FROM events
JOIN details
ON events.gpio = details.gpio
4 changes: 4 additions & 0 deletions sample_apps/sql/join_tables/sensor_details.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
gpio,time,name,status,measure_name
17,1695335950657,office,active,initial_event
22,1695335950657,outside,active,change_event
27,1695335950657,window,stopped,change_event
Loading

0 comments on commit 6697e52

Please sign in to comment.