The quickest way to get your first workflow deployed on Aqueduct
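First, we'll install the Aqueduct pip package and start the Aqueduct server. The leading ! runs each command as a shell command from a notebook cell; omit it if you are typing the commands directly in your terminal: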
!pip3 install aqueduct-ml
!aqueduct start
Next, we import everything we need and create our Aqueduct client:
from aqueduct import Client, op, metric, check
import pandas as pd
client = Client()
Note that the API key associated with the server can also be found in the output of the aqueduct start command.
The base data for our workflow is the hotel reviews dataset in the pre-built Demo that comes with the Aqueduct server. This code does two things -- (1) it loads a connection to the demo database, and (2) it runs a SQL query against that DB and returns a pointer to the resulting dataset.
demo_db = client.resource("Demo")
reviews_table = demo_db.sql("select * from hotel_reviews;")
# You will see the type of `reviews_table` is an Aqueduct TableArtifact.
print(type(reviews_table))
# Calling .get() allows us to retrieve the underlying data from the TableArtifact and
# returns it to you as a Python object.
reviews_table.get()
Output
| | hotel_name | review_date | reviewer_nationality | review |
---|---|---|---|---|
0 | H10 Itaca | 2017-08-03 | Australia | Damaged bathroom shower screen sealant and ti... |
1 | De Vere Devonport House | 2016-03-28 | United Kingdom | No Negative The location and the hotel was ver... |
2 | Ramada Plaza Milano | 2016-05-15 | Kosovo | No Negative Im a frequent traveler i visited m... |
3 | Aloft London Excel | 2016-11-05 | Canada | Only tepid water for morning shower They said ... |
4 | The Student Hotel Amsterdam City | 2016-07-31 | Australia | No Negative The hotel had free gym table tenni... |
... | ... | ... | ... | ... |
95 | The Chesterfield Mayfair | 2015-08-25 | Denmark | Bad Reading light And light in bathNo Positive |
96 | Hotel V Nesplein | 2015-08-27 | Turkey | Nothing except the construction going on the s... |
97 | Le Parisis Paris Tour Eiffel | 2015-10-20 | Australia | When we arrived we had to bring our own baggag... |
98 | NH Amsterdam Museum Quarter | 2016-01-26 | Belgium | No stairs even to go the first floor Restaura... |
99 | Barcel Raval | 2017-07-07 | United Kingdom | Air conditioning a little zealous Nice atmosp... |
100 rows × 4 columns
In Aqueduct terminology, reviews_table is an Artifact, which is simply a wrapper around some data. It will now serve as the base data for our workflow. We can apply Python functions to it in order to transform it.
A piece of Python code that transforms an Artifact is called an Operator, which is simply a decorated Python function. Here, we'll write a simple operator that takes in our reviews table and calculates the length of each review string. It's not too exciting, but it should give you a sense of how Aqueduct works.
@op
def transform_data(reviews):
    '''
    This simple Python function takes in a DataFrame with hotel reviews
    and adds a column called strlen that has the string length of the
    review.
    '''
    reviews['strlen'] = reviews['review'].str.len()
    return reviews
strlen_table = transform_data(reviews_table)
Notice that we added @op above our function definition: This tells Aqueduct that we want to run this function as a part of an Aqueduct workflow. A function decorated with @op can be called like a regular Python function, and Aqueduct takes note of this call to begin constructing a workflow.
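As a rough mental model of "Aqueduct takes note of this call", the sketch below uses a toy decorator that runs a function and records its name in a list. This is not how Aqueduct is implemented: toy_op, recorded_calls, add_strlen, and the sample rows are all hypothetical, and the rows are plain dicts rather than a DataFrame.

```python
import functools

# Hypothetical registry for this sketch; Aqueduct's real bookkeeping is not shown here.
recorded_calls = []


def toy_op(func):
    """Toy stand-in for @op: call the function and remember that it was called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        recorded_calls.append(func.__name__)
        return result
    return wrapper


@toy_op
def add_strlen(reviews):
    # Return new rows with an extra 'strlen' field holding the review length.
    return [{**row, "strlen": len(row["review"])} for row in reviews]


# Made-up rows for illustration, not data from hotel_reviews.
sample_reviews = [{"review": "Nice atmosphere"}, {"review": "No stairs"}]
with_strlen = add_strlen(sample_reviews)

print(with_strlen)
print(recorded_calls)
```

The decorated function is still called like any other Python function; the only difference is the side effect of being recorded, which is the part a workflow system can build on.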
Now that we have our string length operator, we can get a preview of our data by calling .get():
strlen_table.get()
Output
| | hotel_name | review_date | reviewer_nationality | review | strlen |
---|---|---|---|---|---|
0 | H10 Itaca | 2017-08-03 | Australia | Damaged bathroom shower screen sealant and ti... | 82 |
1 | De Vere Devonport House | 2016-03-28 | United Kingdom | No Negative The location and the hotel was ver... | 84 |
2 | Ramada Plaza Milano | 2016-05-15 | Kosovo | No Negative Im a frequent traveler i visited m... | 292 |
3 | Aloft London Excel | 2016-11-05 | Canada | Only tepid water for morning shower They said ... | 368 |
4 | The Student Hotel Amsterdam City | 2016-07-31 | Australia | No Negative The hotel had free gym table tenni... | 167 |
... | ... | ... | ... | ... | ... |
95 | The Chesterfield Mayfair | 2015-08-25 | Denmark | Bad Reading light And light in bathNo Positive | 47 |
96 | Hotel V Nesplein | 2015-08-27 | Turkey | Nothing except the construction going on the s... | 456 |
97 | Le Parisis Paris Tour Eiffel | 2015-10-20 | Australia | When we arrived we had to bring our own baggag... | 672 |
98 | NH Amsterdam Museum Quarter | 2016-01-26 | Belgium | No stairs even to go the first floor Restaura... | 156 |
99 | Barcel Raval | 2017-07-07 | United Kingdom | Air conditioning a little zealous Nice atmosp... | 72 |
100 rows × 5 columns
We're going to apply a Metric to our strlen_table, which will calculate a numerical summary of our data (in this case, just the mean string length).
@metric
def average_strlen(strlen_table):
    return strlen_table["strlen"].mean()
avg_strlen = average_strlen(strlen_table)
avg_strlen.get()
Output:
223.18
Note that metrics are denoted with the @metric decorator. Metrics can be computed over the output of any operator, and even over other metrics.
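To make the idea of a metric computed over another metric concrete, here is a plain-Python sketch that does not use Aqueduct at all. The function names, the 250-character limit parameter, and the sample review strings are made up for illustration; in a real workflow each function would presumably carry the @metric decorator and receive artifacts rather than raw values.

```python
from statistics import mean

# Made-up review strings for illustration, not rows from hotel_reviews.
sample_reviews = ["Only tepid water", "No Negative", "Bad Reading light"]


def average_strlen_plain(reviews):
    """First metric: mean character count across the reviews."""
    return mean(len(text) for text in reviews)


def share_of_limit(avg_strlen, limit=250):
    """Second metric, derived from the first: fraction of the limit in use."""
    return avg_strlen / limit


avg = average_strlen_plain(sample_reviews)
share = share_of_limit(avg)
print(avg, share)
```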
Let's say that we want to make sure the average strlen of hotel reviews always stays below 250 characters. We can add a check over the avg_strlen metric.
@check(severity="error")
def limit_avg_strlen(avg_strlen):
    return avg_strlen < 250
limit_avg_strlen(avg_strlen)
Output:
<aqueduct.artifacts.bool_artifact.BoolArtifact at 0x7f7e65b46ee0>
Note that checks are denoted with the @check decorator. Checks can also be computed over any operator or metric. Setting the severity to "error" will automatically fail the workflow if this check is ever violated. Check severity can also be set to "warning" (the default), which only prints a warning message on any violation.
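The difference between the two severities can be pictured with a small toy model in plain Python. This is only a sketch of the behavior described above, not Aqueduct's implementation: run_check and CheckFailed are hypothetical names, and raising an exception stands in for "failing the workflow".

```python
import warnings


class CheckFailed(Exception):
    """Hypothetical error type used only in this sketch."""


def run_check(passed, name, severity="warning"):
    """Toy model: a failed 'error' check raises, a failed 'warning' check only warns."""
    if passed:
        return True
    message = f"check {name!r} failed"
    if severity == "error":
        raise CheckFailed(message)
    warnings.warn(message)
    return False


avg_strlen_value = 223.18  # the mean reported in the tutorial output
ok = run_check(avg_strlen_value < 250, "limit_avg_strlen", severity="error")
print(ok)
```

With the value from this tutorial the check passes, so nothing is raised. Had the average been 250 or more, the "error" severity would stop execution, while the default "warning" severity would only emit a warning and carry on.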
Finally, we can save the transformed table strlen_table back to the Aqueduct demo database. See here for more details around using resource objects.
demo_db.save(strlen_table, table_name="strlen_table", update_mode="replace")
Note that this save is not performed until the flow is actually published.
Publishing creates the flow in Aqueduct. You will receive a URL below that takes you to the Aqueduct UI, which shows the status of your workflow runs and lets you inspect the data.
client.publish_flow(name="review_strlen", artifacts=[strlen_table])
Output:
<aqueduct.flow.Flow at 0x7f7e61d9cdc0>
And we're done! We've created our first workflow together, and you're off to the races.
There is a lot more you can do with Aqueduct, including having flows run automatically on a cadence, parameterizing flows, and reading from and writing to many different data resources (S3, Postgres, etc.). Check out the other tutorials and examples here for a deeper dive!