From 10fffa5c99a3c6232778a30d562c73759ec323bc Mon Sep 17 00:00:00 2001
From: Takumi Ohyama
Date: Fri, 15 Apr 2022 07:34:47 +0000
Subject: [PATCH 1/8] renamed old labs

---
 ...ynb => 5a_train_keras_vertex_babyweight_feature_columns.ipynb} | 0
 ...ynb => 5a_train_keras_vertex_babyweight_feature_columns.ipynb} | 0
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename notebooks/end-to-end-structured/labs/{5a_train_keras_ai_platform_babyweight_vertex.ipynb => 5a_train_keras_vertex_babyweight_feature_columns.ipynb} (100%)
 rename notebooks/end-to-end-structured/solutions/{5a_train_keras_ai_platform_babyweight_vertex.ipynb => 5a_train_keras_vertex_babyweight_feature_columns.ipynb} (100%)

diff --git a/notebooks/end-to-end-structured/labs/5a_train_keras_ai_platform_babyweight_vertex.ipynb b/notebooks/end-to-end-structured/labs/5a_train_keras_vertex_babyweight_feature_columns.ipynb
similarity index 100%
rename from notebooks/end-to-end-structured/labs/5a_train_keras_ai_platform_babyweight_vertex.ipynb
rename to notebooks/end-to-end-structured/labs/5a_train_keras_vertex_babyweight_feature_columns.ipynb
diff --git a/notebooks/end-to-end-structured/solutions/5a_train_keras_ai_platform_babyweight_vertex.ipynb b/notebooks/end-to-end-structured/solutions/5a_train_keras_vertex_babyweight_feature_columns.ipynb
similarity index 100%
rename from notebooks/end-to-end-structured/solutions/5a_train_keras_ai_platform_babyweight_vertex.ipynb
rename to notebooks/end-to-end-structured/solutions/5a_train_keras_vertex_babyweight_feature_columns.ipynb

From 48dd85f76543a552bbfe3f3fe4f786461d04e014 Mon Sep 17 00:00:00 2001
From: Takumi Ohyama
Date: Fri, 15 Apr 2022 07:37:07 +0000
Subject: [PATCH 2/8] added babyweights vertex training lab preprocessing layer version

---
 ...rtex_babyweight_preprocessing_layers.ipynb | 743 ++++++++++++++
 ...rtex_babyweight_preprocessing_layers.ipynb | 926 ++++++++++++++++++
 2 files changed, 1669 insertions(+)
 create mode 100644 notebooks/end-to-end-structured/labs/5a_train_keras_vertex_babyweight_preprocessing_layers.ipynb
 create mode 100644 notebooks/end-to-end-structured/solutions/5a_train_keras_vertex_babyweight_preprocessing_layers.ipynb

diff --git a/notebooks/end-to-end-structured/labs/5a_train_keras_vertex_babyweight_preprocessing_layers.ipynb b/notebooks/end-to-end-structured/labs/5a_train_keras_vertex_babyweight_preprocessing_layers.ipynb
new file mode 100644
index 00000000..c878d0f3
--- /dev/null
+++ b/notebooks/end-to-end-structured/labs/5a_train_keras_vertex_babyweight_preprocessing_layers.ipynb
@@ -0,0 +1,743 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# LAB 5a: Training Keras model on Vertex AI\n",
+    "\n",
+    "**Learning Objectives**\n",
+    "\n",
+    "1. Set up the environment\n",
+    "1. Create trainer module's task.py to hold hyperparameter argparsing code\n",
+    "1. Create trainer module's model.py to hold Keras model code\n",
+    "1. Run trainer module package locally\n",
+    "1. Submit training job to Vertex AI\n",
+    "1. Submit hyperparameter tuning job to Vertex AI\n",
+    "\n",
+    "\n",
+    "## Introduction\n",
+    "After having tested our training pipeline both locally and in the cloud on a subset of the data, we can submit another (much larger) training job to the cloud. It is also a good idea to run a hyperparameter tuning job to make sure we have optimized the hyperparameters of our model. 
\n", + "\n", + "In this notebook, we'll be training our Keras model at scale using Vertex AI.\n", + "\n", + "In this lab, we will set up the environment, create the trainer module's task.py to hold hyperparameter argparsing code, create the trainer module's model.py to hold Keras model code, run the trainer module package locally, submit a training job to Vertex AI, and submit a hyperparameter tuning job to Vertex AI." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "hJ7ByvoXzpVI" + }, + "source": [ + "## Set up environment variables and load necessary libraries" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First we will install the `cloudml-hypertune` package on our local machine. This is the package which we will use to report hyperparameter tuning metrics to Vertex AI. Installing the package will allow us to test our trainer package locally." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "try:\n", + " import hypertune\n", + "\n", + "except ImportError:\n", + " !pip3 install -U cloudml-hypertune --user\n", + "\n", + " print(\"Please restart the kernel and re-run the notebook.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the above command resulted in an installation, please restart the notebook kernel and re-run the notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Import necessary libraries." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set environment variables.\n", + "\n", + "Set environment variables so that we can use them throughout the entire lab. We will be using our project name for our bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PROJECT = !gcloud config list --format 'value(core.project)'\n", + "PROJECT = PROJECT[0]\n", + "BUCKET = PROJECT\n", + "REGION = \"us-central1\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "os.environ[\"PROJECT\"] = PROJECT\n", + "os.environ[\"BUCKET\"] = BUCKET\n", + "os.environ[\"REGION\"] = REGION" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Create the bucket if does not exist, and confirm below that the bucket is regional and its region equals to the specified region:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + " %%bash\n", + "if ! gsutil ls | grep -q gs://${BUCKET}/; then\n", + " gsutil mb -l ${REGION} gs://${BUCKET}\n", + "fi\n", + "gsutil ls -Lb gs://$BUCKET | grep \"gs://\\|Location\"\n", + "echo $REGION" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "gcloud config set project ${PROJECT}\n", + "gcloud config set ai/region ${REGION}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Check data exists\n", + "\n", + "Verify that you previously created CSV files we'll be using for training and evaluation. If not, go back to lab [1b_prepare_data_babyweight](../solutions/1b_prepare_data_babyweight.ipynb) to create them." 
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%bash\n",
+    "gsutil ls gs://${BUCKET}/babyweight/data/*000000000000.csv"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now that we have the [Keras wide-and-deep code](../solutions/4c_keras_wide_and_deep_babyweight.ipynb) working on a subset of the data, we can package the TensorFlow code up as a Python module and train it on Vertex AI.\n",
+    "\n",
+    "## Train on Vertex AI\n",
+    "\n",
+    "Training on Vertex AI requires:\n",
+    "* Making the code a Python source distribution\n",
+    "* Using gcloud to submit the training code to [Vertex AI](https://console.cloud.google.com/vertex-ai)\n",
+    "\n",
+    "Ensure that the Vertex AI API is enabled by going to this [link](https://console.developers.google.com/apis/library/aiplatform.googleapis.com).\n",
+    "\n",
+    "### Move code into a Python package\n",
+    "\n",
+    "A Python package is simply a collection of one or more `.py` files along with an `__init__.py` file to identify the containing directory as a package. The `__init__.py` sometimes contains initialization code, but for our purposes an empty file suffices.\n",
+    "\n",
+    "The bash command `touch` creates an empty file in the specified location; the `mkdir -p` call in the next cell makes sure the `babyweight/trainer` directory exists first."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%bash\n",
+    "mkdir -p babyweight/trainer\n",
+    "touch babyweight/trainer/__init__.py"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We then use the `%%writefile` magic to write the contents of the cell below to a file called `task.py` in the `babyweight/trainer` folder."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Create trainer module's task.py to hold hyperparameter argparsing code.\n",
+    "\n",
+    "The cell below writes the file `babyweight/trainer/task.py`, which sets up our training job. Here is where we determine which parameters of our model to pass as flags during training, using an `argparse.ArgumentParser`. Look at how `batch_size` is passed to the model in the code below. Use this as an example to parse arguments for the following variables:\n",
+    "- `nnsize` which represents the hidden layer sizes to use for DNN feature columns\n",
+    "- `nembeds` which represents the embedding size of a cross of n key real-valued parameters\n",
+    "- `train_examples` which represents the number of examples (in thousands) to run the training job over\n",
+    "- `eval_steps` which represents the positive number of steps for which to evaluate the model\n",
+    "\n",
+    "Be sure to include a default value for the parsed arguments above and specify the `type` if necessary.\n",
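+    "\n",
+    "For example, `nnsize` can stay a single space-separated string and be split inside the model code. One possible way to parse it (a sketch that mirrors the solution notebook; adjust the help text and default to taste):\n",
+    "\n",
+    "```python\n",
+    "parser.add_argument(\n",
+    "    \"--nnsize\",\n",
+    "    help=\"Hidden layer sizes for DNN -- provide space-separated layers\",\n",
+    "    default=\"128 32 4\"\n",
+    ")\n",
+    "```"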
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile babyweight/trainer/task.py\n", + "import argparse\n", + "import json\n", + "import os\n", + "\n", + "from trainer import model\n", + "\n", + "import tensorflow as tf\n", + "\n", + "if __name__ == \"__main__\":\n", + " parser = argparse.ArgumentParser()\n", + " parser.add_argument(\n", + " \"--train_data_path\",\n", + " help=\"GCS location of training data\",\n", + " required=True\n", + " )\n", + " parser.add_argument(\n", + " \"--eval_data_path\",\n", + " help=\"GCS location of evaluation data\",\n", + " required=True\n", + " )\n", + " parser.add_argument(\n", + " \"--output_dir\",\n", + " help=\"GCS location to write checkpoints and export models\",\n", + " default = os.getenv(\"AIP_MODEL_DIR\")\n", + " )\n", + " parser.add_argument(\n", + " \"--batch_size\",\n", + " help=\"Number of examples to compute gradient over.\",\n", + " type=int,\n", + " default=512\n", + " )\n", + " # TODO: Add nnsize argument\n", + "\n", + " # TODO: Add nembeds argument\n", + "\n", + " # TODO: Add num_epochs argument\n", + "\n", + " # TODO: Add train_examples argument\n", + "\n", + " # TODO: Add eval_steps argument\n", + "\n", + "\n", + " # Parse all arguments\n", + " args = parser.parse_args()\n", + " arguments = args.__dict__\n", + "\n", + " # Modify some arguments\n", + " arguments[\"train_examples\"] *= 1000\n", + "\n", + " # Run the training job\n", + " model.train_and_evaluate(arguments)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the same way we can write to the file `model.py` the model that we developed in the previous notebooks. \n", + "\n", + "### Create trainer module's model.py to hold Keras model code.\n", + "\n", + "To create our `model.py`, we'll use the code we wrote for the Wide & Deep model. Look back at your [9_keras_wide_and_deep_babyweight](../solutions/9_keras_wide_and_deep_babyweight.ipynb) notebook and copy/paste the necessary code from that notebook into its place in the cell below." 
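+    ,
+    "\n",
+    "As a quick reminder of what that code looks like, the helper that splits the label out of each batch of CSV columns was roughly this (assuming `LABEL_COLUMN` is defined as the name of the label column):\n",
+    "\n",
+    "```python\n",
+    "def features_and_labels(row_data):\n",
+    "    \"\"\"Splits features and labels from feature dictionary.\"\"\"\n",
+    "    label = row_data.pop(LABEL_COLUMN)\n",
+    "    return row_data, label  # features, label\n",
+    "```"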
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile babyweight/trainer/model.py\n", + "import datetime\n", + "import os\n", + "import shutil\n", + "import numpy as np\n", + "import tensorflow as tf\n", + "import hypertune\n", + "\n", + "# Determine CSV, label, and key columns\n", + "# TODO: Add CSV_COLUMNS and LABEL_COLUMN\n", + "\n", + "# TODO: Add NUMERIC_COLUMNS and CATEGORICAL_COLUMNS\n", + "\n", + "# Set default values for each CSV column.\n", + "# Treat is_male and plurality as strings.\n", + "# TODO: Add DEFAULTS\n", + "\n", + "\n", + "\n", + "def features_and_labels(row_data):\n", + " # TODO: Add your code here\n", + " pass\n", + "\n", + "\n", + "def load_dataset(pattern, batch_size=1, mode=tf.estimator.ModeKeys.EVAL):\n", + " # TODO: Add your code here\n", + " pass\n", + "\n", + "\n", + "def create_input_layers():\n", + " # TODO: Add your code here\n", + " pass\n", + "\n", + "\n", + "def transform(inputs, nembeds):\n", + " # TODO: Add your code here\n", + " pass\n", + "\n", + "def get_model_outputs(wide_inputs, deep_inputs, dnn_hidden_units):\n", + " # TODO: Add your code here\n", + " pass\n", + "\n", + "\n", + "\n", + "def rmse(y_true, y_pred):\n", + " \"\"\"Calculates RMSE evaluation metric.\n", + "\n", + " Args:\n", + " y_true: tensor, true labels.\n", + " y_pred: tensor, predicted labels.\n", + " Returns:\n", + " Tensor with value of RMSE between true and predicted labels.\n", + " \"\"\"\n", + " return tf.sqrt(tf.reduce_mean(tf.square(y_pred - y_true)))\n", + "\n", + "\n", + "def build_wide_deep_model(dnn_hidden_units=[64, 32], nembeds=3):\n", + " # TODO: Add your code here\n", + " pass\n", + "\n", + "\n", + "\n", + "# Instantiate the HyperTune reporting object\n", + "hpt = hypertune.HyperTune()\n", + "\n", + "# Reporting callback\n", + "class HPTCallback(tf.keras.callbacks.Callback):\n", + "\n", + " def on_epoch_end(self, epoch, logs=None):\n", + " global hpt\n", + " hpt.report_hyperparameter_tuning_metric(\n", + " hyperparameter_metric_tag='val_rmse',\n", + " metric_value=logs['val_rmse'],\n", + " global_step=epoch)\n", + "\n", + " \n", + "def train_and_evaluate(args):\n", + " model = build_wide_deep_model(args[\"nnsize\"], args[\"nembeds\"])\n", + " print(\"Here is our Wide-and-Deep architecture so far:\\n\")\n", + " print(model.summary())\n", + "\n", + " trainds = load_dataset(\n", + " args[\"train_data_path\"],\n", + " args[\"batch_size\"],\n", + " tf.estimator.ModeKeys.TRAIN)\n", + "\n", + " evalds = load_dataset(\n", + " args[\"eval_data_path\"], 1000, tf.estimator.ModeKeys.EVAL)\n", + " if args[\"eval_steps\"]:\n", + " evalds = evalds.take(count=args[\"eval_steps\"])\n", + "\n", + " num_batches = args[\"batch_size\"] * args[\"num_epochs\"]\n", + " steps_per_epoch = args[\"train_examples\"] // num_batches\n", + "\n", + " checkpoint_path = os.path.join(args[\"output_dir\"], \"checkpoints/babyweight\")\n", + " cp_callback = tf.keras.callbacks.ModelCheckpoint(\n", + " filepath=checkpoint_path, verbose=1, save_weights_only=True)\n", + "\n", + " history = model.fit(\n", + " trainds,\n", + " validation_data=evalds,\n", + " epochs=args[\"num_epochs\"],\n", + " steps_per_epoch=steps_per_epoch,\n", + " verbose=2, # 0=silent, 1=progress bar, 2=one line per epoch\n", + " callbacks=[cp_callback, HPTCallback()])\n", + "\n", + " EXPORT_PATH = os.path.join(\n", + " args[\"output_dir\"], datetime.datetime.now().strftime(\"%Y%m%d%H%M%S\"))\n", + " tf.saved_model.save(\n", + " obj=model, export_dir=EXPORT_PATH) # 
with default serving function\n", + " \n", + " print(\"Exported trained model to {}\".format(EXPORT_PATH))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train locally\n", + "\n", + "After moving the code to a package, make sure it works as a standalone. Note, we incorporated the `--train_examples` flag so that we don't try to train on the entire dataset while we are developing our pipeline. Once we are sure that everything is working on a subset, we can change it so that we can train on all the data. Even for this subset, this takes about *3 minutes* in which you won't see any output ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Run trainer module package locally.\n", + "\n", + "We can run a very small training job over a single file with a small batch size, 1 epoch, 1 train example, and 1 eval step." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "OUTDIR=babyweight_trained\n", + "rm -rf ${OUTDIR}\n", + "export PYTHONPATH=${PYTHONPATH}:${PWD}/babyweight\n", + "python3 -m trainer.task \\\n", + " --train_data_path=gs://${BUCKET}/babyweight/data/train*.csv \\\n", + " --eval_data_path=gs://${BUCKET}/babyweight/data/eval*.csv \\\n", + " --output_dir=${OUTDIR} \\\n", + " --batch_size=# TODO: Add batch size\n", + " --num_epochs=# TODO: Add the number of epochs to train for\n", + " --train_examples=# TODO: Add the number of examples to train each epoch for\n", + " --eval_steps=# TODO: Add the number of evaluation batches to run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Training on Vertex AI\n", + "\n", + "Now that we see everything is working locally, it's time to train on the cloud! First, we need to package our code as a source distribution. For this, we can use `setuptools`. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile babyweight/setup.py\n", + "from setuptools import find_packages\n", + "from setuptools import setup\n", + "\n", + "setup(\n", + " name='babyweight_trainer',\n", + " version='0.1',\n", + " packages=find_packages(),\n", + " include_package_data=True,\n", + " description='Babyweight model training application.'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "cd babyweight\n", + "python ./setup.py sdist --formats=gztar\n", + "cd .." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will store our package in the Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "gsutil cp babyweight/dist/babyweight_trainer-0.1.tar.gz gs://${BUCKET}/babyweight/" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To submit to the Cloud we use [`gcloud custom-jobs create`](https://cloud.google.com/sdk/gcloud/reference/ai/custom-jobs/create) and simply specify some additional parameters for the Vertex AI Training Service:\n", + "- display-name: A unique identifier for the Cloud job. We usually append system time to ensure uniqueness\n", + "- region: Cloud region to train in. 
See [here](https://cloud.google.com/vertex-ai/docs/general/locations) for supported Vertex AI Training Service regions\n", + "\n", + "You might have earlier seen `gcloud ai custom-jobs create` executed with the `worker pool spec` and pass-through Python arguments specified directly in the command call, here we will use a YAML file, this will make it easier to transition to hyperparameter tuning.\n", + "\n", + "Through the `args:` argument we add in the passed-through arguments for our `task.py` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "\n", + "TIMESTAMP=$(date -u +%Y%m%d_%H%M%S)\n", + "OUTDIR=gs://${BUCKET}/babyweight/trained_model_$TIMESTAMP\n", + "JOB_NAME=babyweight_$TIMESTAMP\n", + "\n", + "PYTHON_PACKAGE_URI=gs://${BUCKET}/babyweight/babyweight_trainer-0.1.tar.gz\n", + "PYTHON_PACKAGE_EXECUTOR_IMAGE_URI=\"us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest\"\n", + "PYTHON_MODULE=trainer.task\n", + "\n", + "echo > ./config.yaml \"workerPoolSpecs:\n", + " machineSpec:\n", + " machineType: n1-standard-4\n", + " replicaCount: 1\n", + " pythonPackageSpec:\n", + " executorImageUri: $PYTHON_PACKAGE_EXECUTOR_IMAGE_URI\n", + " packageUris: $PYTHON_PACKAGE_URI\n", + " pythonModule: $PYTHON_MODULE\n", + " args:\n", + " - --train_data_path=# TODO: Add path to training data in GCS\n", + " - --eval_data_path=# TODO: Add path to evaluation data in GCS\n", + " - --output_dir=$OUTDIR\n", + " - --num_epochs=# TODO: Add the number of epochs to train for\n", + " - --train_examples=# TODO: Add the number of examples to train each epoch for\n", + " - --eval_steps=# TODO: Add the number of evaluation batches to run\n", + " - --batch_size=# TODO: Add batch size\n", + " - --nembeds=# TODO: Add number of embedding dimensions\n", + "\n", + "gcloud ai custom-jobs create \\\n", + " --region=${REGION} \\\n", + " --display-name=$JOB_NAME \\\n", + " --config=config.yaml" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The training job should complete within 10 to 15 minutes. You will need a trained model to complete our next lab." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Hyperparameter tuning\n", + "\n", + "To do hyperparameter tuning, create a YAML file and and pass its name with `--config`.\n", + "This step could take hours -- you can increase `--parallel-trial-count` or reduce `--max-trial-count` to get it done faster. Since `--parallel-trial-count` is the number of initial seeds to start searching from, you don't want it to be too large; otherwise, all you have is a random search." 
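+    ,
+    "\n",
+    "When filling in `studySpec` below, two things have to line up: the `metricId` must match the tag reported by `HPTCallback` in `model.py` (`val_rmse`), and every `parameterId` must match a command-line flag that `task.py` accepts. As a rough sketch (bounds here are placeholders -- pick your own), an integer-valued parameter looks like this:\n",
+    "\n",
+    "```yaml\n",
+    "- parameterId: batch_size\n",
+    "  integerValueSpec:\n",
+    "    minValue: 8\n",
+    "    maxValue: 512\n",
+    "  scaleType: UNIT_LOG_SCALE\n",
+    "```"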
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "TIMESTAMP=$(date -u +%Y%m%d_%H%M%S)\n", + "BASE_OUTPUT_DIR=gs://${BUCKET}/babyweight/hp_tuning_$TIMESTAMP\n", + "JOB_NAME=babyweight_hpt_$TIMESTAMP\n", + "\n", + "PYTHON_PACKAGE_URI=gs://${BUCKET}/babyweight/babyweight_trainer-0.1.tar.gz\n", + "PYTHON_PACKAGE_EXECUTOR_IMAGE_URI=\"us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest\"\n", + "PYTHON_MODULE=trainer.task\n", + "\n", + "echo > ./hyperparam.yaml \"displayName: $JOB_NAME\n", + "studySpec:\n", + " metrics:\n", + " - metricId: # TODO: Add metric we want to optimize\n", + " goal: # TODO: MAXIMIZE or MINIMIZE?\n", + " parameters:\n", + " - parameterId: batch_size\n", + " # TODO: What datatype (which ValueSpec)?\n", + " minValue: # TODO: Choose a min value\n", + " maxValue: # TODO: Choose a max value\n", + " scaleType: # TODO: UNIT_LINEAR_SCALE or UNIT_LOG_SCALE?\n", + " - parameterId: nembeds\n", + " # TODO: What datatype (which ValueSpec)?\n", + " minValue: # TODO: Choose a min value\n", + " maxValue: # TODO: Choose a max value\n", + " scaleType: # TODO: UNIT_LINEAR_SCALE or UNIT_LOG_SCALE?\n", + " algorithm: ALGORITHM_UNSPECIFIED # results in Bayesian optimization\n", + "trialJobSpec:\n", + " baseOutputDirectory:\n", + " outputUriPrefix: $BASE_OUTPUT_DIR\n", + " workerPoolSpecs:\n", + " - machineSpec:\n", + " machineType: n1-standard-8\n", + " pythonPackageSpec:\n", + " executorImageUri: $PYTHON_PACKAGE_EXECUTOR_IMAGE_URI\n", + " packageUris:\n", + " - $PYTHON_PACKAGE_URI\n", + " pythonModule: $PYTHON_MODULE\n", + " args:\n", + " - --train_data_path=gs://${BUCKET}/babyweight/data/train*.csv\n", + " - --eval_data_path=gs://${BUCKET}/babyweight/data/eval*.csv\n", + " - --num_epochs=10\n", + " - --train_examples=5000\n", + " - --eval_steps=100\n", + " - --batch_size=32\n", + " - --nembeds=8\n", + " replicaCount: 1\"\n", + " \n", + "gcloud beta ai hp-tuning-jobs create \\\n", + " --region=$REGION \\\n", + " --display-name=$JOB_NAME \\\n", + " --# TODO: Add config for hyperparam.yaml\n", + " --max-trial-count=20 \\\n", + " --parallel-trial-count=5" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Repeat training\n", + "\n", + "This time with tuned parameters for `batch_size` and `nembeds`. Note that your best results may differ from below. So be sure to fill yours in!" 
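+    ,
+    "\n",
+    "If you are not sure which values won, you can inspect the finished study from the command line (sketch; `<JOB_ID>` is a placeholder for the ID shown in the `list` output):\n",
+    "\n",
+    "```bash\n",
+    "gcloud ai hp-tuning-jobs list --region=$REGION\n",
+    "gcloud ai hp-tuning-jobs describe <JOB_ID> --region=$REGION\n",
+    "```"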
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "TIMESTAMP=$(date -u +%Y%m%d_%H%M%S)\n", + "OUTDIR=gs://${BUCKET}/babyweight/tuned_$TIMESTAMP\n", + "JOB_NAME=babyweight_tuned_$TIMESTAMP\n", + "\n", + "PYTHON_PACKAGE_URI=gs://${BUCKET}/babyweight/babyweight_trainer-0.1.tar.gz\n", + "PYTHON_PACKAGE_EXECUTOR_IMAGE_URI=\"us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest\"\n", + "PYTHON_MODULE=trainer.task\n", + "\n", + "echo > ./tuned_config.yaml \"workerPoolSpecs:\n", + " machineSpec:\n", + " machineType: n1-standard-8\n", + " replicaCount: 1\n", + " pythonPackageSpec:\n", + " executorImageUri: $PYTHON_PACKAGE_EXECUTOR_IMAGE_URI\n", + " packageUris: $PYTHON_PACKAGE_URI\n", + " pythonModule: $PYTHON_MODULE\n", + " args:\n", + " - --train_data_path=gs://${BUCKET}/babyweight/data/train*.csv\n", + " - --eval_data_path=gs://${BUCKET}/babyweight/data/eval*.csv\n", + " - --output_dir=$OUTDIR\n", + " - --num_epochs=10\n", + " - --train_examples=20000\n", + " - --eval_steps=100\n", + " - --batch_size=32\n", + " - --nembeds=8\"\n", + " \n", + "gcloud ai custom-jobs create \\\n", + " --region=${REGION} \\\n", + " --display-name=$JOB_NAME \\\n", + " --config=tuned_config.yaml" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Lab Summary: \n", + "In this lab, we set up the environment, created the trainer module's task.py to hold hyperparameter argparsing code, created the trainer module's model.py to hold Keras model code, ran the trainer module package locally, and submitted a training job to Vertex AI." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright 2021 Google LLC\n", + "Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "you may not use this file except in compliance with the License.\n", + "You may obtain a copy of the License at\n", + " https://www.apache.org/licenses/LICENSE-2.0\n", + "Unless required by applicable law or agreed to in writing, software\n", + "distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "See the License for the specific language governing permissions and\n", + "limitations under the License." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "environment": { + "kernel": "python3", + "name": "tf2-gpu.2-8.m91", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-8:m91" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/notebooks/end-to-end-structured/solutions/5a_train_keras_vertex_babyweight_preprocessing_layers.ipynb b/notebooks/end-to-end-structured/solutions/5a_train_keras_vertex_babyweight_preprocessing_layers.ipynb new file mode 100644 index 00000000..99cd8bed --- /dev/null +++ b/notebooks/end-to-end-structured/solutions/5a_train_keras_vertex_babyweight_preprocessing_layers.ipynb @@ -0,0 +1,926 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# LAB 5a: Training Keras model on Vertex AI\n", + "\n", + "**Learning Objectives**\n", + "\n", + "1. Setup up the environment\n", + "1. Create trainer module's task.py to hold hyperparameter argparsing code\n", + "1. Create trainer module's model.py to hold Keras model code\n", + "1. Run trainer module package locally\n", + "1. Submit training job to Vertex AI\n", + "1. Submit hyperparameter tuning job to Vertex AI\n", + "\n", + "\n", + "## Introduction\n", + "After having testing our training pipeline both locally and in the cloud on a susbset of the data, we can submit another (much larger) training job to the cloud. It is also a good idea to run a hyperparameter tuning job to make sure we have optimized the hyperparameters of our model. \n", + "\n", + "In this notebook, we'll be training our Keras model at scale using Vertex AI.\n", + "\n", + "In this lab, we will set up the environment, create the trainer module's task.py to hold hyperparameter argparsing code, create the trainer module's model.py to hold Keras model code, run the trainer module package locally, submit a training job to Vertex AI, and submit a hyperparameter tuning job to Vertex AI." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "hJ7ByvoXzpVI" + }, + "source": [ + "## Set up environment variables and load necessary libraries" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First we will install the `cloudml-hypertune` package on our local machine. This is the package which we will use to report hyperparameter tuning metrics to Vertex AI. Installing the package will allow us to test our trainer package locally." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "try:\n", + " import hypertune\n", + "\n", + "except ImportError:\n", + " !pip3 install -U cloudml-hypertune --user\n", + "\n", + " print(\"Please restart the kernel and re-run the notebook.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the above command resulted in an installation, please restart the notebook kernel and re-run the notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Import necessary libraries." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set environment variables.\n", + "\n", + "Set environment variables so that we can use them throughout the entire lab. We will be using our project name for our bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PROJECT = !gcloud config list --format 'value(core.project)'\n", + "PROJECT = PROJECT[0]\n", + "BUCKET = PROJECT\n", + "REGION = \"us-central1\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "os.environ[\"PROJECT\"] = PROJECT\n", + "os.environ[\"BUCKET\"] = BUCKET\n", + "os.environ[\"REGION\"] = REGION" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Create the bucket if does not exist, and confirm below that the bucket is regional and its region equals to the specified region:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + " %%bash\n", + "if ! gsutil ls | grep -q gs://${BUCKET}/; then\n", + " gsutil mb -l ${REGION} gs://${BUCKET}\n", + "fi\n", + "gsutil ls -Lb gs://$BUCKET | grep \"gs://\\|Location\"\n", + "echo $REGION" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "gcloud config set project ${PROJECT}\n", + "gcloud config set ai/region ${REGION}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Check data exists\n", + "\n", + "Verify that you previously created CSV files we'll be using for training and evaluation. If not, go back to lab [1b_prepare_data_babyweight](../solutions/1b_prepare_data_babyweight.ipynb) to create them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "gsutil ls gs://${BUCKET}/babyweight/data/*000000000000.csv" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we have the [Keras wide-and-deep code](../solutions/4c_keras_wide_and_deep_babyweight.ipynb) working on a subset of the data, we can package the TensorFlow code up as a Python module and train it on Vertex AI.\n", + "\n", + "## Train on Vertex AI\n", + "\n", + "Training on Vertex AI requires:\n", + "* Making the code a Python source distribution\n", + "* Using gcloud to submit the training code to [Vertex AI](https://console.cloud.google.com/vertex-ai)\n", + "\n", + "Ensure that the Vertex AI API is enabled by going to this [link](https://console.developers.google.com/apis/library/aiplatform.googleapis.com).\n", + "\n", + "### Move code into a Python package\n", + "\n", + "A Python package is simply a collection of one or more `.py` files along with an `__init__.py` file to identify the containing directory as a package. The `__init__.py` sometimes contains initialization code but for our purposes an empty file suffices.\n", + "\n", + "The bash command `touch` creates an empty file in the specified location, the directory `babyweight` should already exist." 
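+    ,
+    "\n",
+    "By the time we submit the training job, the package directory will look like this (`setup.py` is added a few cells further down):\n",
+    "\n",
+    "```\n",
+    "babyweight/\n",
+    "    setup.py\n",
+    "    trainer/\n",
+    "        __init__.py\n",
+    "        task.py\n",
+    "        model.py\n",
+    "```"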
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "mkdir -p babyweight/trainer\n", + "touch babyweight/trainer/__init__.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We then use the `%%writefile` magic to write the contents of the cell below to a file called `task.py` in the `babyweight/trainer` folder." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create trainer module's task.py to hold hyperparameter argparsing code.\n", + "\n", + "The cell below writes the file `babyweight/trainer/task.py` which sets up our training job. Here is where we determine which parameters of our model to pass as flags during training using the `parser` module. Look at how `batch_size` is passed to the model in the code below. Use this as an example to parse arguements for the following variables\n", + "- `nnsize` which represents the hidden layer sizes to use for DNN feature columns\n", + "- `nembeds` which represents the embedding size of a cross of n key real-valued parameters\n", + "- `train_examples` which represents the number of examples (in thousands) to run the training job\n", + "- `eval_steps` which represents the positive number of steps for which to evaluate model\n", + "\n", + "Be sure to include a default value for the parsed arguments above and specfy the `type` if necessary." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile babyweight/trainer/task.py\n", + "import argparse\n", + "import json\n", + "import os\n", + "\n", + "from trainer import model\n", + "\n", + "import tensorflow as tf\n", + "\n", + "if __name__ == \"__main__\":\n", + " parser = argparse.ArgumentParser()\n", + " parser.add_argument(\n", + " \"--train_data_path\",\n", + " help=\"GCS location of training data\",\n", + " required=True\n", + " )\n", + " parser.add_argument(\n", + " \"--eval_data_path\",\n", + " help=\"GCS location of evaluation data\",\n", + " required=True\n", + " )\n", + " parser.add_argument(\n", + " \"--output_dir\",\n", + " help=\"GCS location to write checkpoints and export models\",\n", + " default = os.getenv(\"AIP_MODEL_DIR\")\n", + " )\n", + " parser.add_argument(\n", + " \"--batch_size\",\n", + " help=\"Number of examples to compute gradient over.\",\n", + " type=int,\n", + " default=512\n", + " )\n", + " parser.add_argument(\n", + " \"--nnsize\",\n", + " help=\"Hidden layer sizes for DNN -- provide space-separated layers\",\n", + " default=\"128 32 4\"\n", + " )\n", + " parser.add_argument(\n", + " \"--nembeds\",\n", + " help=\"Embedding size of a cross of n key real-valued parameters\",\n", + " type=int,\n", + " default=3\n", + " )\n", + " parser.add_argument(\n", + " \"--num_epochs\",\n", + " help=\"Number of epochs to train the model.\",\n", + " type=int,\n", + " default=10\n", + " )\n", + " parser.add_argument(\n", + " \"--train_examples\",\n", + " help=\"\"\"Number of examples (in thousands) to run the training job over.\n", + " If this is more than actual # of examples available, it cycles through\n", + " them. So specifying 1000 here when you have only 100k examples makes\n", + " this 10 epochs.\"\"\",\n", + " type=int,\n", + " default=5000\n", + " )\n", + " parser.add_argument(\n", + " \"--eval_steps\",\n", + " help=\"\"\"Positive number of steps for which to evaluate model. 
Default\n", + " to None, which means to evaluate until input_fn raises an end-of-input\n", + " exception\"\"\",\n", + " type=int,\n", + " default=None\n", + " )\n", + "\n", + " # Parse all arguments\n", + " args = parser.parse_args()\n", + " arguments = args.__dict__\n", + "\n", + " # Modify some arguments\n", + " arguments[\"train_examples\"] *= 1000\n", + "\n", + " # Run the training job\n", + " model.train_and_evaluate(arguments)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the same way we can write to the file `model.py` the model that we developed in the previous notebooks. \n", + "\n", + "### Create trainer module's model.py to hold Keras model code.\n", + "\n", + "To create our `model.py`, we'll use the code we wrote for the Wide & Deep model. Look back at your [9_keras_wide_and_deep_babyweight](../solutions/9_keras_wide_and_deep_babyweight.ipynb) notebook and copy/paste the necessary code from that notebook into its place in the cell below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile babyweight/trainer/model.py\n", + "import datetime\n", + "import os\n", + "import shutil\n", + "import numpy as np\n", + "import tensorflow as tf\n", + "import hypertune\n", + "\n", + "# Determine CSV, label, and key columns\n", + "CSV_COLUMNS = [\n", + " \"weight_pounds\",\n", + " \"is_male\",\n", + " \"mother_age\",\n", + " \"plurality\",\n", + " \"gestation_weeks\",\n", + "]\n", + "LABEL_COLUMN = \"weight_pounds\"\n", + "\n", + "NUMERICAL_COLUMNS = [\"mother_age\", \"gestation_weeks\"]\n", + "CATEGORICAL_COLUMNS = [\"is_male\", \"plurality\"]\n", + "\n", + "# Set default values for each CSV column.\n", + "# Treat is_male and plurality as strings.\n", + "DEFAULTS = [[0.0], [\"null\"], [0.0], [\"null\"], [0.0]]\n", + "\n", + "\n", + "def features_and_labels(row_data):\n", + " \"\"\"Splits features and labels from feature dictionary.\n", + "\n", + " Args:\n", + " row_data: Dictionary of CSV column names and tensor values.\n", + " Returns:\n", + " Dictionary of feature tensors and label tensor.\n", + " \"\"\"\n", + " label = row_data.pop(LABEL_COLUMN)\n", + "\n", + " return row_data, label # features, label\n", + "\n", + "\n", + "def load_dataset(pattern, batch_size=1, mode=tf.estimator.ModeKeys.EVAL):\n", + " \"\"\"Loads dataset using the tf.data API from CSV files.\n", + "\n", + " Args:\n", + " pattern: str, file pattern to glob into list of files.\n", + " batch_size: int, the number of examples per batch.\n", + " mode: tf.estimator.ModeKeys to determine if training or evaluating.\n", + " Returns:\n", + " `Dataset` object.\n", + " \"\"\"\n", + " # Make a CSV dataset\n", + " dataset = tf.data.experimental.make_csv_dataset(\n", + " file_pattern=pattern,\n", + " batch_size=batch_size,\n", + " column_names=CSV_COLUMNS,\n", + " column_defaults=DEFAULTS,\n", + " )\n", + "\n", + " # Map dataset to features and label\n", + " dataset = dataset.map(map_func=features_and_labels) # features, label\n", + "\n", + " # Shuffle and repeat for training\n", + " if mode == tf.estimator.ModeKeys.TRAIN:\n", + " dataset = dataset.shuffle(buffer_size=1000).repeat()\n", + "\n", + " # Take advantage of multi-threading; 1=AUTOTUNE\n", + " dataset = dataset.prefetch(buffer_size=1)\n", + "\n", + " return dataset\n", + "\n", + "\n", + "def create_input_layers():\n", + " \"\"\"Creates dictionary of input layers for each feature.\n", + "\n", + " Returns:\n", + " Dictionary of `tf.Keras.layers.Input` layers for each 
feature.\n", + " \"\"\"\n", + " deep_inputs = {\n", + " colname: tf.keras.layers.Input(\n", + " name=colname, shape=(1,), dtype=\"float32\"\n", + " )\n", + " for colname in NUMERICAL_COLUMNS\n", + " }\n", + "\n", + " wide_inputs = {\n", + " colname: tf.keras.layers.Input(name=colname, shape=(1,), dtype=\"string\")\n", + " for colname in CATEGORICAL_COLUMNS\n", + " }\n", + "\n", + " inputs = {**wide_inputs, **deep_inputs}\n", + "\n", + " return inputs\n", + "\n", + "\n", + "def transform(inputs, nembeds):\n", + " \"\"\"Creates dictionary of transformed inputs.\n", + "\n", + " Returns:\n", + " Dictionary of transformed Tensors\n", + " \"\"\"\n", + "\n", + " deep = {}\n", + " wide = {}\n", + "\n", + " buckets = {\n", + " \"mother_age\": np.arange(15, 45, 1).tolist(),\n", + " \"gestation_weeks\": np.arange(17, 47, 1).tolist(),\n", + " }\n", + " bucketized = {}\n", + "\n", + " for nc in NUMERICAL_COLUMNS:\n", + " deep[nc] = inputs[nc]\n", + " bucketized[nc] = tf.keras.layers.Discretization(buckets[nc])(inputs[nc])\n", + " wide[f\"btk_{nc}\"] = tf.keras.layers.CategoryEncoding(\n", + " num_tokens=len(buckets[nc]) + 1, output_mode=\"one_hot\"\n", + " )(bucketized[nc])\n", + "\n", + " crossed = tf.keras.layers.experimental.preprocessing.HashedCrossing(\n", + " num_bins=len(buckets[\"mother_age\"]) * len(buckets[\"gestation_weeks\"])\n", + " )((bucketized[\"mother_age\"], bucketized[\"gestation_weeks\"]))\n", + "\n", + " deep[\"age_gestation_embeds\"] = tf.keras.layers.Flatten()(\n", + " tf.keras.layers.Embedding(\n", + " input_dim=len(buckets[\"mother_age\"])\n", + " * len(buckets[\"gestation_weeks\"]),\n", + " output_dim=nembeds,\n", + " )(crossed)\n", + " )\n", + "\n", + " vocab = {\n", + " \"is_male\": [\"True\", \"False\", \"Unknown\"],\n", + " \"plurality\": [\n", + " \"Single(1)\",\n", + " \"Twins(2)\",\n", + " \"Triplets(3)\",\n", + " \"Quadruplets(4)\",\n", + " \"Quintuplets(5)\",\n", + " \"Multiple(2+)\",\n", + " ],\n", + " }\n", + "\n", + " for cc in CATEGORICAL_COLUMNS:\n", + " wide[cc] = tf.keras.layers.StringLookup(\n", + " vocabulary=vocab[cc], output_mode=\"one_hot\"\n", + " )(inputs[cc])\n", + "\n", + " return wide, deep\n", + "\n", + "def get_model_outputs(wide_inputs, deep_inputs, dnn_hidden_units):\n", + " \"\"\"Creates model architecture and returns outputs.\n", + "\n", + " Args:\n", + " wide_inputs: Dense tensor used as inputs to wide side of model.\n", + " deep_inputs: Dense tensor used as inputs to deep side of model.\n", + " dnn_hidden_units: List of integers where length is number of hidden\n", + " layers and ith element is the number of neurons at ith layer.\n", + " Returns:\n", + " Dense tensor output from the model.\n", + " \"\"\"\n", + " # Hidden layers for the deep side\n", + " layers = [int(x) for x in dnn_hidden_units.split()]\n", + " deep = deep_inputs\n", + " for layerno, numnodes in enumerate(layers):\n", + " deep = tf.keras.layers.Dense(\n", + " units=numnodes, activation=\"relu\", name=f\"dnn_{layerno + 1}\"\n", + " )(deep)\n", + " deep_out = deep\n", + "\n", + " # Linear model for the wide side\n", + " wide_out = tf.keras.layers.Dense(\n", + " units=10, activation=\"relu\", name=\"linear\"\n", + " )(wide_inputs)\n", + "\n", + " # Concatenate the two sides\n", + " both = tf.keras.layers.Concatenate(name=\"both\")([deep_out, wide_out])\n", + "\n", + " # Final output is a linear activation because this is regression\n", + " output = tf.keras.layers.Dense(units=1, activation=\"linear\", name=\"weight\")(\n", + " both\n", + " )\n", + "\n", + " return output\n", + 
"\n", + "\n", + "def rmse(y_true, y_pred):\n", + " \"\"\"Calculates RMSE evaluation metric.\n", + "\n", + " Args:\n", + " y_true: tensor, true labels.\n", + " y_pred: tensor, predicted labels.\n", + " Returns:\n", + " Tensor with value of RMSE between true and predicted labels.\n", + " \"\"\"\n", + " return tf.sqrt(tf.reduce_mean(tf.square(y_pred - y_true)))\n", + "\n", + "\n", + "def build_wide_deep_model(dnn_hidden_units=[64, 32], nembeds=3):\n", + " \"\"\"Builds wide and deep model using Keras Functional API.\n", + "\n", + " Returns:\n", + " `tf.keras.models.Model` object.\n", + " \"\"\"\n", + " # Create input layers\n", + " inputs = create_input_layers()\n", + "\n", + " # transform raw features for both wide and deep\n", + " wide, deep = transform(inputs, nembeds)\n", + "\n", + " # The Functional API in Keras requires: LayerConstructor()(inputs)\n", + " wide_inputs = tf.keras.layers.Concatenate()(wide.values())\n", + " deep_inputs = tf.keras.layers.Concatenate()(deep.values())\n", + "\n", + " # Get output of model given inputs\n", + " output = get_model_outputs(wide_inputs, deep_inputs, dnn_hidden_units)\n", + "\n", + " # Build model and compile it all together\n", + " model = tf.keras.models.Model(inputs=inputs, outputs=output)\n", + " model.compile(optimizer=\"adam\", loss=\"mse\", metrics=[rmse, \"mse\"])\n", + "\n", + " return model\n", + "\n", + "\n", + "# Instantiate the HyperTune reporting object\n", + "hpt = hypertune.HyperTune()\n", + "\n", + "# Reporting callback\n", + "class HPTCallback(tf.keras.callbacks.Callback):\n", + "\n", + " def on_epoch_end(self, epoch, logs=None):\n", + " global hpt\n", + " hpt.report_hyperparameter_tuning_metric(\n", + " hyperparameter_metric_tag='val_rmse',\n", + " metric_value=logs['val_rmse'],\n", + " global_step=epoch)\n", + "\n", + " \n", + "def train_and_evaluate(args):\n", + " model = build_wide_deep_model(args[\"nnsize\"], args[\"nembeds\"])\n", + " print(\"Here is our Wide-and-Deep architecture so far:\\n\")\n", + " print(model.summary())\n", + "\n", + " trainds = load_dataset(\n", + " args[\"train_data_path\"],\n", + " args[\"batch_size\"],\n", + " tf.estimator.ModeKeys.TRAIN)\n", + "\n", + " evalds = load_dataset(\n", + " args[\"eval_data_path\"], 1000, tf.estimator.ModeKeys.EVAL)\n", + " if args[\"eval_steps\"]:\n", + " evalds = evalds.take(count=args[\"eval_steps\"])\n", + "\n", + " num_batches = args[\"batch_size\"] * args[\"num_epochs\"]\n", + " steps_per_epoch = args[\"train_examples\"] // num_batches\n", + "\n", + " checkpoint_path = os.path.join(args[\"output_dir\"], \"checkpoints/babyweight\")\n", + " cp_callback = tf.keras.callbacks.ModelCheckpoint(\n", + " filepath=checkpoint_path, verbose=1, save_weights_only=True)\n", + "\n", + " history = model.fit(\n", + " trainds,\n", + " validation_data=evalds,\n", + " epochs=args[\"num_epochs\"],\n", + " steps_per_epoch=steps_per_epoch,\n", + " verbose=2, # 0=silent, 1=progress bar, 2=one line per epoch\n", + " callbacks=[cp_callback, HPTCallback()])\n", + "\n", + " EXPORT_PATH = os.path.join(\n", + " args[\"output_dir\"], datetime.datetime.now().strftime(\"%Y%m%d%H%M%S\"))\n", + " tf.saved_model.save(\n", + " obj=model, export_dir=EXPORT_PATH) # with default serving function\n", + " \n", + " print(\"Exported trained model to {}\".format(EXPORT_PATH))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train locally\n", + "\n", + "After moving the code to a package, make sure it works as a standalone. 
Note, we incorporated the `--train_examples` flag so that we don't try to train on the entire dataset while we are developing our pipeline. Once we are sure that everything is working on a subset, we can change it so that we can train on all the data. Even for this subset, this takes about *3 minutes* in which you won't see any output ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Run trainer module package locally.\n", + "\n", + "We can run a very small training job over a single file with a small batch size, 1 epoch, 1 train example, and 1 eval step." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "OUTDIR=babyweight_trained\n", + "rm -rf ${OUTDIR}\n", + "export PYTHONPATH=${PYTHONPATH}:${PWD}/babyweight\n", + "python3 -m trainer.task \\\n", + " --train_data_path=gs://${BUCKET}/babyweight/data/train*.csv \\\n", + " --eval_data_path=gs://${BUCKET}/babyweight/data/eval*.csv \\\n", + " --output_dir=${OUTDIR} \\\n", + " --batch_size=10 \\\n", + " --num_epochs=1 \\\n", + " --train_examples=1 \\\n", + " --eval_steps=1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Training on Vertex AI\n", + "\n", + "Now that we see everything is working locally, it's time to train on the cloud! First, we need to package our code as a source distribution. For this, we can use `setuptools`. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile babyweight/setup.py\n", + "from setuptools import find_packages\n", + "from setuptools import setup\n", + "\n", + "setup(\n", + " name='babyweight_trainer',\n", + " version='0.1',\n", + " packages=find_packages(),\n", + " include_package_data=True,\n", + " description='Babyweight model training application.'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "cd babyweight\n", + "python ./setup.py sdist --formats=gztar\n", + "cd .." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will store our package in the Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "gsutil cp babyweight/dist/babyweight_trainer-0.1.tar.gz gs://${BUCKET}/babyweight/" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To submit to the Cloud we use [`gcloud custom-jobs create`](https://cloud.google.com/sdk/gcloud/reference/ai/custom-jobs/create) and simply specify some additional parameters for the Vertex AI Training Service:\n", + "- display-name: A unique identifier for the Cloud job. We usually append system time to ensure uniqueness\n", + "- region: Cloud region to train in. See [here](https://cloud.google.com/vertex-ai/docs/general/locations) for supported Vertex AI Training Service regions\n", + "\n", + "You might have earlier seen `gcloud ai custom-jobs create` executed with the `worker pool spec` and pass-through Python arguments specified directly in the command call, here we will use a YAML file, this will make it easier to transition to hyperparameter tuning.\n", + "\n", + "Through the `args:` argument we add in the passed-through arguments for our `task.py` file." 
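+    ,
+    "\n",
+    "Once the job below has been submitted, you can follow it from the command line as well as from the Cloud Console (sketch; the custom job ID is printed by the `create` command and also appears in the `list` output):\n",
+    "\n",
+    "```bash\n",
+    "gcloud ai custom-jobs list --region=$REGION\n",
+    "gcloud ai custom-jobs stream-logs <CUSTOM_JOB_ID> --region=$REGION\n",
+    "```"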
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "\n", + "TIMESTAMP=$(date -u +%Y%m%d_%H%M%S)\n", + "OUTDIR=gs://${BUCKET}/babyweight/trained_model_$TIMESTAMP\n", + "JOB_NAME=babyweight_$TIMESTAMP\n", + "\n", + "PYTHON_PACKAGE_URI=gs://${BUCKET}/babyweight/babyweight_trainer-0.1.tar.gz\n", + "PYTHON_PACKAGE_EXECUTOR_IMAGE_URI=\"us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest\"\n", + "PYTHON_MODULE=trainer.task\n", + "\n", + "echo > ./config.yaml \"workerPoolSpecs:\n", + " machineSpec:\n", + " machineType: n1-standard-4\n", + " replicaCount: 1\n", + " pythonPackageSpec:\n", + " executorImageUri: $PYTHON_PACKAGE_EXECUTOR_IMAGE_URI\n", + " packageUris: $PYTHON_PACKAGE_URI\n", + " pythonModule: $PYTHON_MODULE\n", + " args:\n", + " - --train_data_path=gs://${BUCKET}/babyweight/data/train*.csv\n", + " - --eval_data_path=gs://${BUCKET}/babyweight/data/eval*.csv\n", + " - --output_dir=$OUTDIR\n", + " - --num_epochs=10\n", + " - --train_examples=10000\n", + " - --eval_steps=100\n", + " - --batch_size=32\n", + " - --nembeds=8\"\n", + "\n", + "gcloud ai custom-jobs create \\\n", + " --region=${REGION} \\\n", + " --display-name=$JOB_NAME \\\n", + " --config=config.yaml" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The training job should complete within 10 to 15 minutes. You will need a trained model to complete our next lab." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Hyperparameter tuning\n", + "\n", + "To do hyperparameter tuning, create a YAML file and and pass its name with `--config`.\n", + "This step could take hours -- you can increase `--parallel-trial-count` or reduce `--max-trial-count` to get it done faster. Since `--parallel-trial-count` is the number of initial seeds to start searching from, you don't want it to be too large; otherwise, all you have is a random search." 
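+    ,
+    "\n",
+    "For each trial, the tuning service launches the same training package and appends the sampled values as extra command-line flags named after each `parameterId` (here `--batch_size` and `--nembeds`), so those IDs must correspond to flags that `task.py` understands. Likewise, the `metricId` must match the tag that `model.py` reports at the end of every epoch:\n",
+    "\n",
+    "```python\n",
+    "hpt.report_hyperparameter_tuning_metric(\n",
+    "    hyperparameter_metric_tag='val_rmse',\n",
+    "    metric_value=logs['val_rmse'],\n",
+    "    global_step=epoch)\n",
+    "```"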
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "TIMESTAMP=$(date -u +%Y%m%d_%H%M%S)\n", + "BASE_OUTPUT_DIR=gs://${BUCKET}/babyweight/hp_tuning_$TIMESTAMP\n", + "JOB_NAME=babyweight_hpt_$TIMESTAMP\n", + "\n", + "PYTHON_PACKAGE_URI=gs://${BUCKET}/babyweight/babyweight_trainer-0.1.tar.gz\n", + "PYTHON_PACKAGE_EXECUTOR_IMAGE_URI=\"us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest\"\n", + "PYTHON_MODULE=trainer.task\n", + "\n", + "echo > ./hyperparam.yaml \"displayName: $JOB_NAME\n", + "studySpec:\n", + " metrics:\n", + " - metricId: val_rmse\n", + " goal: MINIMIZE\n", + " parameters:\n", + " - parameterId: batch_size\n", + " integerValueSpec:\n", + " minValue: 8\n", + " maxValue: 512\n", + " scaleType: UNIT_LOG_SCALE\n", + " - parameterId: nembeds\n", + " integerValueSpec:\n", + " minValue: 3\n", + " maxValue: 30\n", + " scaleType: UNIT_LINEAR_SCALE\n", + " algorithm: ALGORITHM_UNSPECIFIED # results in Bayesian optimization\n", + "trialJobSpec:\n", + " baseOutputDirectory:\n", + " outputUriPrefix: $BASE_OUTPUT_DIR\n", + " workerPoolSpecs:\n", + " - machineSpec:\n", + " machineType: n1-standard-8\n", + " pythonPackageSpec:\n", + " executorImageUri: $PYTHON_PACKAGE_EXECUTOR_IMAGE_URI\n", + " packageUris:\n", + " - $PYTHON_PACKAGE_URI\n", + " pythonModule: $PYTHON_MODULE\n", + " args:\n", + " - --train_data_path=gs://${BUCKET}/babyweight/data/train*.csv\n", + " - --eval_data_path=gs://${BUCKET}/babyweight/data/eval*.csv\n", + " - --num_epochs=10\n", + " - --train_examples=5000\n", + " - --eval_steps=100\n", + " - --batch_size=32\n", + " - --nembeds=8\n", + " replicaCount: 1\"\n", + " \n", + "gcloud ai hp-tuning-jobs create \\\n", + " --region=$REGION \\\n", + " --display-name=$JOB_NAME \\\n", + " --config=hyperparam.yaml \\\n", + " --max-trial-count=20 \\\n", + " --parallel-trial-count=5" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Repeat training\n", + "\n", + "This time with tuned parameters for `batch_size` and `nembeds`. Note that your best results may differ from below. So be sure to fill yours in!" 
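+    ,
+    "\n",
+    "If you are not sure which values won, you can inspect the finished study from the command line (sketch; `<JOB_ID>` is a placeholder for the ID shown in the `list` output):\n",
+    "\n",
+    "```bash\n",
+    "gcloud ai hp-tuning-jobs list --region=$REGION\n",
+    "gcloud ai hp-tuning-jobs describe <JOB_ID> --region=$REGION\n",
+    "```"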
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "TIMESTAMP=$(date -u +%Y%m%d_%H%M%S)\n", + "OUTDIR=gs://${BUCKET}/babyweight/tuned_$TIMESTAMP\n", + "JOB_NAME=babyweight_tuned_$TIMESTAMP\n", + "\n", + "PYTHON_PACKAGE_URI=gs://${BUCKET}/babyweight/babyweight_trainer-0.1.tar.gz\n", + "PYTHON_PACKAGE_EXECUTOR_IMAGE_URI=\"us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest\"\n", + "PYTHON_MODULE=trainer.task\n", + "\n", + "echo > ./tuned_config.yaml \"workerPoolSpecs:\n", + " machineSpec:\n", + " machineType: n1-standard-8\n", + " replicaCount: 1\n", + " pythonPackageSpec:\n", + " executorImageUri: $PYTHON_PACKAGE_EXECUTOR_IMAGE_URI\n", + " packageUris: $PYTHON_PACKAGE_URI\n", + " pythonModule: $PYTHON_MODULE\n", + " args:\n", + " - --train_data_path=gs://${BUCKET}/babyweight/data/train*.csv\n", + " - --eval_data_path=gs://${BUCKET}/babyweight/data/eval*.csv\n", + " - --output_dir=$OUTDIR\n", + " - --num_epochs=10\n", + " - --train_examples=20000\n", + " - --eval_steps=100\n", + " - --batch_size=32\n", + " - --nembeds=8\"\n", + " \n", + "gcloud ai custom-jobs create \\\n", + " --region=${REGION} \\\n", + " --display-name=$JOB_NAME \\\n", + " --config=tuned_config.yaml" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Lab Summary: \n", + "In this lab, we set up the environment, created the trainer module's task.py to hold hyperparameter argparsing code, created the trainer module's model.py to hold Keras model code, ran the trainer module package locally, and submitted a training job to Vertex AI." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright 2021 Google LLC\n", + "Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "you may not use this file except in compliance with the License.\n", + "You may obtain a copy of the License at\n", + " https://www.apache.org/licenses/LICENSE-2.0\n", + "Unless required by applicable law or agreed to in writing, software\n", + "distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "See the License for the specific language governing permissions and\n", + "limitations under the License." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "environment": { + "kernel": "python3", + "name": "tf2-gpu.2-8.m91", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-8:m91" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From 1788f605c91de402fceb441d2178c30774f52f66 Mon Sep 17 00:00:00 2001 From: Takumi Ohyama Date: Fri, 15 Apr 2022 07:40:16 +0000 Subject: [PATCH 3/8] renamed 5b old labs --- ...nb => 5b_deploy_keras_vertex_babyweight_feature_columns.ipynb} | 0 ...nb => 5b_deploy_keras_vertex_babyweight_feature_columns.ipynb} | 0 2 files changed, 0 insertions(+), 0 deletions(-) rename notebooks/end-to-end-structured/labs/{5b_deploy_keras_ai_platform_babyweight_vertex.ipynb => 5b_deploy_keras_vertex_babyweight_feature_columns.ipynb} (100%) rename notebooks/end-to-end-structured/solutions/{5b_deploy_keras_ai_platform_babyweight_vertex.ipynb => 5b_deploy_keras_vertex_babyweight_feature_columns.ipynb} (100%) diff --git a/notebooks/end-to-end-structured/labs/5b_deploy_keras_ai_platform_babyweight_vertex.ipynb b/notebooks/end-to-end-structured/labs/5b_deploy_keras_vertex_babyweight_feature_columns.ipynb similarity index 100% rename from notebooks/end-to-end-structured/labs/5b_deploy_keras_ai_platform_babyweight_vertex.ipynb rename to notebooks/end-to-end-structured/labs/5b_deploy_keras_vertex_babyweight_feature_columns.ipynb diff --git a/notebooks/end-to-end-structured/solutions/5b_deploy_keras_ai_platform_babyweight_vertex.ipynb b/notebooks/end-to-end-structured/solutions/5b_deploy_keras_vertex_babyweight_feature_columns.ipynb similarity index 100% rename from notebooks/end-to-end-structured/solutions/5b_deploy_keras_ai_platform_babyweight_vertex.ipynb rename to notebooks/end-to-end-structured/solutions/5b_deploy_keras_vertex_babyweight_feature_columns.ipynb From 3027a4429b91c5e47beca1f69c69e980664ba74a Mon Sep 17 00:00:00 2001 From: Takumi Ohyama Date: Fri, 15 Apr 2022 07:47:34 +0000 Subject: [PATCH 4/8] added babyweights vertex deploy labs preprocessing version --- ...rtex_babyweight_preprocessing_layers.ipynb | 456 +++++++++++++++++ ...rtex_babyweight_preprocessing_layers.ipynb | 459 ++++++++++++++++++ 2 files changed, 915 insertions(+) create mode 100644 notebooks/end-to-end-structured/labs/5b_deploy_keras_vertex_babyweight_preprocessing_layers.ipynb create mode 100644 notebooks/end-to-end-structured/solutions/5b_deploy_keras_vertex_babyweight_preprocessing_layers.ipynb diff --git a/notebooks/end-to-end-structured/labs/5b_deploy_keras_vertex_babyweight_preprocessing_layers.ipynb b/notebooks/end-to-end-structured/labs/5b_deploy_keras_vertex_babyweight_preprocessing_layers.ipynb new file mode 100644 index 00000000..db1d5896 --- /dev/null +++ b/notebooks/end-to-end-structured/labs/5b_deploy_keras_vertex_babyweight_preprocessing_layers.ipynb @@ -0,0 +1,456 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# LAB 5b: Deploy and predict with Keras model on Vertex AI\n", + "\n", + "**Learning Objectives**\n", + "\n", + "1. Setup up the environment\n", + "1. 
Deploy trained Keras model to an endpoint for online prediction on Vertex AI\n", + "1. Online predict from model on Vertex AI\n", + "1. Batch predict from model on Vertex AI\n", + "\n", + "## Introduction \n", + "In this notebook, we'll be deploying our Keras model to Vertex AI and creating predictions.\n", + "\n", + "We will set up the environment, deploy a trained Keras model to Vertex AI for online prediction, online predict from deployed model on Vertex AI, and batch predict on Vertex AI.\n", + "\n", + "Each learning objective will correspond to a __#TODO__ in this student lab notebook -- try to complete this notebook first and then review the [solution notebook](../solutions/5b_deploy_keras_ai_platform_babyweight.ipynb)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "hJ7ByvoXzpVI" + }, + "source": [ + "## Set up environment variables and load necessary libraries" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Import necessary libraries." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "from google.cloud import aiplatform\n", + "from google.protobuf import json_format\n", + "from google.protobuf.struct_pb2 import Value" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set environment variables.\n", + "\n", + "Set environment variables so that we can use them throughout the entire lab. We will be using our project name for our bucket, so you only need to change your project and region." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "PROJECT=$(gcloud config list project --format \"value(core.project)\")\n", + "echo \"Your current GCP Project Name is: \"$PROJECT" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PROJECT = !gcloud config list --format 'value(core.project)'\n", + "PROJECT = PROJECT[0]\n", + "BUCKET = PROJECT # defaults to PROJECT\n", + "REGION = \"us-central1\" # Replace with your REGION" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "os.environ[\"PROJECT\"] = PROJECT\n", + "os.environ[\"BUCKET\"] = BUCKET\n", + "os.environ[\"REGION\"] = REGION" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "gcloud config set project $PROJECT\n", + "gcloud config set ai/region $REGION" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Check our trained model files\n", + "\n", + "Let's check the directory structure of our outputs of our trained model in folder we exported the model to in our last [lab](../solutions/10_train_keras_ai_platform_babyweight.ipynb). We'll want to deploy the saved_model.pb within the directory of the tuned model as well as the variable values in the variables folder. Therefore, we need the path of the latest tuned directory so that everything within it can be found by Vertex AI's model deployment service. Note that the `2*` substrings are there to match timestamp strings." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "gsutil ls gs://${BUCKET}/babyweight/tuned_2*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "MODEL_LOCATION=$(gsutil ls -d -- gs://${BUCKET}/babyweight/tuned_2*/2* \\\n", + " | tail -1)\n", + "gsutil ls ${MODEL_LOCATION}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Upload model, create endpoint and deploy trained model\n", + "\n", + "Uploading our SavedModel from the above `MODEL_LOCATION`, creating and endpoint and deploying the trained model to act as a REST web service are three simple gcloud calls. We also run a command to list the endpoints, to fetch the fully qualified resource name `ENDPOINT_RESOURCENAME` for the endpoint." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "TIMESTAMP=$(date -u +%Y%m%d_%H%M%S)\n", + "MODEL_DISPLAYNAME=babyweight_model_$TIMESTAMP\n", + "ENDPOINT_DISPLAYNAME=babyweight_endpoint_$TIMESTAMP\n", + "IMAGE_URI=\"us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-3:latest\"\n", + "MODEL_LOCATION=$(gsutil ls -d -- gs://${BUCKET}/babyweight/tuned_2*/2* \\\n", + " | tail -1)\n", + "echo \"MODEL_LOCATION=${MODEL_LOCATION}\"\n", + "\n", + "# Model\n", + "MODEL_RESOURCENAME=$(gcloud ai models upload \\\n", + " --region=$REGION \\\n", + " --display-name=$MODEL_DISPLAYNAME \\\n", + " --container-image-uri=$IMAGE_URI \\\n", + " --artifact-uri=$MODEL_LOCATION \\\n", + " --format=\"value(model)\")\n", + "\n", + "MODEL_ID=$(echo $MODEL_RESOURCENAME | cut -d\"/\" -f6)\n", + "\n", + "echo \"MODEL_DISPLAYNAME=${MODEL_DISPLAYNAME}\"\n", + "echo \"MODEL_RESOURCENAME=${MODEL_RESOURCENAME}\"\n", + "echo \"MODEL_ID=${MODEL_ID}\"\n", + "\n", + "# Endpoint\n", + "ENDPOINT_RESOURCENAME=$(gcloud ai endpoints create \\\n", + " --region=$REGION \\\n", + " --display-name=$ENDPOINT_DISPLAYNAME \\\n", + " --format=\"value(name)\")\n", + "\n", + "ENDPOINT_ID=$(echo $ENDPOINT_RESOURCENAME | cut -d\"/\" -f6)\n", + "\n", + "echo \"ENDPOINT_DISPLAYNAME=${ENDPOINT_DISPLAYNAME}\"\n", + "echo \"ENDPOINT_RESOURCENAME=${ENDPOINT_RESOURCENAME}\"\n", + "echo \"ENDPOINT_ID=${ENDPOINT_ID}\"\n", + "\n", + "# Deployment\n", + "DEPLOYEDMODEL_DISPLAYNAME=${MODEL_DISPLAYNAME}_deployment\n", + "MACHINE_TYPE=n1-standard-2\n", + "MIN_REPLICA_COUNT=1\n", + "MAX_REPLICA_COUNT=3\n", + "\n", + "gcloud ai endpoints deploy-model $ENDPOINT_RESOURCENAME \\\n", + " --region=$REGION \\\n", + " --model=$MODEL_RESOURCENAME \\\n", + " --display-name=$DEPLOYEDMODEL_DISPLAYNAME \\\n", + " --machine-type=$MACHINE_TYPE \\\n", + " --min-replica-count=$MIN_REPLICA_COUNT \\\n", + " --max-replica-count=$MAX_REPLICA_COUNT \\\n", + " --traffic-split=0=100" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use model to make online prediction." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Python API\n", + "\n", + "We can use the Python API to send a JSON request to the endpoint of the service to make it predict a baby's weight. The order of the responses are the order of the instances." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: Copy your `ENDPOINT_RESOURCENAME` from above.\n", + "ENDPOINT_RESOURCENAME = \"\"\n", + "os.environ[\"ENDPOINT_RESOURCENAME\"] = ENDPOINT_RESOURCENAME\n", + "\n", + "api_endpoint = f\"{REGION}-aiplatform.googleapis.com\"\n", + "\n", + "# The AI Platform services require regional API endpoints.\n", + "client_options = {\"api_endpoint\": api_endpoint}\n", + "# Initialize client that will be used to create and send requests.\n", + "# This client only needs to be created once, and can be reused for multiple requests.\n", + "client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)\n", + "\n", + "instances = [\n", + " {\n", + " \"is_male\": [\"True\"],\n", + " \"mother_age\": [26.0],\n", + " \"plurality\": [\"Single(1)\"],\n", + " \"gestation_weeks\": [39],\n", + " },\n", + " {\n", + " \"is_male\": [\"False\"],\n", + " \"mother_age\": [29.0],\n", + " \"plurality\": [\"Single(1)\"],\n", + " \"gestation_weeks\": [38],\n", + " },\n", + " {\n", + " \"is_male\": [\"True\"],\n", + " \"mother_age\": [26.0],\n", + " \"plurality\": [\"Triplets(3)\"],\n", + " \"gestation_weeks\": [39],\n", + " },\n", + " {\n", + " # TODO: Create another instance\n", + " },\n", + "]\n", + "\n", + "instances = [json_format.ParseDict(instance, Value()) for instance in instances]\n", + "response = client.predict(endpoint=ENDPOINT_RESOURCENAME, instances=instances)\n", + "\n", + "# The predictions are a google.protobuf.Value representation of the model's predictions.\n", + "print(\" prediction:\", response.predictions)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The predictions for the four instances were: 5.33, 6.09, 2.50, and 5.86 pounds respectively when I ran it (your results might be different)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### gcloud shell API\n", + "\n", + "Instead we could use the gcloud shell API. Create a newline delimited JSON file with one instance per line and submit using gcloud." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile inputs.json\n", + "{\n", + " \"instances\": [\n", + " {\"is_male\": [\"True\"], \"mother_age\": [26.0], \"plurality\": [\"Single(1)\"], \"gestation_weeks\": [39]},\n", + " {\"is_male\": [\"False\"], \"mother_age\": [26.0], \"plurality\": [\"Single(1)\"], \"gestation_weeks\": [39]}\n", + " ]\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now call `gcloud ai endpoint predict` using the JSON we just created and point to our deployed `ENDPOINT_RESOURCENAME`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "gcloud ai endpoints predict $ENDPOINT_RESOURCENAME \\\n", + " --region=$REGION \\\n", + " --json-request=inputs.json" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use model to make batch prediction.\n", + "\n", + "Batch prediction is commonly used when you have thousands to millions of predictions. It will create a Vertex AI batch prediction job. We will put our prediction request JSONL file (multiple lines of JSON records) to GCS, and use the Python API to request the job." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile inputs.jsonl\n", + "{\"is_male\": [\"True\"], \"mother_age\": [26.0], \"plurality\": [\"Single(1)\"], \"gestation_weeks\": [39]}\n", + "{\"is_male\": [\"False\"], \"mother_age\": [26.0], \"plurality\": [\"Single(1)\"], \"gestation_weeks\": [39]}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!gsutil cp inputs.jsonl gs://$BUCKET/babyweight/batchpred/inputs.jsonl" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: replace with your MODEL_RESOURCENAME from above\n", + "MODEL_RESOURCENAME = \"\"\n", + "\n", + "aiplatform.init(project=PROJECT, location=REGION)\n", + "\n", + "my_model = aiplatform.Model(MODEL_RESOURCENAME)\n", + "\n", + "batch_prediction_job = my_model.batch_predict(\n", + " job_display_name=\"babyweight_batch\",\n", + " gcs_source=f\"gs://{BUCKET}/babyweight/batchpred/inputs.jsonl\",\n", + " gcs_destination_prefix=f\"gs://{BUCKET}/babyweight/batchpred/outputs\",\n", + " machine_type=\"n1-standard-2\",\n", + " accelerator_count=0,\n", + " starting_replica_count=1,\n", + " max_replica_count=1,\n", + ")\n", + "\n", + "batch_prediction_job.wait()\n", + "\n", + "print(batch_prediction_job.display_name)\n", + "print(batch_prediction_job.resource_name)\n", + "print(batch_prediction_job.state)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!gsutil cat $(gsutil ls gs://$BUCKET/babyweight/batchpred/outputs | tail -n1)prediction.errors_stats-*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!gsutil cat $(gsutil ls gs://$BUCKET/babyweight/batchpred/outputs | tail -n1)prediction.results-*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Lab Summary:\n", + "In this lab, we set up the environment, deployed a trained Keras model to Vertex AI, online predicted from deployed model, and batch predicted from deployed model on Vertex AI." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright 2021 Google LLC\n", + "Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "you may not use this file except in compliance with the License.\n", + "You may obtain a copy of the License at\n", + " https://www.apache.org/licenses/LICENSE-2.0\n", + "Unless required by applicable law or agreed to in writing, software\n", + "distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "See the License for the specific language governing permissions and\n", + "limitations under the License." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "environment": { + "kernel": "python3", + "name": "tf2-gpu.2-8.m91", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-8:m91" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/notebooks/end-to-end-structured/solutions/5b_deploy_keras_vertex_babyweight_preprocessing_layers.ipynb b/notebooks/end-to-end-structured/solutions/5b_deploy_keras_vertex_babyweight_preprocessing_layers.ipynb new file mode 100644 index 00000000..35798ad1 --- /dev/null +++ b/notebooks/end-to-end-structured/solutions/5b_deploy_keras_vertex_babyweight_preprocessing_layers.ipynb @@ -0,0 +1,459 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# LAB 5b: Deploy and predict with Keras model on Vertex AI\n", + "\n", + "**Learning Objectives**\n", + "\n", + "1. Setup up the environment\n", + "1. Deploy trained Keras model to an endpoint for online prediction on Vertex AI\n", + "1. Online predict from model on Vertex AI\n", + "1. Batch predict from model on Vertex AI\n", + "\n", + "## Introduction \n", + "In this notebook, we'll be deploying our Keras model to Vertex AI and creating predictions.\n", + "\n", + "We will set up the environment, deploy a trained Keras model to Vertex AI for online prediction, online predict from deployed model on Vertex AI, and batch predict on Vertex AI.\n", + "\n", + "Each learning objective will correspond to a __#TODO__ in this student lab notebook -- try to complete this notebook first and then review the [solution notebook](../solutions/5b_deploy_keras_ai_platform_babyweight.ipynb)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "hJ7ByvoXzpVI" + }, + "source": [ + "## Set up environment variables and load necessary libraries" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Import necessary libraries." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "from google.cloud import aiplatform\n", + "from google.protobuf import json_format\n", + "from google.protobuf.struct_pb2 import Value" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set environment variables.\n", + "\n", + "Set environment variables so that we can use them throughout the entire lab. We will be using our project name for our bucket, so you only need to change your project and region." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "PROJECT=$(gcloud config list project --format \"value(core.project)\")\n", + "echo \"Your current GCP Project Name is: \"$PROJECT" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PROJECT = !gcloud config list --format 'value(core.project)'\n", + "PROJECT = PROJECT[0]\n", + "BUCKET = PROJECT # defaults to PROJECT\n", + "REGION = \"us-central1\" # Replace with your REGION" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "os.environ[\"PROJECT\"] = PROJECT\n", + "os.environ[\"BUCKET\"] = BUCKET\n", + "os.environ[\"REGION\"] = REGION" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "gcloud config set project $PROJECT\n", + "gcloud config set ai/region $REGION" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Check our trained model files\n", + "\n", + "Let's check the directory structure of the outputs of our trained model in the folder we exported the model to in our last [lab](../solutions/10_train_keras_ai_platform_babyweight.ipynb). We'll want to deploy the saved_model.pb within the directory of the tuned model as well as the variable values in the variables folder. Therefore, we need the path of the latest tuned directory so that everything within it can be found by Vertex AI's model deployment service. Note that the `2*` substrings are there to match timestamp strings." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "gsutil ls gs://${BUCKET}/babyweight/tuned_2*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "MODEL_LOCATION=$(gsutil ls -d -- gs://${BUCKET}/babyweight/tuned_2*/2* \\\n", + "    | tail -1)\n", + "gsutil ls ${MODEL_LOCATION}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Upload model, create endpoint and deploy trained model\n", + "\n", + "Uploading our SavedModel from the above `MODEL_LOCATION`, creating an endpoint, and deploying the trained model to act as a REST web service are three simple gcloud calls. We also run a command to list the endpoints, to fetch the fully qualified resource name `ENDPOINT_RESOURCENAME` for the endpoint." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "TIMESTAMP=$(date -u +%Y%m%d_%H%M%S)\n", + "MODEL_DISPLAYNAME=babyweight_model_$TIMESTAMP\n", + "ENDPOINT_DISPLAYNAME=babyweight_endpoint_$TIMESTAMP\n", + "IMAGE_URI=\"us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-3:latest\"\n", + "MODEL_LOCATION=$(gsutil ls -d -- gs://${BUCKET}/babyweight/tuned_2*/2* \\\n", + " | tail -1)\n", + "echo \"MODEL_LOCATION=${MODEL_LOCATION}\"\n", + "\n", + "# Model\n", + "MODEL_RESOURCENAME=$(gcloud ai models upload \\\n", + " --region=$REGION \\\n", + " --display-name=$MODEL_DISPLAYNAME \\\n", + " --container-image-uri=$IMAGE_URI \\\n", + " --artifact-uri=$MODEL_LOCATION \\\n", + " --format=\"value(model)\")\n", + "\n", + "MODEL_ID=$(echo $MODEL_RESOURCENAME | cut -d\"/\" -f6)\n", + "\n", + "echo \"MODEL_DISPLAYNAME=${MODEL_DISPLAYNAME}\"\n", + "echo \"MODEL_RESOURCENAME=${MODEL_RESOURCENAME}\"\n", + "echo \"MODEL_ID=${MODEL_ID}\"\n", + "\n", + "# Endpoint\n", + "ENDPOINT_RESOURCENAME=$(gcloud ai endpoints create \\\n", + " --region=$REGION \\\n", + " --display-name=$ENDPOINT_DISPLAYNAME \\\n", + " --format=\"value(name)\")\n", + "\n", + "ENDPOINT_ID=$(echo $ENDPOINT_RESOURCENAME | cut -d\"/\" -f6)\n", + "\n", + "echo \"ENDPOINT_DISPLAYNAME=${ENDPOINT_DISPLAYNAME}\"\n", + "echo \"ENDPOINT_RESOURCENAME=${ENDPOINT_RESOURCENAME}\"\n", + "echo \"ENDPOINT_ID=${ENDPOINT_ID}\"\n", + "\n", + "# Deployment\n", + "DEPLOYEDMODEL_DISPLAYNAME=${MODEL_DISPLAYNAME}_deployment\n", + "MACHINE_TYPE=n1-standard-2\n", + "MIN_REPLICA_COUNT=1\n", + "MAX_REPLICA_COUNT=3\n", + "\n", + "gcloud ai endpoints deploy-model $ENDPOINT_RESOURCENAME \\\n", + " --region=$REGION \\\n", + " --model=$MODEL_RESOURCENAME \\\n", + " --display-name=$DEPLOYEDMODEL_DISPLAYNAME \\\n", + " --machine-type=$MACHINE_TYPE \\\n", + " --min-replica-count=$MIN_REPLICA_COUNT \\\n", + " --max-replica-count=$MAX_REPLICA_COUNT \\\n", + " --traffic-split=0=100" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use model to make online prediction." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Python API\n", + "\n", + "We can use the Python API to send a JSON request to the endpoint of the service to make it predict a baby's weight. The order of the responses are the order of the instances." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: Copy your `ENDPOINT_RESOURCENAME` from above.\n", + "ENDPOINT_RESOURCENAME = \"\"\n", + "os.environ[\"ENDPOINT_RESOURCENAME\"] = ENDPOINT_RESOURCENAME\n", + "\n", + "api_endpoint = f\"{REGION}-aiplatform.googleapis.com\"\n", + "\n", + "# The AI Platform services require regional API endpoints.\n", + "client_options = {\"api_endpoint\": api_endpoint}\n", + "# Initialize client that will be used to create and send requests.\n", + "# This client only needs to be created once, and can be reused for multiple requests.\n", + "client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)\n", + "\n", + "instances = [\n", + " {\n", + " \"is_male\": [\"True\"],\n", + " \"mother_age\": [26.0],\n", + " \"plurality\": [\"Single(1)\"],\n", + " \"gestation_weeks\": [39],\n", + " },\n", + " {\n", + " \"is_male\": [\"False\"],\n", + " \"mother_age\": [29.0],\n", + " \"plurality\": [\"Single(1)\"],\n", + " \"gestation_weeks\": [38],\n", + " },\n", + " {\n", + " \"is_male\": [\"True\"],\n", + " \"mother_age\": [26.0],\n", + " \"plurality\": [\"Triplets(3)\"],\n", + " \"gestation_weeks\": [39],\n", + " },\n", + " {\n", + " \"is_male\": [\"Unknown\"],\n", + " \"mother_age\": [29.0],\n", + " \"plurality\": [\"Multiple(2+)\"],\n", + " \"gestation_weeks\": [38],\n", + " },\n", + "]\n", + "\n", + "instances = [json_format.ParseDict(instance, Value()) for instance in instances]\n", + "response = client.predict(endpoint=ENDPOINT_RESOURCENAME, instances=instances)\n", + "\n", + "# The predictions are a google.protobuf.Value representation of the model's predictions.\n", + "print(\" prediction:\", response.predictions)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The predictions for the four instances were: 5.33, 6.09, 2.50, and 5.86 pounds respectively when I ran it (your results might be different)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### gcloud shell API\n", + "\n", + "Instead we could use the gcloud shell API. Create a newline delimited JSON file with one instance per line and submit using gcloud." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile inputs.json\n", + "{\n", + " \"instances\": [\n", + " {\"is_male\": [\"True\"], \"mother_age\": [26.0], \"plurality\": [\"Single(1)\"], \"gestation_weeks\": [39]},\n", + " {\"is_male\": [\"False\"], \"mother_age\": [26.0], \"plurality\": [\"Single(1)\"], \"gestation_weeks\": [39]}\n", + " ]\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now call `gcloud ai endpoint predict` using the JSON we just created and point to our deployed `ENDPOINT_RESOURCENAME`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "gcloud ai endpoints predict $ENDPOINT_RESOURCENAME \\\n", + " --region=$REGION \\\n", + " --json-request=inputs.json" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use model to make batch prediction.\n", + "\n", + "Batch prediction is commonly used when you have thousands to millions of predictions. It will create a Vertex AI batch prediction job. We will put our prediction request JSONL file (multiple lines of JSON records) to GCS, and use the Python API to request the job." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile inputs.jsonl\n", + "{\"is_male\": [\"True\"], \"mother_age\": [26.0], \"plurality\": [\"Single(1)\"], \"gestation_weeks\": [39]}\n", + "{\"is_male\": [\"False\"], \"mother_age\": [26.0], \"plurality\": [\"Single(1)\"], \"gestation_weeks\": [39]}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!gsutil cp inputs.jsonl gs://$BUCKET/babyweight/batchpred/inputs.jsonl" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: replace with your MODEL_RESOURCENAME from above\n", + "MODEL_RESOURCENAME = \"\"\n", + "\n", + "aiplatform.init(project=PROJECT, location=REGION)\n", + "\n", + "my_model = aiplatform.Model(MODEL_RESOURCENAME)\n", + "\n", + "batch_prediction_job = my_model.batch_predict(\n", + " job_display_name=\"babyweight_batch\",\n", + " gcs_source=f\"gs://{BUCKET}/babyweight/batchpred/inputs.jsonl\",\n", + " gcs_destination_prefix=f\"gs://{BUCKET}/babyweight/batchpred/outputs\",\n", + " machine_type=\"n1-standard-2\",\n", + " accelerator_count=0,\n", + " starting_replica_count=1,\n", + " max_replica_count=1,\n", + ")\n", + "\n", + "batch_prediction_job.wait()\n", + "\n", + "print(batch_prediction_job.display_name)\n", + "print(batch_prediction_job.resource_name)\n", + "print(batch_prediction_job.state)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!gsutil cat $(gsutil ls gs://$BUCKET/babyweight/batchpred/outputs | tail -n1)prediction.errors_stats-*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!gsutil cat $(gsutil ls gs://$BUCKET/babyweight/batchpred/outputs | tail -n1)prediction.results-*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Lab Summary:\n", + "In this lab, we set up the environment, deployed a trained Keras model to Vertex AI, online predicted from deployed model, and batch predicted from deployed model on Vertex AI." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright 2021 Google LLC\n", + "Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "you may not use this file except in compliance with the License.\n", + "You may obtain a copy of the License at\n", + " https://www.apache.org/licenses/LICENSE-2.0\n", + "Unless required by applicable law or agreed to in writing, software\n", + "distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "See the License for the specific language governing permissions and\n", + "limitations under the License." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "environment": { + "kernel": "python3", + "name": "tf2-gpu.2-8.m91", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-8:m91" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From a8347d7459f37f334cfc58d8bbf092c259f810e6 Mon Sep 17 00:00:00 2001 From: Takumi Ohyama Date: Mon, 4 Jul 2022 15:07:13 +0000 Subject: [PATCH 5/8] renamed end-to-end vertex AI labs --- ...essing_layers.ipynb => 5a_train_keras_vertex_babyweight.ipynb} | 0 ...ssing_layers.ipynb => 5b_deploy_keras_vertex_babyweight.ipynb} | 0 ...essing_layers.ipynb => 5a_train_keras_vertex_babyweight.ipynb} | 0 ...ssing_layers.ipynb => 5b_deploy_keras_vertex_babyweight.ipynb} | 0 4 files changed, 0 insertions(+), 0 deletions(-) rename notebooks/end-to-end-structured/labs/{5a_train_keras_vertex_babyweight_preprocessing_layers.ipynb => 5a_train_keras_vertex_babyweight.ipynb} (100%) rename notebooks/end-to-end-structured/labs/{5b_deploy_keras_vertex_babyweight_preprocessing_layers.ipynb => 5b_deploy_keras_vertex_babyweight.ipynb} (100%) rename notebooks/end-to-end-structured/solutions/{5a_train_keras_vertex_babyweight_preprocessing_layers.ipynb => 5a_train_keras_vertex_babyweight.ipynb} (100%) rename notebooks/end-to-end-structured/solutions/{5b_deploy_keras_vertex_babyweight_preprocessing_layers.ipynb => 5b_deploy_keras_vertex_babyweight.ipynb} (100%) diff --git a/notebooks/end-to-end-structured/labs/5a_train_keras_vertex_babyweight_preprocessing_layers.ipynb b/notebooks/end-to-end-structured/labs/5a_train_keras_vertex_babyweight.ipynb similarity index 100% rename from notebooks/end-to-end-structured/labs/5a_train_keras_vertex_babyweight_preprocessing_layers.ipynb rename to notebooks/end-to-end-structured/labs/5a_train_keras_vertex_babyweight.ipynb diff --git a/notebooks/end-to-end-structured/labs/5b_deploy_keras_vertex_babyweight_preprocessing_layers.ipynb b/notebooks/end-to-end-structured/labs/5b_deploy_keras_vertex_babyweight.ipynb similarity index 100% rename from notebooks/end-to-end-structured/labs/5b_deploy_keras_vertex_babyweight_preprocessing_layers.ipynb rename to notebooks/end-to-end-structured/labs/5b_deploy_keras_vertex_babyweight.ipynb diff --git a/notebooks/end-to-end-structured/solutions/5a_train_keras_vertex_babyweight_preprocessing_layers.ipynb b/notebooks/end-to-end-structured/solutions/5a_train_keras_vertex_babyweight.ipynb similarity index 100% rename from notebooks/end-to-end-structured/solutions/5a_train_keras_vertex_babyweight_preprocessing_layers.ipynb rename to notebooks/end-to-end-structured/solutions/5a_train_keras_vertex_babyweight.ipynb diff --git a/notebooks/end-to-end-structured/solutions/5b_deploy_keras_vertex_babyweight_preprocessing_layers.ipynb b/notebooks/end-to-end-structured/solutions/5b_deploy_keras_vertex_babyweight.ipynb similarity index 100% rename from notebooks/end-to-end-structured/solutions/5b_deploy_keras_vertex_babyweight_preprocessing_layers.ipynb rename to notebooks/end-to-end-structured/solutions/5b_deploy_keras_vertex_babyweight.ipynb From 
9fe73da1ab381a32ec74bbedb7500968bbeb42be Mon Sep 17 00:00:00 2001 From: Takumi Ohyama Date: Thu, 7 Jul 2022 12:09:51 +0000 Subject: [PATCH 6/8] added cloud-hypertune in requirements.txt --- requirements.txt | 3 +++ 1 file changed, 3 insertions(+) diff --git a/requirements.txt b/requirements.txt index 434b06e3..21ee6912 100644 --- a/requirements.txt +++ b/requirements.txt @@ -4,3 +4,6 @@ google-cloud-pipeline-components==0.2.1 kfp==1.8.10 tfx==1.4.0 pre-commit + +# Requirement for local tests of hyperparameter tuning jobs +cloudml-hypertune From 06ad44dbaac7b386705b1d55a7bdfc87c02cfe80 Mon Sep 17 00:00:00 2001 From: Takumi Ohyama Date: Thu, 7 Jul 2022 12:25:02 +0000 Subject: [PATCH 7/8] reflected reviews on end-to-end 5a --- .../5a_train_keras_vertex_babyweight.ipynb | 39 +++++++------------ .../5a_train_keras_vertex_babyweight.ipynb | 37 ++++++------------ 2 files changed, 27 insertions(+), 49 deletions(-) diff --git a/notebooks/end-to-end-structured/labs/5a_train_keras_vertex_babyweight.ipynb b/notebooks/end-to-end-structured/labs/5a_train_keras_vertex_babyweight.ipynb index c878d0f3..e6184ab4 100644 --- a/notebooks/end-to-end-structured/labs/5a_train_keras_vertex_babyweight.ipynb +++ b/notebooks/end-to-end-structured/labs/5a_train_keras_vertex_babyweight.ipynb @@ -38,7 +38,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "First we will install the `cloudml-hypertune` package on our local machine. This is the package which we will use to report hyperparameter tuning metrics to Vertex AI. Installing the package will allow us to test our trainer package locally." + "First we will import the `cloudml-hypertune` package. This is the package which we will use to report hyperparameter tuning metrics to Vertex AI. Importing the package will allow us to test our trainer package locally." ] }, { @@ -47,20 +47,7 @@ "metadata": {}, "outputs": [], "source": [ - "try:\n", - " import hypertune\n", - "\n", - "except ImportError:\n", - " !pip3 install -U cloudml-hypertune --user\n", - "\n", - " print(\"Please restart the kernel and re-run the notebook.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If the above command resulted in an installation, please restart the notebook kernel and re-run the notebook." + "import hypertune" ] }, { @@ -205,7 +192,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Create trainer module's task.py to hold hyperparameter argparsing code.\n", + "### Lab Task #1: Create trainer module's task.py to hold hyperparameter argparsing code.\n", "\n", "The cell below writes the file `babyweight/trainer/task.py` which sets up our training job. Here is where we determine which parameters of our model to pass as flags during training using the `parser` module. Look at how `batch_size` is passed to the model in the code below. Use this as an example to parse arguements for the following variables\n", "- `nnsize` which represents the hidden layer sizes to use for DNN feature columns\n", @@ -282,9 +269,9 @@ "source": [ "In the same way we can write to the file `model.py` the model that we developed in the previous notebooks. \n", "\n", - "### Create trainer module's model.py to hold Keras model code.\n", + "### Lab Task #2: Create trainer module's model.py to hold Keras model code.\n", "\n", - "To create our `model.py`, we'll use the code we wrote for the Wide & Deep model. 
Look back at your [9_keras_wide_and_deep_babyweight](../solutions/9_keras_wide_and_deep_babyweight.ipynb) notebook and copy/paste the necessary code from that notebook into its place in the cell below." + "Complete the TODOs in the code cell below to create our `model.py`. We'll use the code we wrote for the Wide & Deep model. Look back at your [4c_keras_wide_and_deep_babyweight](../solutions/4c_keras_wide_and_deep_babyweight.ipynb) notebook and copy/paste the necessary code from that notebook into its place in the cell below." ] }, { @@ -420,9 +407,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Run trainer module package locally.\n", + "### Lab Task #3: Run trainer module package locally.\n", "\n", - "We can run a very small training job over a single file with a small batch size, 1 epoch, 1 train example, and 1 eval step." + "Fill in the missing code in the TODOs below so that we can run a very small training job over a single file with a small batch size, 1 epoch, 1 train example, and 1 eval step." ] }, { @@ -449,7 +436,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Training on Vertex AI\n", + "## Lab Task #4: Training on Vertex AI\n", "\n", "Now that we see everything is working locally, it's time to train on the cloud! First, we need to package our code as a source distribution. For this, we can use `setuptools`. " ] @@ -512,7 +499,9 @@ "\n", "You might have earlier seen `gcloud ai custom-jobs create` executed with the `worker pool spec` and pass-through Python arguments specified directly in the command call, here we will use a YAML file, this will make it easier to transition to hyperparameter tuning.\n", "\n", - "Through the `args:` argument we add in the passed-through arguments for our `task.py` file." + "Through the `args:` argument we add in the passed-through arguments for our `task.py` file.\n", + "\n", + "Complete the #TODOs to make sure you have the necessary user_args for our task.py's parser." ] }, { @@ -566,10 +555,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Hyperparameter tuning\n", + "## Lab Task #5: Hyperparameter tuning\n", "\n", "To do hyperparameter tuning, create a YAML file and and pass its name with `--config`.\n", - "This step could take hours -- you can increase `--parallel-trial-count` or reduce `--max-trial-count` to get it done faster. Since `--parallel-trial-count` is the number of initial seeds to start searching from, you don't want it to be too large; otherwise, all you have is a random search." + "This step could take hours -- you can increase `--parallel-trial-count` or reduce `--max-trial-count` to get it done faster. Since `--parallel-trial-count` is the number of initial seeds to start searching from, you don't want it to be too large; otherwise, all you have is a random search.\n", + "\n", + "Complete #TODOs in the yaml file and gcloud training job bash command so that we can run hyperparameter tuning." ] }, { diff --git a/notebooks/end-to-end-structured/solutions/5a_train_keras_vertex_babyweight.ipynb b/notebooks/end-to-end-structured/solutions/5a_train_keras_vertex_babyweight.ipynb index 99cd8bed..cbd87dbe 100644 --- a/notebooks/end-to-end-structured/solutions/5a_train_keras_vertex_babyweight.ipynb +++ b/notebooks/end-to-end-structured/solutions/5a_train_keras_vertex_babyweight.ipynb @@ -38,7 +38,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "First we will install the `cloudml-hypertune` package on our local machine. 
This is the package which we will use to report hyperparameter tuning metrics to Vertex AI. Installing the package will allow us to test our trainer package locally." + "First we will import the `cloudml-hypertune` package. This is the package which we will use to report hyperparameter tuning metrics to Vertex AI. Importing the package will allow us to test our trainer package locally." ] }, { @@ -47,20 +47,7 @@ "metadata": {}, "outputs": [], "source": [ - "try:\n", - " import hypertune\n", - "\n", - "except ImportError:\n", - " !pip3 install -U cloudml-hypertune --user\n", - "\n", - " print(\"Please restart the kernel and re-run the notebook.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If the above command resulted in an installation, please restart the notebook kernel and re-run the notebook." + "import hypertune" ] }, { @@ -426,12 +413,12 @@ " }\n", " bucketized = {}\n", "\n", - " for nc in NUMERICAL_COLUMNS:\n", - " deep[nc] = inputs[nc]\n", - " bucketized[nc] = tf.keras.layers.Discretization(buckets[nc])(inputs[nc])\n", - " wide[f\"btk_{nc}\"] = tf.keras.layers.CategoryEncoding(\n", - " num_tokens=len(buckets[nc]) + 1, output_mode=\"one_hot\"\n", - " )(bucketized[nc])\n", + " for numerical_column in NUMERICAL_COLUMNS:\n", + " deep[numerical_column] = inputs[numerical_column]\n", + " bucketized[numerical_column] = tf.keras.layers.Discretization(buckets[numerical_column])(inputs[numerical_column])\n", + " wide[f\"btk_{numerical_column}\"] = tf.keras.layers.CategoryEncoding(\n", + " num_tokens=len(buckets[numerical_column]) + 1, output_mode=\"one_hot\"\n", + " )(bucketized[numerical_column])\n", "\n", " crossed = tf.keras.layers.experimental.preprocessing.HashedCrossing(\n", " num_bins=len(buckets[\"mother_age\"]) * len(buckets[\"gestation_weeks\"])\n", @@ -457,10 +444,10 @@ " ],\n", " }\n", "\n", - " for cc in CATEGORICAL_COLUMNS:\n", - " wide[cc] = tf.keras.layers.StringLookup(\n", - " vocabulary=vocab[cc], output_mode=\"one_hot\"\n", - " )(inputs[cc])\n", + " for categorical_column in CATEGORICAL_COLUMNS:\n", + " wide[categorical_column] = tf.keras.layers.StringLookup(\n", + " vocabulary=vocab[categorical_column], output_mode=\"one_hot\"\n", + " )(inputs[categorical_column])\n", "\n", " return wide, deep\n", "\n", From c9d11beab6a9be619922898735248abb4a5a78f2 Mon Sep 17 00:00:00 2001 From: Takumi Ohyama Date: Thu, 7 Jul 2022 12:37:13 +0000 Subject: [PATCH 8/8] eflected reviews on end-to-end 5b --- .../5b_deploy_keras_vertex_babyweight.ipynb | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/notebooks/end-to-end-structured/labs/5b_deploy_keras_vertex_babyweight.ipynb b/notebooks/end-to-end-structured/labs/5b_deploy_keras_vertex_babyweight.ipynb index db1d5896..6c3800bf 100644 --- a/notebooks/end-to-end-structured/labs/5b_deploy_keras_vertex_babyweight.ipynb +++ b/notebooks/end-to-end-structured/labs/5b_deploy_keras_vertex_babyweight.ipynb @@ -140,9 +140,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Upload model, create endpoint and deploy trained model\n", + "## Lab Task #1: Upload model, create endpoint and deploy trained model\n", "\n", - "Uploading our SavedModel from the above `MODEL_LOCATION`, creating and endpoint and deploying the trained model to act as a REST web service are three simple gcloud calls. We also run a command to list the endpoints, to fetch the fully qualified resource name `ENDPOINT_RESOURCENAME` for the endpoint." 
+ "Uploading our SavedModel from the above `MODEL_LOCATION`, creating and endpoint and deploying the trained model to act as a REST web service are three simple gcloud calls. We also run a command to list the endpoints, to fetch the fully qualified resource name `ENDPOINT_RESOURCENAME` for the endpoint.\n", + "\n", + "Complete __#TODO__ by providing location of saved_model.pb file to Vertex AI. The deployment will take a few minutes." ] }, { @@ -156,8 +158,7 @@ "MODEL_DISPLAYNAME=babyweight_model_$TIMESTAMP\n", "ENDPOINT_DISPLAYNAME=babyweight_endpoint_$TIMESTAMP\n", "IMAGE_URI=\"us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-3:latest\"\n", - "MODEL_LOCATION=$(gsutil ls -d -- gs://${BUCKET}/babyweight/tuned_2*/2* \\\n", - " | tail -1)\n", + "MODEL_LOCATION=# TODO: Add GCS path to saved_model.pb file.\n", "echo \"MODEL_LOCATION=${MODEL_LOCATION}\"\n", "\n", "# Model\n", @@ -206,7 +207,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Use model to make online prediction." + "## Lab Task #2: Use model to make online prediction.\n", + "Complete **#TODO**s for both the Python API method of calling our deployed model on Vertex AI for online prediction." ] }, { @@ -321,9 +323,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Use model to make batch prediction.\n", + "## Lab Task #3: Use model to make batch prediction.\n", + "\n", + "Batch prediction is commonly used when you have thousands to millions of predictions. It will create a Vertex AI batch prediction job. We will put our prediction request JSONL file (multiple lines of JSON records) to GCS, and use the Python API to request the job.\n", "\n", - "Batch prediction is commonly used when you have thousands to millions of predictions. It will create a Vertex AI batch prediction job. We will put our prediction request JSONL file (multiple lines of JSON records) to GCS, and use the Python API to request the job." + "Complete **#TODO**s so we can call our deployed model on Vertex AI for batch prediction." ] }, {