
AIOps modules is a collection of reusable Infrastructure as Code (IaC) modules for Machine Learning (ML), Foundation Models (FM), Large Language Models (LLM) and GenAI development and operations on AWS



awslabs/aiops-modules

AIOps Modules

AIOps modules is a collection of reusable Infrastructure as Code (IaC) modules that work with the SeedFarmer CLI. See the SeedFarmer documentation for everything related to seed-farmer.

The modules in this repository are decoupled from each other and can be combined using the GitOps (manifest file) principles provided by seedfarmer to achieve the desired use cases. This removes undifferentiated heavy lifting for end users: they get hardened modules and can focus on building their business on top of them.
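As a rough illustration of how decoupled modules are wired together, the sketch below models a deployment as ordered groups of modules, where a module parameter is either a literal value or a reference to metadata exported by a module in an earlier group. The dictionary layout loosely mirrors the `groups` / `parameters` / `valueFrom.moduleMetadata` structure of seedfarmer manifests as we understand it; the group, module, and parameter names and the `resolve_parameters` helper are hypothetical, and seedfarmer performs this resolution itself.

```python
from typing import Any

# Hypothetical two-group deployment. Groups are deployed in order, so a module
# may only consume metadata (outputs) of modules from earlier groups.
DEPLOYMENT: list[dict[str, Any]] = [
    {
        "name": "networking",
        "modules": [{"name": "networking", "path": "modules/networking", "parameters": []}],
    },
    {
        "name": "sagemaker",
        "modules": [
            {
                "name": "sagemaker-studio",
                "path": "modules/sagemaker/sagemaker-studio",
                "parameters": [
                    # Literal value set directly in the manifest (hypothetical parameter)
                    {"name": "domain-name", "value": "example-domain"},
                    # Value taken from another module's exported metadata
                    {
                        "name": "vpc-id",
                        "valueFrom": {
                            "moduleMetadata": {"group": "networking", "name": "networking", "key": "VpcId"}
                        },
                    },
                ],
            }
        ],
    },
]


def resolve_parameters(
    deployment: list[dict[str, Any]],
    outputs: dict[tuple[str, str], dict[str, Any]],
) -> dict[tuple[str, str], dict[str, Any]]:
    """Resolve each module's parameters, following moduleMetadata references.

    `outputs` maps (group, module) to that module's exported metadata.
    Raises ValueError if a module references a group that is not deployed earlier.
    """
    resolved: dict[tuple[str, str], dict[str, Any]] = {}
    deployed_groups: set[str] = set()
    for group in deployment:
        for module in group["modules"]:
            values: dict[str, Any] = {}
            for param in module["parameters"]:
                if "value" in param:
                    values[param["name"]] = param["value"]
                    continue
                ref = param["valueFrom"]["moduleMetadata"]
                if ref["group"] not in deployed_groups:
                    raise ValueError(
                        f"{module['name']} depends on group {ref['group']!r}, which is not deployed yet"
                    )
                values[param["name"]] = outputs[(ref["group"], ref["name"])][ref["key"]]
            resolved[(group["name"], module["name"])] = values
        # Modules in one group deploy together, so the group counts as done only afterwards
        deployed_groups.add(group["name"])
    return resolved
```

For example, with `outputs = {("networking", "networking"): {"VpcId": "vpc-123"}}` the Studio module resolves to `{"domain-name": "example-domain", "vpc-id": "vpc-123"}`. Keeping wiring in the manifest rather than in the modules is what lets each module stay generic.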

General Information

The modules in this repository must be generic enough to be reused without affiliation to any particular project in the Machine Learning and Foundation Model Operations domain.

All modules in this repository adhere to the module structure defined in the SeedFarmer Guide.

Deployment

See deployment steps in the Deployment Guide.

Project Manifests

End-to-end example use cases built using modules in this repository.

Type Description
MLOps with Amazon SageMaker Set up an environment for MLOps with Amazon SageMaker. Deploys a secure Amazon SageMaker Studio Domain and provisions SageMaker Project Templates using Service Catalog, including model training and deployment.
Ray on Amazon Elastic Kubernetes Service (EKS) Run Ray on Amazon EKS. Deploys an Amazon EKS cluster, the KubeRay Ray Operator, and a Ray Cluster with autoscaling enabled.
Fine-tune 6B LLM (GPT-J) using Ray on Amazon EKS Run fine-tuning of the 6B-parameter GPT-J LLM. Deploys an Amazon EKS cluster, the KubeRay Ray Operator, and a Ray Cluster with autoscaling enabled, then runs a fine-tuning job. See also: How to fine tune a 6B LLM simply and cost-effective using Ray on Amazon EKS?
MLflow tracking server and model registry with Amazon SageMaker An example using MLflow experiment tracking, model registry, and LLM tracing with Amazon SageMaker. Deploys a self-hosted MLflow tracking server and model registry on AWS Fargate, and an Amazon SageMaker Studio Domain environment.
Managed Workflows for Apache Airflow (MWAA) for Machine Learning Training An example orchestrating ML training jobs with Managed Workflows for Apache Airflow (MWAA). Deploys MWAA and an example ML training DAG.
MLOps with Step Functions Automate the machine learning lifecycle using Amazon SageMaker and AWS Step Functions.
Bedrock Fine-Tuning with Step Functions Continuously fine-tune a foundation model with Amazon Bedrock fine-tuning jobs and AWS Step Functions.
AppSync Knowledge Base Ingestion and Question and Answering RAG Creates a GraphQL endpoint for data ingestion and uses the ingested data as a knowledge base for a question-answering model using RAG.

Modules

SageMaker Modules

Type Description
SageMaker Studio Module Provisions a secure SageMaker Studio Domain environment, creates example User Profiles for a Data Scientist and a Lead Data Scientist linked to IAM Roles, and adds a lifecycle configuration.
SageMaker Endpoint Module Creates a SageMaker real-time inference endpoint for the specified model package, or for the latest approved model in the model package group.
SageMaker Project Templates via Service Catalog Module Provisions SageMaker Project Templates for an organization. The templates are available using SageMaker Studio Classic or Service Catalog. Available templates:
- Train a model on Abalone dataset using XGBoost
- Perform batch inference
- Multi-account model deployment
- HuggingFace model import template
- LLM fine-tuning and evaluation
SageMaker Notebook Instance Module Creates a secure SageMaker Notebook Instance for the Data Scientist and clones the source code into the workspace.
SageMaker Custom Kernel Module Builds a custom kernel for SageMaker Studio from a Dockerfile.
SageMaker Model Package Group Module Creates a SageMaker Model Package Group to register and version SageMaker Machine Learning (ML) models, and sets up an Amazon EventBridge Rule that sends model package group state change events to an Amazon EventBridge Bus.
SageMaker Model Package Promote Pipeline Module Deploys a pipeline to promote SageMaker Model Packages in a multi-account setup. The pipeline can be triggered by an EventBridge rule in response to a SageMaker Model Package Group state change event (Approved/Rejected). Once triggered, the pipeline promotes the latest approved model package, if one is found.
SageMaker Model Monitoring Module Deploys data quality, model quality, model bias, and model explainability monitoring jobs that run against a SageMaker Endpoint.
SageMaker Model CICD Module Creates a comprehensive CI/CD pipeline using AWS CodePipeline to build and deploy an ML model on SageMaker.
SageMaker Ground Truth Labeling Module Creates a state machine for labeling images and text files uploaded to the upload bucket, using various built-in SageMaker Ground Truth task types.
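Several SageMaker modules above revolve around model package approval: the Model Package Group module emits state change events, the promote pipeline reacts to Approved/Rejected changes, and the Endpoint module can deploy the latest approved model. The sketch below shows those two building blocks, assuming a boto3 SageMaker client is passed in: an EventBridge event pattern for approval changes in one group, and a lookup of the newest approved package via `list_model_packages`. The function names and group name are illustrative, not the modules' actual code.

```python
from typing import Any, Optional


def model_package_state_change_pattern(group_name: str) -> dict[str, Any]:
    """EventBridge event pattern matching approval changes in one model package group."""
    return {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {
            "ModelPackageGroupName": [group_name],
            "ModelApprovalStatus": ["Approved", "Rejected"],
        },
    }


def latest_approved_model_package_arn(sm_client: Any, group_name: str) -> Optional[str]:
    """Return the ARN of the newest approved package in the group, or None.

    `sm_client` is expected to behave like boto3.client("sagemaker").
    """
    response = sm_client.list_model_packages(
        ModelPackageGroupName=group_name,
        ModelApprovalStatus="Approved",
        SortBy="CreationTime",
        SortOrder="Descending",  # newest first
        MaxResults=1,
    )
    summaries = response.get("ModelPackageSummaryList", [])
    return summaries[0]["ModelPackageArn"] if summaries else None
```

With AWS credentials configured, a call such as `latest_approved_model_package_arn(boto3.client("sagemaker"), "my-group")` returns the ARN to deploy or promote; passing the client in keeps the function easy to test without AWS access.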

MLflow Modules

Type Description
MLflow Image Module Creates an MLflow Tracking Server Docker image and pushes it to Amazon Elastic Container Registry (ECR).
MLflow on AWS Fargate Module Runs the MLflow container on AWS Fargate as a load-balanced Amazon Elastic Container Service (ECS) service. Supports Amazon Elastic File System (EFS) and Amazon Relational Database Service (RDS) for metadata persistence, and Amazon S3 for the artifact store.
MLflow AI Gateway Image Module Creates an MLflow AI Gateway Docker image and pushes it to Amazon ECR.
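The Fargate module's split between a metadata store (EFS or RDS) and an S3 artifact store corresponds to the `--backend-store-uri` and `--default-artifact-root` options of the `mlflow server` CLI. The sketch below assembles such a command line; the bucket name, EFS mount path, database URL, and the helper itself are placeholders, and the module's actual container entrypoint may configure MLflow differently.

```python
import shlex
from typing import Optional


def mlflow_server_argv(
    artifact_bucket: str,
    rds_uri: Optional[str] = None,
    efs_mount: str = "/mnt/efs",  # hypothetical mount path inside the container
    port: int = 5000,
) -> list[str]:
    """Build an `mlflow server` command line for a container.

    Metadata goes to RDS when `rds_uri` is given (a SQLAlchemy URL such as
    mysql+pymysql://user:pass@host:3306/mlflow); otherwise it goes to a SQLite
    file on the EFS mount. Artifacts are always stored in S3.
    """
    backend = rds_uri if rds_uri else f"sqlite:///{efs_mount}/mlflow.db"
    return [
        "mlflow", "server",
        "--backend-store-uri", backend,
        "--default-artifact-root", f"s3://{artifact_bucket}/",
        "--host", "0.0.0.0",  # listen on all interfaces so the load balancer can reach it
        "--port", str(port),
    ]


if __name__ == "__main__":
    print(shlex.join(mlflow_server_argv("my-artifact-bucket")))
```

Because the backend store is only a URI, switching from EFS-backed SQLite to RDS changes one argument and leaves the artifact configuration untouched.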

FMOps/LLMOps Modules

Type Description
SageMaker JumpStart Foundation Model Endpoint Module Creates an endpoint for a SageMaker JumpStart Foundation Model.
SageMaker Hugging Face Foundation Model Endpoint Module Creates an endpoint for a SageMaker Hugging Face Foundation Model.
Amazon Bedrock Finetuning Module Creates a pipeline that automatically triggers Amazon Bedrock fine-tuning jobs.
AppSync Knowledge Base Ingestion and Question and Answering RAG Module Creates a GraphQL endpoint for data ingestion and uses the ingested data as a knowledge base for a question-answering model using RAG.
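For the Bedrock fine-tuning module, the job that ultimately gets triggered is an Amazon Bedrock model customization job. The sketch below builds the keyword arguments for boto3's `bedrock` client `create_model_customization_job` call. The job name, base model ID, role ARN, S3 URIs, the `-model` naming convention, and the hyperparameter names are assumptions for illustration; valid hyperparameters depend on the base model, and the module's pipeline may construct the request differently.

```python
from typing import Any


def fine_tuning_job_request(
    job_name: str,
    base_model_id: str,
    role_arn: str,
    train_s3_uri: str,
    output_s3_uri: str,
    hyper_parameters: dict[str, Any],
) -> dict[str, Any]:
    """Keyword arguments for bedrock.create_model_customization_job (boto3)."""
    return {
        "jobName": job_name,
        # Hypothetical naming convention for the resulting custom model
        "customModelName": f"{job_name}-model",
        "roleArn": role_arn,
        "baseModelIdentifier": base_model_id,
        "customizationType": "FINE_TUNING",
        "trainingDataConfig": {"s3Uri": train_s3_uri},
        "outputDataConfig": {"s3Uri": output_s3_uri},
        # Bedrock expects hyperparameter values as strings
        "hyperParameters": {k: str(v) for k, v in hyper_parameters.items()},
    }
```

With credentials configured, the request could be submitted as `boto3.client("bedrock").create_model_customization_job(**fine_tuning_job_request(...))`, for example with `base_model_id="amazon.titan-text-express-v1"` and `{"epochCount": 2}`.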

MWAA Modules

Type Description
Example DAG for MLOps Module Deploys a sample DAG in MWAA demonstrating MLOps, using the MWAA module from IDF.

MLOps using Step Functions Module

Type Description
Example for MLOps using Step Functions Deploys a state machine in AWS Step Functions demonstrating how to implement MLOps using AWS Step Functions.
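To give a sense of what such a state machine looks like, the sketch below builds a minimal Amazon States Language definition with a single SageMaker training step using the `createTrainingJob.sync` service integration, which makes Step Functions wait for the training job to finish. The role ARN, image URI, instance type, and the `training_job_name` input field are hypothetical, and the example module's actual workflow contains more steps.

```python
import json
from typing import Any


def training_state_machine(role_arn: str, image_uri: str, output_s3: str) -> dict[str, Any]:
    """Minimal Amazon States Language definition with one SageMaker training step."""
    return {
        "Comment": "Train a model with SageMaker, then finish",
        "StartAt": "TrainModel",
        "States": {
            "TrainModel": {
                "Type": "Task",
                # .sync: wait until the training job completes before moving on
                "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
                "Parameters": {
                    # Job name taken from the execution input (hypothetical field)
                    "TrainingJobName.$": "$.training_job_name",
                    "RoleArn": role_arn,
                    "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
                    "OutputDataConfig": {"S3OutputPath": output_s3},
                    "ResourceConfig": {"InstanceCount": 1, "InstanceType": "ml.m5.xlarge", "VolumeSizeInGB": 30},
                    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
                },
                "ResultPath": "$.training",  # keep the original input, add the job result
                "Next": "Done",
            },
            "Done": {"Type": "Succeed"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(training_state_machine(
        "arn:aws:iam::111122223333:role/sfn-sagemaker",
        "111122223333.dkr.ecr.us-east-1.amazonaws.com/train:latest",
        "s3://my-bucket/output/",
    ), indent=2))
```

The JSON printed here is the kind of definition that would be passed to Step Functions when creating the state machine; further steps such as model creation or evaluation would be chained through `Next`.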

EKS Modules

Type Description
Ray Operator Module Provisions a Ray Operator on EKS.
Ray Cluster Module Provisions a Ray Cluster on EKS. Requires a Ray Operator.
Ray Orchestrator Module Creates a Step Functions state machine that orchestrates submission of a sample Ray job, which fine-tunes the 6B-parameter GPT-J large language model on the Tiny Shakespeare dataset and performs inference.
Ray Image Module An example that builds a custom Ray image and pushes it to Amazon ECR.
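Once the Ray Operator and Ray Cluster are running, work is typically submitted to the cluster through the Ray Jobs API on the head node's dashboard port (8265 by default). The sketch below shows that pattern with `ray.job_submission.JobSubmissionClient`; the script name, pip packages, and address are placeholders, and the Ray Orchestrator module may submit its job differently.

```python
from typing import Any


def ray_job_kwargs(script: str, pip_packages: list[str], working_dir: str = ".") -> dict[str, Any]:
    """Arguments for JobSubmissionClient.submit_job."""
    return {
        "entrypoint": f"python {script}",
        # The working directory is uploaded and the pip packages installed for the job
        "runtime_env": {"working_dir": working_dir, "pip": pip_packages},
    }


def submit(address: str, **kwargs: Any) -> str:
    """Submit a job and return its submission ID."""
    # Imported lazily so this file can be loaded without Ray installed
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    return client.submit_job(**kwargs)
```

For example, after `kubectl port-forward svc/<head-service> 8265:8265`, a call such as `submit("http://127.0.0.1:8265", **ray_job_kwargs("finetune.py", ["transformers"]))` would start a (hypothetical) fine-tuning script on the cluster.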

Example Modules

Type Description
Event Bus Module Creates an Amazon EventBridge Bus for cross-account events.
Personas Module This module is an example that creates various roles required for an AI/ML project.
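For the cross-account Event Bus module, the key piece is a resource-based policy on the bus that allows other accounts to call `events:PutEvents`. The sketch below builds such a policy document; the bus ARN and account IDs are placeholders, and the module may express its permissions differently (for example, scoped to an AWS Organization).

```python
import json
from typing import Any


def cross_account_put_events_policy(bus_arn: str, source_account_ids: list[str]) -> dict[str, Any]:
    """Resource policy letting the given accounts send events to an EventBridge bus."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # One statement per source account; Sid must be alphanumeric
                "Sid": f"AllowPutEventsFrom{account_id}",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "events:PutEvents",
                "Resource": bus_arn,
            }
            for account_id in source_account_ids
        ],
    }
```

The resulting document could be applied with `boto3.client("events").put_permission(EventBusName="ml-bus", Policy=json.dumps(policy))`, after which rules in the source accounts can target the bus.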

Industry Data Framework (IDF) Modules

The modules in this repository are compatible with Industry Data Framework (IDF) Modules and can be used together within the same deployment. Refer to examples/manifests for examples.

Autonomous Driving Data Framework (ADDF) Modules

The modules in this repository are compatible with Autonomous Driving Data Framework (ADDF) Modules and can be used together within the same deployment.