AIOps modules is a collection of reusable Infrastructure as Code (IaC) modules that work with the SeedFarmer CLI. Please see the DOCS for all things seed-farmer.
The modules in this repository are decoupled from each other and can be aggregated using the GitOps (manifest file) principles provided by SeedFarmer
to achieve the desired use cases. This removes the undifferentiated heavy lifting for end users by providing hardened modules, and enables them to focus on building their business on top of them.
The modules in this repository must be generic and reusable, without affiliation to any one particular project in the Machine Learning and Foundation Model Operations domain.
All modules in this repository adhere to the module structure defined in the SeedFarmer Guide.
See deployment steps in the Deployment Guide.
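As a rough sketch of the manifest-driven aggregation described above, a deployment manifest lists groups, and each group file lists the modules to deploy along with their parameters. The file names, group names, module path, parameter name, and metadata key below are illustrative assumptions and are not taken from this repository; consult the SeedFarmer docs and the manifests under examples/manifests for the authoritative layout.

```yaml
# File 1 (hypothetical path: manifests/deployment.yaml): the deployment manifest
name: aiops-example                # illustrative deployment name
toolchainRegion: us-east-1         # illustrative region
groups:
  - name: networking
    path: manifests/networking-modules.yaml   # hypothetical group file
  - name: sagemaker
    path: manifests/sagemaker-modules.yaml    # hypothetical group file
targetAccountMappings:
  - alias: primary
    accountId:
      valueFrom:
        envVariable: PRIMARY_ACCOUNT          # account ID read from the environment
    default: true
    regionMappings:
      - region: us-east-1
        default: true
---
# File 2 (hypothetical path: manifests/sagemaker-modules.yaml): one module in a group
name: studio
path: modules/sagemaker/sagemaker-studio      # assumed module location
parameters:
  - name: vpc-id                              # illustrative parameter name
    valueFrom:
      moduleMetadata:                         # reads an output of another module
        group: networking
        name: networking
        key: VpcId                            # illustrative metadata key
```

The two YAML documents are shown in one block for brevity but would live in separate files. Because modules only reference each other through metadata lookups like the one above, a group can be swapped or removed without editing the module code itself.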
End-to-end example use cases built using modules in this repository:
Type | Description |
---|---|
MLOps with Amazon SageMaker | Sets up an environment for MLOps with Amazon SageMaker. Deploys a secure Amazon SageMaker Studio Domain and provisions SageMaker Project Templates using Service Catalog, including model training and deployment. |
Ray on Amazon Elastic Kubernetes Service (EKS) | Runs Ray on Amazon EKS. Deploys an Amazon EKS cluster, the KubeRay Ray Operator, and a Ray Cluster with autoscaling enabled. |
Fine-tune 6B LLM (GPT-J) using Ray on Amazon EKS | Runs fine-tuning of the 6B-parameter GPT-J LLM. Deploys an Amazon EKS cluster, the KubeRay Ray Operator, and a Ray Cluster with autoscaling enabled, and runs a fine-tuning job. How to fine tune a 6B LLM simply and cost-effective using Ray on Amazon EKS? |
MLflow tracking server and model registry with Amazon SageMaker | An example using MLflow experiment tracking, model registry, and LLM tracing with Amazon SageMaker. Deploys a self-hosted MLflow tracking server and model registry on AWS Fargate, and an Amazon SageMaker Studio Domain environment. |
Managed Workflows for Apache Airflow (MWAA) for Machine Learning Training | An example orchestrating ML training jobs with Managed Workflows for Apache Airflow (MWAA). Deploys MWAA and an example ML training DAG. |
MLOps with Step Functions | Automates the machine learning lifecycle using Amazon SageMaker and AWS Step Functions. |
Bedrock Fine-Tuning with Step Functions | Continuously fine-tunes a foundation model with Bedrock fine-tuning jobs and AWS Step Functions. |
AppSync Knowledge Base Ingestion and Question and Answering RAG | Creates a GraphQL endpoint for data ingestion and uses the ingested data as a knowledge base for a question-answering model using RAG. |
Type | Description |
---|---|
SageMaker Studio Module | Provisions secure SageMaker Studio Domain environment, creates example User Profiles for Data Scientist and Lead Data Scientist linked to IAM Roles, and adds lifecycle config |
SageMaker Endpoint Module | Creates SageMaker real-time inference endpoint for the specified model package or latest approved model from the model package group |
SageMaker Project Templates via Service Catalog Module | Provisions SageMaker Project Templates for an organization. The templates are available using SageMaker Studio Classic or Service Catalog. Available templates: train a model on the Abalone dataset using XGBoost; perform batch inference; multi-account model deployment; HuggingFace model import template; LLM fine-tuning and evaluation. |
SageMaker Notebook Instance Module | Creates secure SageMaker Notebook Instance for the Data Scientist, clones the source code to the workspace |
SageMaker Custom Kernel Module | Builds custom kernel for SageMaker Studio from a Dockerfile |
SageMaker Model Package Group Module | Creates a SageMaker Model Package Group to register and version SageMaker Machine Learning (ML) models and sets up an Amazon EventBridge Rule to send model package group state change events to an Amazon EventBridge Bus |
SageMaker Model Package Promote Pipeline Module | Deploys a pipeline to promote SageMaker Model Packages in a multi-account setup. The pipeline can be triggered through an EventBridge rule in response to a SageMaker Model Package Group state change event (Approved/Rejected). Once the pipeline is triggered, it promotes the latest approved model package, if one is found. |
SageMaker Model Monitoring Module | Deploys data quality, model quality, model bias, and model explainability monitoring jobs which run against a SageMaker Endpoint. |
SageMaker Model CICD Module | Creates a comprehensive CICD pipeline using AWS CodePipeline to build and deploy an ML model on SageMaker. |
SageMaker Ground Truth Labeling Module | Creates a state machine to allow labeling of image and text files, uploaded to the upload bucket, using various built-in task types in SageMaker Ground Truth. |
Type | Description |
---|---|
MLflow Image Module | Creates an MLflow Tracking Server Docker image and pushes the image to Elastic Container Registry |
MLflow on AWS Fargate Module | Runs the MLflow container on AWS Fargate in a load-balanced Elastic Container Service. Supports Elastic File System and Relational Database Service for metadata persistence, and S3 for the artifact store |
MLflow AI Gateway Image Module | Creates an MLflow AI Gateway Docker image and pushes the image to Elastic Container Registry |
Type | Description |
---|---|
SageMaker JumpStart Foundation Model Endpoint Module | Creates an endpoint for a SageMaker JumpStart Foundation Model. |
SageMaker Hugging Face Foundation Model Endpoint Module | Creates an endpoint for a SageMaker Hugging Face Foundation Model. |
Amazon Bedrock Finetuning Module | Creates a pipeline that automatically triggers Amazon Bedrock Finetuning. |
AppSync Knowledge Base Ingestion and Question and Answering RAG Module | Creates a GraphQL endpoint for data ingestion and uses the ingested data as a knowledge base for a question-answering model using RAG. |
Type | Description |
---|---|
Example DAG for MLOps Module | Deploys a sample DAG in MWAA that demonstrates MLOps, using the MWAA module from IDF |
Type | Description |
---|---|
Example for MLOps using Step Functions | Deploys a state machine in AWS Step Functions demonstrating how to implement MLOps using AWS Step Functions |
Type | Description |
---|---|
Ray Operator Module | Provisions a Ray Operator on EKS. |
Ray Cluster Module | Provisions a Ray Cluster on EKS. Requires a Ray Operator. |
Ray Orchestrator Module | Creates a Step Function to orchestrate submission of a sample Ray job that fine-tunes the 6B-parameter GPT-J Large Language Model on the tiny shakespeare dataset and performs inference. |
Ray Image Module | An example that builds a custom Ray image and pushes to ECR. |
Type | Description |
---|---|
Event Bus Module | Creates an Amazon EventBridge Bus for cross-account events. |
Personas Module | This module is an example that creates various roles required for an AI/ML project. |
The modules in this repository are compatible with Industry Data Framework (IDF) Modules and can be used together within the same deployment. Refer to examples/manifests
for examples.
The modules in this repository are compatible with Autonomous Driving Data Framework (ADDF) Modules and can be used together within the same deployment.