LLM Distillation Recipe using UC Berkeley's RAFT on Azure AI Serverless

Generated using DALL-e 3 on Azure AI

This repository is a recipe that walks you through LLM distillation on Azure AI Serverless.

Distillation is a process where a large pre-trained model (often referred to as the "teacher" model) is used to train a smaller, more efficient model (known as the "student" model). The goal is to transfer the knowledge from the teacher to the student, enabling the student to achieve comparable performance while being more resource-efficient.

This recipe can use either OpenAI GPT-4o or Meta Llama 3.1 405B as a teacher model deployed on Azure AI to generate a synthetic dataset using UC Berkeley's Gorilla project RAFT method (see blog post). The synthetically generated dataset is then used to fine-tune a student model such as OpenAI GPT-4o-mini, Meta Llama 3.1 8B, or another supported model. Finally, we deploy the fine-tuned model and evaluate its performance against a baseline model.
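To make this more concrete, each RAFT-generated training example pairs a synthetic question with an oracle document chunk, a few distractor chunks, and a chain-of-thought answer grounded in the oracle. The record below is purely illustrative; the exact field names produced by the RAFT scripts may differ:

{
  "question": "<question generated by the teacher model about the oracle chunk>",
  "context": ["<oracle document chunk>", "<distractor chunk 1>", "<distractor chunk 2>"],
  "oracle_context": "<oracle document chunk>",
  "cot_answer": "<chain-of-thought reasoning citing the oracle chunk, followed by the final answer>"
}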

Project Goal: The primary objective of this project is to simplify and automate the process of distilling large language models. The workflows and notebooks are meant to be as hands-free as possible, ensuring that even complex tasks like generating synthetic datasets, fine-tuning models, and deploying them can be accomplished with minimal manual intervention. Whether you’re a beginner or an expert, our focus is on providing a seamless experience that allows you to focus on the results rather than the process.

More about RAFT

Getting started / Provisioning Azure AI infrastructure

The infrastructure for this project is fully provisioned using the Azure Developer CLI (AZD). AZD simplifies the deployment process by automating the setup of all required Azure resources, ensuring that you can get started with minimal configuration. This approach allows you to focus on the core aspects of model distillation and fine-tuning, while AZD handles the complexities of cloud resource management behind the scenes. By leveraging AZD, the project maintains a consistent and reproducible environment, making it easier to collaborate and scale.

The easiest way to get started is to open the project in GitHub Codespaces (or in a VS Code Dev Container locally); both come with azd included.

Open in GitHub Codespaces

Open in Dev Containers

Setup recording

Here's a recording of the repository setup in a Dev Container or Codespaces, to give you an idea of what it looks like before trying it yourself:

asciicast

Login using azd

azd auth login --use-device-code

Create azd environment

This creates a new azd environment and is a prerequisite for configuring models in the next step.

azd env new

Configure models & region

Configure which models you want to use for the teacher, student, embedding, and baseline (the baseline usually equals the student), as well as which region to deploy the project to.

Note: Both OpenAI models and Azure Marketplace models are supported.

If in Codespaces or Dev Container:

configure_models.py

Note: As you progress through the selection, this command narrows down the models you can choose based on the regions they're available in, so that the region you select at the end of the configuration has all of your models available. You'll still have to make sure you have enough quota in the region you select.

If not, follow the virtual environment instructions:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
./infra/scripts/configure_models.py

Provision the infrastructure

azd up

Note: You won't be asked which region to deploy to, as the configure_models.py script from the previous step already configured the AZD region based on your model and region selection.

Note: Both OpenAI models and Azure Marketplace models are supported. The azd infrastructure code will take care of provisioning the infrastructure required to support either of them.

The post-provisioning tests.sh script runs infrastructure integration tests to make sure everything is deployed successfully.

Another post-provisioning script, export_env.sh, exports the environment variables for the provisioned infrastructure to the generated ./.env.state file.
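If you want to reuse these values outside the notebooks, a minimal Python sketch for loading both environment files could look like the following (assuming standard KEY=VALUE dotenv syntax and that the python-dotenv package is installed; the variable read at the end is just one example from the tables below):

import os
from dotenv import load_dotenv  # assumes the python-dotenv package is installed

# Load the user-provided ./.env and the generated ./.env.state
load_dotenv(".env")
load_dotenv(".env.state")

# Example: read one of the variables documented in the section below
print(os.getenv("COMPLETION_AZURE_OPENAI_ENDPOINT"))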

Bring your own models

The easiest option is to provision the infrastructure using azd, but you can of course also bring your own models. Just provide environment variables for your models' endpoints in the manual ./.env file at the root of the project.

Environment variable configuration

These environment variables are expected by the RAFT CLI scripts. They are prefixed with the purpose of the model (COMPLETION, EMBEDDING, BASELINE, JUDGE), followed by either standard OpenAI or Azure OpenAI variable names.

For each model purpose, choose one of the following API styles:

OpenAI API

| Env var name | Explanation |
|---|---|
| COMPLETION_OPENAI_API_KEY | API Key for the teacher model |
| COMPLETION_OPENAI_BASE_URL | Base URL for the teacher model |
| COMPLETION_OPENAI_DEPLOYMENT | Deployment name for the teacher model |
| EMBEDDING_OPENAI_API_KEY | API Key for the embedding model |
| EMBEDDING_OPENAI_BASE_URL | Base URL for the embedding model |
| EMBEDDING_OPENAI_DEPLOYMENT | Deployment name for the embedding model |
| BASELINE_OPENAI_API_KEY | API Key for the baseline model |
| BASELINE_OPENAI_BASE_URL | Base URL for the baseline model |
| BASELINE_OPENAI_DEPLOYMENT | Deployment name for the baseline model |
| JUDGE_OPENAI_API_KEY | API Key for the judge model |
| JUDGE_OPENAI_BASE_URL | Base URL for the judge model |
| JUDGE_OPENAI_DEPLOYMENT | Deployment name for the judge model |
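For example, a minimal ./.env entry for the teacher model using the OpenAI API style could look like this (all values are placeholders; the other model purposes follow the same pattern):

COMPLETION_OPENAI_API_KEY=<your-api-key>
COMPLETION_OPENAI_BASE_URL=https://<your-endpoint>/v1
COMPLETION_OPENAI_DEPLOYMENT=<your-deployment-name>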
Azure OpenAI API

| Env var name | Explanation |
|---|---|
| COMPLETION_AZURE_OPENAI_API_KEY | API Key for the teacher model |
| COMPLETION_AZURE_OPENAI_ENDPOINT | Endpoint for the teacher model |
| COMPLETION_AZURE_OPENAI_DEPLOYMENT | Deployment name for the teacher model |
| COMPLETION_OPENAI_API_VERSION | API Version for the teacher model |
| EMBEDDING_AZURE_OPENAI_API_KEY | API Key for the embedding model |
| EMBEDDING_AZURE_OPENAI_ENDPOINT | Endpoint for the embedding model |
| EMBEDDING_AZURE_OPENAI_DEPLOYMENT | Deployment name for the embedding model |
| EMBEDDING_OPENAI_API_VERSION | API Version for the embedding model |
| BASELINE_AZURE_OPENAI_API_KEY | API Key for the baseline model |
| BASELINE_AZURE_OPENAI_ENDPOINT | Endpoint for the baseline model |
| BASELINE_AZURE_OPENAI_DEPLOYMENT | Deployment name for the baseline model |
| BASELINE_OPENAI_API_VERSION | API Version for the baseline model |
| JUDGE_AZURE_OPENAI_API_KEY | API Key for the judge model |
| JUDGE_AZURE_OPENAI_ENDPOINT | Endpoint for the judge model |
| JUDGE_AZURE_OPENAI_DEPLOYMENT | Deployment name for the judge model |
| JUDGE_OPENAI_API_VERSION | API Version for the judge model |
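As an illustration only (not necessarily how the RAFT CLI scripts themselves consume these variables), the prefixed names map directly onto the standard openai Python clients; here is a minimal sketch for the teacher (COMPLETION) model using the Azure OpenAI API style:

import os
from openai import AzureOpenAI

# Sketch for the Azure OpenAI API style; the OpenAI API style would use
# OpenAI(api_key=..., base_url=...) with the COMPLETION_OPENAI_* variables instead.
client = AzureOpenAI(
    api_key=os.environ["COMPLETION_AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["COMPLETION_AZURE_OPENAI_ENDPOINT"],
    api_version=os.environ["COMPLETION_OPENAI_API_VERSION"],
)

# The deployment name is passed as the model parameter.
response = client.chat.completions.create(
    model=os.environ["COMPLETION_AZURE_OPENAI_DEPLOYMENT"],
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)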

Notebooks

This repository is organized into 2 generic notebooks, 2 Azure OpenAI specific notebooks, and 2 Azure MaaS (Model as a Service) specific notebooks, one for each step of the process:

| Notebook | Azure OpenAI | Azure MaaS | Explanation |
|---|---|---|---|
| 1_gen.ipynb | ✔️ | ✔️ | Generate a finetuning dataset using RAFT |
| 2_finetune.ipynb | | ✔️ | Fine tune a base model using the generated dataset |
| 2_finetune_oai.ipynb | ✔️ | | Fine tune a base model using the generated dataset |
| 3_deploy.ipynb | | ✔️ | Deploy the fine tuned model |
| 3_deploy_oai.ipynb | ✔️ | | Deploy the fine tuned model |
| 4_eval.ipynb | ✔️ | ✔️ | Evaluate the fine tuned model |

Run time and costs

Warning: The times and costs mentioned below are indications to give you a sense of what to expect, but they can vary dramatically depending on your use case; please monitor your usage to avoid surprises.

| Notebook | Run time | Cost |
|---|---|---|
| 1_gen.ipynb | From 5 minutes for the sample to multiple days for bigger domains | From $1 for the sample to $50 or more for bigger domains |
| 2_finetune[_oai].ipynb | Roughly 1.5 hours | Roughly $50 |
| 3_deploy[_oai].ipynb | < 10 minutes | < $1 |
| 4_eval.ipynb | From 5 minutes for the sample to multiple days for bigger domains | From $1 for the sample to $50 or more for bigger domains |

Dormant infrastructure costs

While not in use, the infrastructure for this project won't cost much, but it will still incur some cost.

TODO: provide cost estimates for the dormant infrastructure

Configuration files

| File | Explanation |
|---|---|
| .env | User-provided environment variables read by notebooks and scripts |
| .env.state | Environment variables for resources created during notebook execution, shared by all notebooks |
| config.json | Configuration necessary to connect to the Azure AI Studio Hub (same as an Azure ML Workspace) |
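For reference, a config.json following the standard Azure ML workspace format typically looks like the sketch below (all values are placeholders):

{
  "subscription_id": "<your-subscription-id>",
  "resource_group": "<your-resource-group>",
  "workspace_name": "<your-ai-studio-hub-project-name>"
}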

Parameterized execution

In addition to interactive execution, the notebooks also support parameterized command-line execution using papermill.

Parameter files

The parameter files are contained in the parameters folder and support the following configurations:

| Parameter file | Model | Format |
|---|---|---|
| Llama-2-7b.yaml | Llama-2-7b | Completion |
| Meta-Llama-3-8B-Instruct.yaml | Meta-Llama-3-8B-Instruct | Chat |
| Meta-Llama-3.1-8B-Instruct.yaml | Meta-Llama-3.1-8B-Instruct | Chat |

Running notebooks from the command line with a parameter file

Notebooks can be run all at once with a given parameter file using the following command:

./run_all.sh -p ./parameters/Meta-Llama-3.1-8B-Instruct.yaml
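To run a single notebook instead of the whole pipeline, papermill can also be invoked directly; for example (the output notebook name below is arbitrary):

papermill 1_gen.ipynb 1_gen.output.ipynb -f ./parameters/Meta-Llama-3.1-8B-Instruct.yaml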

Taking down the infrastructure

After you are done working with the project, you can take down the infrastructure with the following command.

IMPORTANT: Please be aware that this will DELETE everything related to this project including generated datasets and fine-tuned models.

IMPORTANT: Save everything important to you before running this command.

azd down --purge

Note: The --purge parameter is important to reclaim quotas, for example for Azure OpenAI embedding models.
