*Image generated using DALL·E 3 on Azure AI*
This repository is a recipe that walks you through LLM distillation on Azure AI Serverless.
Distillation is a process where a large pre-trained model (often referred to as the "teacher" model) is used to train a smaller, more efficient model (known as the "student" model). The goal is to transfer the knowledge from the teacher to the student, enabling the student to achieve comparable performance while being more resource-efficient.
This recipe can use either OpenAI GPT-4o or Meta Llama 3.1 405B as a teacher model deployed on Azure AI to generate a synthetic dataset using UC Berkeley's Gorilla project RAFT method (see blog post). The synthetically generated dataset will then be used to fine-tune a student model such as OpenAI GPT-4o-mini, Meta Llama 3.1 8B, or another supported model. Finally, we will deploy the fine-tuned model and evaluate its performance against a baseline model.
Project Goal: The primary objective of this project is to simplify and automate the process of distilling large language models. The workflows and notebooks are meant to be as hands-free as possible, ensuring that even complex tasks like generating synthetic datasets, fine-tuning models, and deploying them can be accomplished with minimal manual intervention. Whether you’re a beginner or an expert, our focus is on providing a seamless experience that allows you to focus on the results rather than the process.
- Microsoft/Meta blog post: RAFT: A new way to teach LLMs to be better at RAG
- Paper: RAFT: Adapting Language Model to Domain Specific RAG
- UC Berkeley blog post: RAFT: Adapting Language Model to Domain Specific RAG
- Meta blog post: RAFT: Sailing Llama towards better domain-specific RAG
- Gorilla project home: Large Language Model Connected with Massive APIs
- RAFT GitHub project
The infrastructure for this project is fully provisioned using the Azure Developer CLI (AZD). AZD simplifies the deployment process by automating the setup of all required Azure resources, ensuring that you can get started with minimal configuration. This approach allows you to focus on the core aspects of model distillation and fine-tuning, while AZD handles the complexities of cloud resource management behind the scenes. By leveraging AZD, the project maintains a consistent and reproducible environment, making it easier to collaborate and scale.
The easiest way is to open the project in Codespaces (or in a VS Code Dev Container locally), which comes with azd included.
Here's a recording of the repository setup in a Dev Container or Codespaces, to give you an idea of what to expect before trying it yourself:
```shell
azd auth login --use-device-code
```
This creates a new azd environment and is a prerequisite to configuring models in the next step.

```shell
azd env new
```
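The command will prompt you for an environment name; you can also pass one directly, for example a hypothetical environment named `raft-recipe`:

```shell
# Create a new azd environment with an explicit (arbitrary) name
azd env new raft-recipe
```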
Configure which models you want to use for `teacher`, `student`, `embedding`, and `baseline` (`baseline` usually equals `student`), as well as which region to deploy the project to.
Note: Both OpenAI models and Azure Marketplace models are supported.
If in Codespaces or a Dev Container:

```shell
configure_models.py
```
Note: This command will narrow down the models you can select as you progress, based on the regions they're available in, ensuring that the region you select at the end of the configuration has all the selected models available. You'll still have to make sure you have enough quota in the region you select.
If not, use a virtual environment:
```shell
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
./infra/scripts/configure_models.py
```
```shell
azd up
```
Note: You won't be asked which region to deploy to, as the previous `configure_models.py` script already configured the AZD region based on your model and region selection.
Note: Both OpenAI models and Azure Marketplace models are supported. The azd infrastructure code will take care of provisioning the infrastructure required to support either of them.
The post-provisioning `tests.sh` script runs infrastructure integration tests to make sure everything deployed successfully.
Another post-provisioning script, `export_env.sh`, exports the environment variables for the provisioned infrastructure to the generated `./.env.state` file.
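If you want to reuse those variables in a shell session (for example to run the RAFT CLI scripts by hand), here is a minimal sketch, assuming both files contain plain `KEY=value` lines:

```shell
# Export every KEY=value pair from both env files into the current shell.
# Assumes plain KEY=value lines with no spaces around '=' (hypothetical usage).
set -a
[ -f ./.env ] && source ./.env             # user-provided variables
[ -f ./.env.state ] && source ./.env.state # variables generated at provisioning time
set +a
```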
The easiest approach is to provision the infrastructure using azd, but you can of course also bring your own models. Just provide environment variables for the endpoints of your models in the manual `./.env` file at the root of the project.
Environment variable configuration
These environment variables are expected by the RAFT CLI scripts. They are prefixed by the purpose of the model (`COMPLETION`, `EMBEDDING`, `BASELINE`, or `JUDGE`) followed by either standard OpenAI or Azure OpenAI variable names.
For each model purpose, choose one of the following API styles:
OpenAI API
| Env var name | Explanation |
|---|---|
| `COMPLETION_OPENAI_API_KEY` | API Key for the teacher model |
| `COMPLETION_OPENAI_BASE_URL` | Base URL for the teacher model |
| `COMPLETION_OPENAI_DEPLOYMENT` | Deployment name for the teacher model |
| `EMBEDDING_OPENAI_API_KEY` | API Key for the embedding model |
| `EMBEDDING_OPENAI_BASE_URL` | Base URL for the embedding model |
| `EMBEDDING_OPENAI_DEPLOYMENT` | Deployment name for the embedding model |
| `BASELINE_OPENAI_API_KEY` | API Key for the baseline model |
| `BASELINE_OPENAI_BASE_URL` | Base URL for the baseline model |
| `BASELINE_OPENAI_DEPLOYMENT` | Deployment name for the baseline model |
| `JUDGE_OPENAI_API_KEY` | API Key for the judge model |
| `JUDGE_OPENAI_BASE_URL` | Base URL for the judge model |
| `JUDGE_OPENAI_DEPLOYMENT` | Deployment name for the judge model |
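For reference, here is what a hypothetical OpenAI-style entry for the teacher model could look like in `./.env`; the key, base URL, and deployment name below are placeholders, not real values:

```shell
# Hypothetical OpenAI-style settings for the teacher (completion) model
COMPLETION_OPENAI_API_KEY=sk-placeholder
COMPLETION_OPENAI_BASE_URL=https://api.example.com/v1
COMPLETION_OPENAI_DEPLOYMENT=gpt-4o
```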
Azure OpenAI API
| Env var name | Explanation |
|---|---|
| `COMPLETION_AZURE_OPENAI_API_KEY` | API Key for the teacher model |
| `COMPLETION_AZURE_OPENAI_ENDPOINT` | Endpoint for the teacher model |
| `COMPLETION_AZURE_OPENAI_DEPLOYMENT` | Deployment name for the teacher model |
| `COMPLETION_OPENAI_API_VERSION` | API Version for the teacher model |
| `EMBEDDING_AZURE_OPENAI_API_KEY` | API Key for the embedding model |
| `EMBEDDING_AZURE_OPENAI_ENDPOINT` | Endpoint for the embedding model |
| `EMBEDDING_AZURE_OPENAI_DEPLOYMENT` | Deployment name for the embedding model |
| `EMBEDDING_OPENAI_API_VERSION` | API Version for the embedding model |
| `BASELINE_AZURE_OPENAI_API_KEY` | API Key for the baseline model |
| `BASELINE_AZURE_OPENAI_ENDPOINT` | Endpoint for the baseline model |
| `BASELINE_AZURE_OPENAI_DEPLOYMENT` | Deployment name for the baseline model |
| `BASELINE_OPENAI_API_VERSION` | API Version for the baseline model |
| `JUDGE_AZURE_OPENAI_API_KEY` | API Key for the judge model |
| `JUDGE_AZURE_OPENAI_ENDPOINT` | Endpoint for the judge model |
| `JUDGE_AZURE_OPENAI_DEPLOYMENT` | Deployment name for the judge model |
| `JUDGE_OPENAI_API_VERSION` | API Version for the judge model |
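Similarly, a hypothetical Azure OpenAI-style entry for the embedding model; the key, endpoint, deployment name, and API version below are placeholders to adapt to your own deployment:

```shell
# Hypothetical Azure OpenAI-style settings for the embedding model
EMBEDDING_AZURE_OPENAI_API_KEY=placeholder-key
EMBEDDING_AZURE_OPENAI_ENDPOINT=https://my-resource.openai.azure.com
EMBEDDING_AZURE_OPENAI_DEPLOYMENT=text-embedding-ada-002
EMBEDDING_OPENAI_API_VERSION=2024-02-01
```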
This repository is organized into two generic notebooks, two OpenAI-specific notebooks, and two Azure MaaS (Model as a Service) specific notebooks, one for each step of the process:
| Notebook | Azure OpenAI | Azure MaaS | Explanation |
|---|---|---|---|
| 1_gen.ipynb | ✔️ | ✔️ | Generate a fine-tuning dataset using RAFT |
| 2_finetune.ipynb | | ✔️ | Fine-tune a base model using the generated dataset |
| 2_finetune_oai.ipynb | ✔️ | | Fine-tune a base model using the generated dataset |
| 3_deploy.ipynb | | ✔️ | Deploy the fine-tuned model |
| 3_deploy_oai.ipynb | ✔️ | | Deploy the fine-tuned model |
| 4_eval.ipynb | ✔️ | ✔️ | Evaluate the fine-tuned model |
Warning: The times and costs mentioned below are indications to give you a sense of what to expect, but your experience can vary dramatically. Please monitor your usage to avoid surprises.
| Notebook | Run time | Cost |
|---|---|---|
| 1_gen.ipynb | From 5 minutes for the sample to multiple days for bigger domains | From $1 for the sample to $50 or more for bigger domains |
| 2_finetune[_oai].ipynb | Roughly 1.5 hours | Roughly $50 |
| 3_deploy[_oai].ipynb | < 10 minutes | < $1 |
| 4_eval.ipynb | From 5 minutes for the sample to multiple days for bigger domains | From $1 for the sample to $50 or more for bigger domains |
While not in use, the infrastructure for this project won't cost much, but it will still incur some cost.
TODO: provide cost estimates for dormant infra
| File | Explanation |
|---|---|
| .env | User-provided environment variables read by notebooks and scripts |
| .env.state | Environment variables for resources created during notebook execution and shared by all notebooks |
| config.json | Configuration necessary to connect to the Azure AI Studio Hub (same as Azure ML Workspace) |
In addition to interactive execution, the notebooks also support parameterized command-line execution using papermill.
The parameter files are contained in the `parameters` folder and support the following configurations:
| Parameter file | Model | Format |
|---|---|---|
| Llama-2-7b.yaml | Llama-2-7b | Completion |
| Meta-Llama-3-8B-Instruct.yaml | Meta-Llama-3-8B-Instruct | Chat |
| Meta-Llama-3.1-8B-Instruct.yaml | Meta-Llama-3.1-8B-Instruct | Chat |
Notebooks can be run all at once with a given parameter file using the following command:

```shell
./run_all.sh -p ./parameters/Meta-Llama-3.1-8B-Instruct.yaml
```
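You can also run a single notebook with papermill directly. A minimal sketch, assuming papermill is available in your environment (e.g. via `pip install papermill`) and accepts the same YAML parameter files; the output notebook name is arbitrary:

```shell
# Execute only the dataset generation notebook with a parameter file.
# -f points papermill at a YAML file of notebook parameters.
papermill 1_gen.ipynb 1_gen.output.ipynb -f ./parameters/Meta-Llama-3.1-8B-Instruct.yaml
```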
After you are done working with the project, you can take down the infrastructure with the following command.
IMPORTANT: Please be aware that this will DELETE everything related to this project, including generated datasets and fine-tuned models.
IMPORTANT: Save everything important to you before running this command.
```shell
azd down --purge
```
Note: The `--purge` parameter is important to reclaim quotas, for example for Azure OpenAI embedding models.