*Image generated using DALL·E 3 on Azure AI*
This repository is a recipe that walks you through LLM distillation on Azure AI Serverless.
Distillation is a process where a large pre-trained model (often referred to as the "teacher" model) is used to train a smaller, more efficient model (known as the "student" model). The goal is to transfer the knowledge from the teacher to the student, enabling the student to achieve comparable performance while being more resource-efficient.
This recipe can use either OpenAI GPT-4o or Meta Llama 3.1 405B as a teacher model deployed on Azure AI to generate a synthetic dataset using UC Berkeley's Gorilla project RAFT method (see blog post). The synthetically generated dataset will then be used to fine-tune a student model such as OpenAI GPT-4o-mini, Meta Llama 3.1 8B, or another supported model. Finally, we will deploy the fine-tuned model and evaluate its performance against a baseline model.
Project Goal: The primary objective of this project is to simplify and automate the process of distilling large language models. The workflows and notebooks are meant to be as hands-free as possible, ensuring that even complex tasks like generating synthetic datasets, fine-tuning models, and deploying them can be accomplished with minimal manual intervention. Whether you’re a beginner or an expert, our focus is on providing a seamless experience that allows you to focus on the results rather than the process.
- Microsoft/Meta blog post: RAFT: A new way to teach LLMs to be better at RAG
- Paper: RAFT: Adapting Language Model to Domain Specific RAG
- UC Berkeley blog post: RAFT: Adapting Language Model to Domain Specific RAG
- Meta blog post: RAFT: Sailing Llama towards better domain-specific RAG
- Gorilla project home: Large Language Model Connected with Massive APIs
- RAFT GitHub project
The infrastructure for this project is fully provisioned using the Azure Developer CLI (AZD). AZD simplifies the deployment process by automating the setup of all required Azure resources, ensuring that you can get started with minimal configuration. This approach allows you to focus on the core aspects of model distillation and fine-tuning, while AZD handles the complexities of cloud resource management behind the scenes. By leveraging AZD, the project maintains a consistent and reproducible environment, making it easier to collaborate and scale.
The easiest way is to open the project in Codespaces (or in a VS Code Dev Container locally), which comes with azd included.
Here's a recording of the repository setup in a Dev Container or Codespaces, to give you an idea of what to expect before trying it yourself:
```shell
azd auth login --use-device-code
```
This creates a new azd environment and is a prerequisite to configuring models in the next step.

```shell
azd env new
```
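The command will prompt you for an environment name; you can also pass one directly, for example a hypothetical environment named `raft-recipe`:

```shell
# Create a new azd environment with an explicit (arbitrary) name
azd env new raft-recipe
```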
Configure which models you want to use for `teacher`, `student`, `embedding`, and `baseline` (`baseline` usually equals `student`), as well as which region to deploy the project to.
Note: Both OpenAI models and Azure Marketplace models are supported.
If in Codespaces or a Dev Container:

```shell
configure_models.py
```
Note: This command will narrow down the models you can select as you progress, based on the regions they're available in, ensuring that the region you select at the end of the configuration has all the selected models available. You'll still have to make sure you have enough quota in the region you select.
If not, use a virtual environment:
```shell
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
./infra/scripts/configure_models.py
```
```shell
azd up
```
Note: You won't be asked which region to deploy to, as the previous `configure_models.py` script already configured the AZD region based on your model and region selection.
Note: Both OpenAI models and Azure Marketplace models are supported. The azd infrastructure code will take care of provisioning the infrastructure required to support either of them.
The post-provisioning `tests.sh` script runs infrastructure integration tests to make sure everything deployed successfully.
Another post-provisioning script, `export_env.sh`, exports the environment variables for the provisioned infrastructure to the generated `./.env.state` file.
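If you want to reuse those variables in a shell session (for example to run the RAFT CLI scripts by hand), here is a minimal sketch, assuming both files contain plain `KEY=value` lines:

```shell
# Export every KEY=value pair from both env files into the current shell.
# Assumes plain KEY=value lines with no spaces around '=' (hypothetical usage).
set -a
[ -f ./.env ] && source ./.env             # user-provided variables
[ -f ./.env.state ] && source ./.env.state # variables generated at provisioning time
set +a
```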
The easiest approach is to provision the infrastructure using azd, but you can of course also bring your own models. Just provide environment variables for the endpoints of your models in the manual `./.env` file at the root of the project.
Environment variable configuration
These environment variables are expected by the RAFT CLI scripts. They are prefixed by the purpose of the model (`COMPLETION`, `EMBEDDING`, `BASELINE`, or `JUDGE`) followed by either standard OpenAI or Azure OpenAI variable names.
For each model purpose, choose one of the following API styles:
OpenAI API
| Env var name | Explanation |
|---|---|
| `COMPLETION_OPENAI_API_KEY` | API Key for the teacher model |
| `COMPLETION_OPENAI_BASE_URL` | Base URL for the teacher model |
| `COMPLETION_OPENAI_DEPLOYMENT` | Deployment name for the teacher model |
| `EMBEDDING_OPENAI_API_KEY` | API Key for the embedding model |
| `EMBEDDING_OPENAI_BASE_URL` | Base URL for the embedding model |
| `EMBEDDING_OPENAI_DEPLOYMENT` | Deployment name for the embedding model |
| `BASELINE_OPENAI_API_KEY` | API Key for the baseline model |
| `BASELINE_OPENAI_BASE_URL` | Base URL for the baseline model |
| `BASELINE_OPENAI_DEPLOYMENT` | Deployment name for the baseline model |
| `JUDGE_OPENAI_API_KEY` | API Key for the judge model |
| `JUDGE_OPENAI_BASE_URL` | Base URL for the judge model |
| `JUDGE_OPENAI_DEPLOYMENT` | Deployment name for the judge model |
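For reference, here is what a hypothetical OpenAI-style entry for the teacher model could look like in `./.env`; the key, base URL, and deployment name below are placeholders, not real values:

```shell
# Hypothetical OpenAI-style settings for the teacher (completion) model
COMPLETION_OPENAI_API_KEY=sk-placeholder
COMPLETION_OPENAI_BASE_URL=https://api.example.com/v1
COMPLETION_OPENAI_DEPLOYMENT=gpt-4o
```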
Azure OpenAI API
| Env var name | Explanation |
|---|---|
| `COMPLETION_AZURE_OPENAI_API_KEY` | API Key for the teacher model |
| `COMPLETION_AZURE_OPENAI_ENDPOINT` | Endpoint for the teacher model |
| `COMPLETION_AZURE_OPENAI_DEPLOYMENT` | Deployment name for the teacher model |
| `COMPLETION_OPENAI_API_VERSION` | API Version for the teacher model |
| `EMBEDDING_AZURE_OPENAI_API_KEY` | API Key for the embedding model |
| `EMBEDDING_AZURE_OPENAI_ENDPOINT` | Endpoint for the embedding model |
| `EMBEDDING_AZURE_OPENAI_DEPLOYMENT` | Deployment name for the embedding model |
| `EMBEDDING_OPENAI_API_VERSION` | API Version for the embedding model |
| `BASELINE_AZURE_OPENAI_API_KEY` | API Key for the baseline model |
| `BASELINE_AZURE_OPENAI_ENDPOINT` | Endpoint for the baseline model |
| `BASELINE_AZURE_OPENAI_DEPLOYMENT` | Deployment name for the baseline model |
| `BASELINE_OPENAI_API_VERSION` | API Version for the baseline model |
| `JUDGE_AZURE_OPENAI_API_KEY` | API Key for the judge model |
| `JUDGE_AZURE_OPENAI_ENDPOINT` | Endpoint for the judge model |
| `JUDGE_AZURE_OPENAI_DEPLOYMENT` | Deployment name for the judge model |
| `JUDGE_OPENAI_API_VERSION` | API Version for the judge model |
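Similarly, a hypothetical Azure OpenAI-style entry for the embedding model; the key, endpoint, deployment name, and API version below are placeholders to adapt to your own deployment:

```shell
# Hypothetical Azure OpenAI-style settings for the embedding model
EMBEDDING_AZURE_OPENAI_API_KEY=placeholder-key
EMBEDDING_AZURE_OPENAI_ENDPOINT=https://my-resource.openai.azure.com
EMBEDDING_AZURE_OPENAI_DEPLOYMENT=text-embedding-ada-002
EMBEDDING_OPENAI_API_VERSION=2024-02-01
```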
This repository is organized into two generic notebooks, two OpenAI-specific notebooks, and two Azure MaaS (Model as a Service) specific notebooks, one for each step of the process:
| Notebook | Azure OpenAI | Azure MaaS | Explanation |
|---|---|---|---|
| 1_gen.ipynb | ✔️ | ✔️ | Generate a fine-tuning dataset using RAFT |
| 2_finetune.ipynb | | ✔️ | Fine-tune a base model using the generated dataset |
| 2_finetune_oai.ipynb | ✔️ | | Fine-tune a base model using the generated dataset |
| 3_deploy.ipynb | | ✔️ | Deploy the fine-tuned model |
| 3_deploy_oai.ipynb | ✔️ | | Deploy the fine-tuned model |
| 4_eval.ipynb | ✔️ | ✔️ | Evaluate the fine-tuned model |
Warning: The times and costs mentioned below are indications to give you a sense of what to expect, but your experience can vary dramatically. Please monitor your usage to avoid surprises.
| Notebook | Run time | Cost |
|---|---|---|
| 1_gen.ipynb | From 5 minutes for the sample to multiple days for bigger domains | From $1 for the sample to $50 or more for bigger domains |
| 2_finetune[_oai].ipynb | Roughly 1.5 hours | Roughly $50 |
| 3_deploy[_oai].ipynb | < 10 minutes | < $1 |
| 4_eval.ipynb | From 5 minutes for the sample to multiple days for bigger domains | From $1 for the sample to $50 or more for bigger domains |
While not in use, the infrastructure for this project won't cost much, but it will still incur some cost.
TODO: provide cost estimates for dormant infra
| File | Explanation |
|---|---|
| .env | User-provided environment variables read by notebooks and scripts |
| .env.state | Environment variables for resources created during notebook execution and shared by all notebooks |
| config.json | Configuration necessary to connect to the Azure AI Studio Hub (same as Azure ML Workspace) |
In addition to interactive execution, the notebooks also support parameterized command-line execution using papermill.
The parameter files are contained in the `parameters` folder and support the following configurations:
| Parameter file | Model | Format |
|---|---|---|
| Llama-2-7b.yaml | Llama-2-7b | Completion |
| Meta-Llama-3-8B-Instruct.yaml | Meta-Llama-3-8B-Instruct | Chat |
| Meta-Llama-3.1-8B-Instruct.yaml | Meta-Llama-3.1-8B-Instruct | Chat |
Notebooks can be run all at once with a given parameter file using the following command:

```shell
./run_all.sh -p ./parameters/Meta-Llama-3.1-8B-Instruct.yaml
```
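You can also run a single notebook with papermill directly. A minimal sketch, assuming papermill is available in your environment (e.g. via `pip install papermill`) and accepts the same YAML parameter files; the output notebook name is arbitrary:

```shell
# Execute only the dataset generation notebook with a parameter file.
# -f points papermill at a YAML file of notebook parameters.
papermill 1_gen.ipynb 1_gen.output.ipynb -f ./parameters/Meta-Llama-3.1-8B-Instruct.yaml
```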
After you are done working with the project, you can take down the infrastructure with the following command.
IMPORTANT: Please be aware that this will DELETE everything related to this project, including generated datasets and fine-tuned models.
IMPORTANT: Save everything important to you before running this command.
```shell
azd down --purge
```
Note: The `--purge` parameter is important to reclaim quotas, for example for Azure OpenAI embedding models.