Zero to LitGPT: Getting Started with Pretraining, Finetuning, and Using LLMs

This tutorial walks you through the main features and usage patterns for ⚡️LitGPT, a library for pretraining, finetuning, and using LLMs that focuses on an efficient user experience while being developer-friendly.

The topics, following the installation of LitGPT, are in chronological order, reflecting the steps in an LLM lifecycle: Pretraining → Finetuning → Inference.

However, it is also possible, and even common, to use and deploy models with LitGPT without pretraining and finetuning. So, if you are not interested in pretraining and finetuning, please feel free to skip these sections.

Install LitGPT

LitGPT is available as a Python library from the PyPI package repository, and we recommend installing it using Python's pip installer module, including all required package dependencies:

pip install 'litgpt[all]'

Alternatively, if you are a researcher or developer planning to make changes to LitGPT, you can clone the GitHub repository and install it from a local folder as follows:

git clone https://github.com/Lightning-AI/litgpt.git
cd litgpt
pip install -e '.[all]'
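
After either installation route, it can be useful to confirm that the package is visible to the Python interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of LitGPT.

```python
from importlib import metadata


def installed_version(package: str = "litgpt"):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("litgpt is not installed in this environment")
    else:
        print(f"litgpt {version} is installed")
```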

Pretrain LLMs

Pretraining LLMs requires substantial compute resources and time commitment. For that reason, most researchers and practitioners prefer to skip this step and continue with the Download pretrained model weights section instead.

However, if you feel adventurous and want to pretrain your own LLM, here's how.

First, we have to decide which type of model architecture we want to use. We list the available architectures by running the pretrain command with list as its argument:

litgpt pretrain list

This prints a list of all available model architectures in alphabetical order:

Camel-Platypus2-13B
Camel-Platypus2-70B
CodeLlama-13b-Python-hf
...
EleutherAI/pythia-410m
...
vicuna-13b-v1.3
vicuna-13b-v1.5
vicuna-13b-v1.5-16k
vicuna-33b-v1.3
vicuna-7b-v1.3
vicuna-7b-v1.5
vicuna-7b-v1.5-16k

Suppose we want to pretrain the small 1.1B-parameter tiny-llama-1.1b model. Before starting pretraining, we must also choose and download a tokenizer.

We can download a tokenizer via the download command. Note that running litgpt download list will also print a list of all available models and tokenizers to download.

To filter for specific models, e.g., TinyLlama, we can use the grep command in our terminal:

litgpt download list | grep TinyLlama

This prints:

TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
TinyLlama/TinyLlama-1.1B-Chat-v1.0
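
If grep is not available (for example, on some Windows setups), the same filtering can be done in Python. This is a minimal sketch: `filter_models` is a hypothetical helper, and the list of names is copied from the listings in this tutorial rather than fetched from LitGPT.

```python
def filter_models(names, keyword):
    """Keep the names that contain `keyword` (case-sensitive, like plain grep)."""
    return [name for name in names if keyword in name]


# A few entries copied from the listings in this tutorial
available = [
    "EleutherAI/pythia-410m",
    "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "microsoft/phi-2",
]

for name in filter_models(available, "TinyLlama"):
    print(name)
```

To apply the filter to live output, one option is to capture the listing with `subprocess.run(["litgpt", "download", "list"], capture_output=True, text=True).stdout.splitlines()` and pass the result to `filter_models`.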

Let's now download the tokenizer corresponding to TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T that we can then use to pretrain the TinyLlama model:

litgpt download \
   TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T \
   --tokenizer_only true


Next, we can pretrain the model on the OpenWebText dataset with the default setting as follows:

litgpt pretrain tiny-llama-1.1b \
  --data OpenWebText \
  --tokenizer_dir TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T

If you are interested in additional settings, you can use the help command as follows:

litgpt pretrain --help

Tip

Above, we only covered the most basic commands for pretraining a model using LitGPT. We highly recommend checking the resources below if you are interested in pretraining a model.

More information and additional resources

tutorials/pretrain: General information about pretraining in LitGPT
tutorials/pretrain_tinyllama: A tutorial for pretraining a 1.1B TinyLlama model on 3 trillion tokens
config_hub/pretrain: Pre-made config files for pretraining that work well out of the box
Project templates in reproducible environments with multi-GPU and multi-node support:

Download pretrained model weights

Most practical use cases, like LLM inference (chat) or finetuning, involve using pretrained model weights. LitGPT supports a large number of model weights, which can be listed by executing the download command with list as an argument:

litgpt download list

This will print a (long) list of all supported pretrained models (abbreviated for readability below):

...
google/gemma-2b
...
meta-llama/Llama-2-7b-hf
...
microsoft/phi-2
...
mistralai/Mixtral-8x7B-Instruct-v0.1
...

To download the model weights, provide one of the model strings above as an input argument:

litgpt download microsoft/phi-2

model-00001-of-00002.safetensors: 100%|████████████████████████████████| 5.00G/5.00G [00:40<00:00, 124MB/s]
model-00002-of-00002.safetensors: 100%|████████████████████████████████| 564M/564M [00:01<00:00, 330MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████| 2.11M/2.11M [00:00<00:00, 54.0MB/s]
...
Converting checkpoint files to LitGPT format.
Processing checkpoints/microsoft/phi-2/model-00001-of-00002.bin
...
Saving converted checkpoint to checkpoints/microsoft/phi-2

Tip

Note that some models, such as Llama 2, require that you accept Meta AI's terms of service for this model, and you need to use a special access token via the litgpt download ... --access_token ... option. For more information, visit the respective Model Hub website, e.g., meta-llama/Llama-2-7b-hf. The access token can be created under your Model Hub in the Profile > Access Tokens menu.

By default, the weights are going to be stored in a ./checkpoints subdirectory:

ls -lh checkpoints/microsoft/phi-2/

total 11G
-rw-r--r-- 1 sebastian sebastian  863 Mar 19 21:14 config.json
-rw-r--r-- 1 sebastian sebastian  124 Mar 19 21:14 generation_config.json
-rw-r--r-- 1 sebastian sebastian 5.2G Mar 19 21:15 lit_model.pth
-rw-r--r-- 1 sebastian sebastian 4.7G Mar 19 21:15 model-00001-of-00002.bin
-rw-r--r-- 1 sebastian sebastian 538M Mar 19 21:15 model-00002-of-00002.bin
-rw-r--r-- 1 sebastian sebastian  528 Mar 19 21:15 model_config.yaml
-rw-r--r-- 1 sebastian sebastian 2.1M Mar 19 21:14 tokenizer.json
-rw-r--r-- 1 sebastian sebastian 7.2K Mar 19 21:14 tokenizer_config.json
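
Before moving on, a quick check that the download and conversion finished can save time. The sketch below looks for a handful of files in the checkpoint directory. The set of expected names is an assumption read off the directory listing above, not taken from LitGPT's source, and `missing_checkpoint_files` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: this set is read off the directory listing above, not from LitGPT's source
EXPECTED_FILES = (
    "lit_model.pth",
    "model_config.yaml",
    "tokenizer.json",
    "tokenizer_config.json",
)


def missing_checkpoint_files(checkpoint_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from `checkpoint_dir`."""
    directory = Path(checkpoint_dir)
    return [name for name in expected if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files("checkpoints/microsoft/phi-2")
    print("missing files:", missing or "none")
```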

The model is now ready for inference and chat, for example, using the chat command on the checkpoint directory:

litgpt chat microsoft/phi-2

Now chatting with phi-2.
To exit, press 'Enter' on an empty prompt.

Seed set to 1234
>> Prompt: Why are LLMs so useful?
>> Reply:  When building applications or operating systems, you can use LLMs to know how a computer should respond to your commands. This can make your programs run faster and more efficiently.

Time for inference: 1.26 sec total, 27.81 tokens/sec, 35 tokens

>> Prompt:

Tip

Use --multiline true to support prompts that require multiple input lines.

More information and additional resources

tutorials/download_model_weights: A more comprehensive download tutorial, tips for GPU memory limitations, and more

Finetune LLMs

LitGPT supports several methods of supervised instruction finetuning, which allows you to finetune models to follow instructions.

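Datasets for instruction finetuning are usually formatted as a collection of records, each pairing an instruction with the desired response.

Alternatively, datasets for instruction finetuning can also contain an 'input' field that provides additional context for the instruction.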
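
A minimal sketch of both layouts, assuming the Alpaca-style field names `instruction`, `input`, and `output` (only the `input` field is named in the text; the other two names and the example records are assumptions for illustration). The `validate` helper is hypothetical and only checks the shape of a record.

```python
import json

# Hypothetical records; the field names follow the Alpaca convention
records = [
    {
        "instruction": "Name the capital of France.",
        "output": "The capital of France is Paris.",
    },
    {
        "instruction": "Translate the sentence into German.",
        "input": "Good morning!",
        "output": "Guten Morgen!",
    },
]


def validate(record):
    """Check that a record has the required keys, no unknown keys, and only string values."""
    required = {"instruction", "output"}
    allowed = required | {"input"}
    keys = set(record)
    return required <= keys <= allowed and all(isinstance(v, str) for v in record.values())


assert all(validate(r) for r in records)
print(json.dumps(records, indent=2))
```

Which keys a given LitGPT data module expects may differ, so the dataset tutorial listed under the resources for this section is the place to confirm the exact format.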

In an instruction-finetuning context, "full" finetuning means updating all model parameters as opposed to only a subset. Adapter and LoRA (short for low-rank adaptation) are methods for parameter-efficient finetuning that only require updating a small fraction of the model weights.

Parameter-efficient finetuning requires far less memory and compute than full finetuning, and it often achieves comparable performance on downstream tasks.
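
To get a feel for why so few weights need updating, recall that LoRA leaves a weight matrix frozen and learns two small factors whose product has low rank. The sketch below counts the weights in those factors for a single linear layer. The layer size is an illustrative assumption, not read from any model config, and `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Illustrative layer size (an assumption, not read from any model config)
d_in = d_out = 2560
rank = 8

full = d_in * d_out
lora = lora_param_count(d_in, d_out, rank)
print(f"full layer: {full:,} weights")
print(f"LoRA factors: {lora:,} weights ({lora / full:.2%} of the layer)")
```

The count grows linearly with the rank, so the rank is the main knob for trading adapter capacity against memory.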

In the following example, we will use LoRA for finetuning, which is one of the most popular LLM finetuning methods. (For more information on how LoRA works, please see Code LoRA from Scratch.)

Before we start, we have to download a model as explained in the "Download pretrained model weights" section above:

litgpt download microsoft/phi-2

The LitGPT interface can be used via command line arguments and configuration files. We recommend starting with the configuration files from the config_hub and either modifying them directly or overriding specific settings via the command line. For example, we can use the following setting to train the downloaded 2.7B parameter microsoft/phi-2 model, where we set --train.max_steps 5 for a quick test run.

If you have downloaded or cloned the LitGPT repository, you can provide the config file via a relative path:

litgpt finetune_lora microsoft/phi-2 \
  --config config_hub/finetune/phi-2/lora.yaml \
  --train.max_steps 5

Alternatively, you can provide a URL:

litgpt finetune_lora microsoft/phi-2 \
  --config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/finetune/phi-2/lora.yaml \
  --train.max_steps 5

Tip

Note that the config file above will finetune the model on the Alpaca2k dataset on 1 GPU and save the resulting files in an out/finetune/lora-phi-2 directory. All of these settings can be changed via a respective command line argument or by changing the config file. To see more options, execute litgpt finetune_lora --help.

Running the previous finetuning command will initiate the finetuning process, which should only take about a minute on a GPU due to the --train.max_steps 5 setting.

{'checkpoint_dir': PosixPath('checkpoints/microsoft/phi-2'),
 'data': Alpaca2k(mask_prompt=False,
                  val_split_fraction=0.03847,
                  prompt_style=<litgpt.prompts.Alpaca object at 0x7f5fa2867e80>,
                  ignore_index=-100,
                  seed=42,
                  num_workers=4,
                  download_dir=PosixPath('data/alpaca2k')),
 'devices': 1,
 'eval': EvalArgs(interval=100, max_new_tokens=100, max_iters=100),
 'logger_name': 'csv',
 'lora_alpha': 16,
 'lora_dropout': 0.05,
 'lora_head': True,
 'lora_key': True,
 'lora_mlp': True,
 'lora_projection': True,
 'lora_query': True,
 'lora_r': 8,
 'lora_value': True,
 'num_nodes': 1,
 'out_dir': PosixPath('out/finetune/lora-phi-2'),
 'precision': 'bf16-true',
 'quantize': None,
 'seed': 1337,
 'train': TrainArgs(save_interval=800,
                    log_interval=1,
                    global_batch_size=8,
                    micro_batch_size=4,
                    lr_warmup_steps=10,
                    epochs=1,
                    max_tokens=None,
                    max_steps=5,
                    max_seq_length=512,
                    tie_embeddings=None,
                    learning_rate=0.0002,
                    weight_decay=0.0,
                    beta1=0.9,
                    beta2=0.95,
                    max_norm=None,
                    min_lr=6e-05)}
Seed set to 1337
Number of trainable parameters: 12,226,560
Number of non-trainable parameters: 2,779,683,840
The longest sequence length in the train data is 512, the model's maximum sequence length is 512 and context length is 2048
Validating ...
Recommend a movie for me to watch during the weekend and explain the reason.
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Recommend a movie for me to watch during the weekend and explain the reason.

### Response:
I recommend you watch "Parasite" because it's a critically acclaimed movie that won multiple awards, including the Academy Award for Best Picture. It's a thought-provoking and suspenseful film that will keep you on the edge of your seat. The movie also tackles social and economic inequalities, making it a must-watch for anyone interested in meaningful storytelling.

/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: The ``compute`` method of metric MeanMetric was called before the ``update`` method which may lead to errors, as metric states have not yet been updated.
  warnings.warn(*args, **kwargs)  # noqa: B028
Missing logger folder: out/finetune/lora-phi-2/logs/csv
Epoch 1 | iter 1 step 0 | loss train: 1.646, val: n/a | iter time: 820.31 ms
Epoch 1 | iter 2 step 1 | loss train: 1.660, val: n/a | iter time: 548.72 ms (step)
Epoch 1 | iter 3 step 1 | loss train: 1.687, val: n/a | iter time: 300.07 ms
Epoch 1 | iter 4 step 2 | loss train: 1.597, val: n/a | iter time: 595.27 ms (step)
Epoch 1 | iter 5 step 2 | loss train: 1.640, val: n/a | iter time: 260.75 ms
Epoch 1 | iter 6 step 3 | loss train: 1.703, val: n/a | iter time: 568.22 ms (step)
Epoch 1 | iter 7 step 3 | loss train: 1.678, val: n/a | iter time: 511.70 ms
Epoch 1 | iter 8 step 4 | loss train: 1.741, val: n/a | iter time: 514.14 ms (step)
Epoch 1 | iter 9 step 4 | loss train: 1.689, val: n/a | iter time: 423.59 ms
Epoch 1 | iter 10 step 5 | loss train: 1.524, val: n/a | iter time: 603.03 ms (step)
Training time: 11.20s
Memory used: 13.90 GB
Saving LoRA weights to 'out/finetune/lora-phi-2/final/lit_model.pth.lora'
Saved merged weights to 'out/finetune/lora-phi-2/final/lit_model.pth'
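
The log above reports 10 iterations but only 5 optimizer steps, which is consistent with gradient accumulation: several micro-batches are processed before each weight update. The sketch below reproduces the iteration and step counters from the log. It assumes the number of accumulated micro-batches is `global_batch_size // devices // micro_batch_size`; that formula is an assumption about how these settings combine, not something stated in this tutorial.

```python
def accumulation_iters(global_batch_size: int, micro_batch_size: int, devices: int = 1) -> int:
    """Micro-batches processed per optimizer step on each device."""
    return global_batch_size // devices // micro_batch_size


# Values taken from the TrainArgs and 'devices' entries printed in the log above
accum = accumulation_iters(global_batch_size=8, micro_batch_size=4, devices=1)
max_steps = 5

for it in range(1, accum * max_steps + 1):
    step = it // accum  # completed optimizer steps after this iteration
    marker = " (step)" if it % accum == 0 else ""
    print(f"iter {it} step {step}{marker}")
```

Under this assumption, lowering `micro_batch_size` reduces peak memory while keeping the effective batch size per update unchanged, at the cost of more iterations per step.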

Notice that the LoRA script saves both the LoRA weights ('out/finetune/lora-phi-2/final/lit_model.pth.lora') and the LoRA weights merged back into the original model ('out/finetune/lora-phi-2/final/lit_model.pth') for convenience. This allows us to use the finetuned model directly via the chat command:

litgpt chat out/finetune/lora-phi-2/final/

Now chatting with phi-2.
To exit, press 'Enter' on an empty prompt.

Seed set to 1234
>> Prompt: Why are LLMs so useful?
>> Reply: LLMs are useful because they can be trained to perform various natural language tasks, such as language translation, text generation, and question-answering. They are also able to understand the context of the input data, which makes them particularly useful for tasks such as sentiment analysis and text summarization. Additionally, because LLMs can learn from large amounts of data, they are able to generalize well and perform well on new data.

Time for inference: 2.15 sec total, 39.57 tokens/sec, 85 tokens

>> Prompt:

More information and additional resources

tutorials/prepare_dataset: A summary of all out-of-the-box supported datasets in LitGPT and utilities for preparing custom datasets
tutorials/finetune: An overview of the different finetuning methods supported in LitGPT
tutorials/finetune_full: A tutorial on full-parameter finetuning
tutorials/finetune_lora: Options for parameter-efficient finetuning with LoRA and QLoRA
tutorials/finetune_adapter: A description of the parameter-efficient Llama-Adapter methods supported in LitGPT
tutorials/oom: Tips for dealing with out-of-memory (OOM) errors
config_hub/finetune: Pre-made config files for finetuning that work well out of the box

LLM inference

To use a downloaded or finetuned model for chat, you only need to provide the corresponding checkpoint directory containing the model and tokenizer files. For example, to chat with the phi-2 model from Microsoft, first download it as described in the "Download pretrained model weights" section:

litgpt download microsoft/phi-2

model-00001-of-00002.safetensors: 100%|████████████████████████████████| 5.00G/5.00G [00:40<00:00, 124MB/s]
model-00002-of-00002.safetensors: 100%|████████████████████████████████| 564M/564M [00:01<00:00, 330MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████| 2.11M/2.11M [00:00<00:00, 54.0MB/s]
...
Converting checkpoint files to LitGPT format.
Processing checkpoints/microsoft/phi-2/model-00001-of-00002.bin
...
Saving converted checkpoint to checkpoints/microsoft/phi-2

Then, chat with the model using the following command:

litgpt chat microsoft/phi-2

Now chatting with phi-2.
To exit, press 'Enter' on an empty prompt.

Seed set to 1234
>> Prompt: What is the main difference between a large language model and a traditional search engine?
>> Reply:  A large language model uses deep learning algorithms to analyze and generate natural language, while a traditional search engine uses algorithms to retrieve information from web pages.

Time for inference: 1.14 sec total, 26.26 tokens/sec, 30 tokens

Tip

Most model weights are already represented in an efficient bfloat16 format. However, if the model currently exceeds your GPU memory, you can try to pass the --precision bf16-true option. In addition, you can check the quantization documentation for further optimization, which is linked below.

More information and additional resources

tutorials/inference: Chat and inference tutorial
tutorials/quantize: Quantizing models to reduce GPU memory requirements

Using the LitGPT Python API for Inference

The previous section explained how to use the litgpt chat command line interface for inference. Alternatively, LitGPT also offers a Python API approach to generate text using an LLM:

from litgpt import LLM

llm = LLM.load("microsoft/phi-2")
text = llm.generate("What do Llamas eat?", top_k=1, max_new_tokens=30)
print(text)

Note that if you pass a supported model name to LLM.load(), as shown above, it will download the model from the HF hub if it doesn't exist locally yet. (Use litgpt download list on the command line to get a list of all currently supported models.)

Alternatively, to load a model from a local path, provide the corresponding path as input to the load method:

llm = LLM.load("path/to/my/local/checkpoint")

More information and additional resources

tutorials/python-api: The LitGPT Python API documentation

Evaluating models

LitGPT comes with a handy litgpt evaluate command to evaluate models with Eleuther AI's Evaluation Harness. For example, to evaluate the previously downloaded microsoft/phi-2 model on several tasks available from the Evaluation Harness, you can use the following command:

litgpt evaluate microsoft/phi-2 \
  --batch_size 16 \
  --tasks "hellaswag,gsm8k,truthfulqa_mc2,mmlu,winogrande,arc_challenge"

(A list of supported tasks can be found here.)
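
When the task list gets long, assembling the command in Python avoids quoting and line-continuation mistakes. The sketch below only builds and prints the command shown above; it does not run anything. The variable names are arbitrary, and the flags are the ones used in this section.

```python
import shlex

tasks = ["hellaswag", "gsm8k", "truthfulqa_mc2", "mmlu", "winogrande", "arc_challenge"]

command = [
    "litgpt", "evaluate", "microsoft/phi-2",
    "--batch_size", "16",
    "--tasks", ",".join(tasks),
]

# Print a copy-pasteable shell command; nothing is executed here
print(shlex.join(command))
```

To actually launch the evaluation from Python, the same list can be passed to `subprocess.run(command, check=True)`.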

Deploy LLMs

You can deploy LitGPT LLMs using your tool of choice. Below is an example using LitGPT's built-in serving capabilities:

# 1) Download a pretrained model (alternatively, use your own finetuned model)
litgpt download microsoft/phi-2

# 2) Start the server
litgpt serve microsoft/phi-2

# 3) Use the server (in a separate session)
# (Python) Send a prompt to the running server and print the reply
import requests

response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"prompt": "Fix typos in the following sentence: Exampel input"}
)
print(response.json()["output"])

This prints:

Instruct: Fix typos in the following sentence: Exampel input
Output: Example input.
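
If the requests package is not available on the client machine, the same call can be made with the standard library. This is a sketch under the assumptions already visible in the example above: the server listens on http://127.0.0.1:8000/predict, accepts a JSON body with a "prompt" key, and replies with JSON containing an "output" key. The function names are hypothetical.

```python
import json
import urllib.request


def build_predict_request(prompt: str, url: str = "http://127.0.0.1:8000/predict"):
    """Create a JSON POST request for the /predict endpoint without sending it."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to a running server and return the "output" field of the reply."""
    request = build_predict_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["output"]
```

With the server from step 2 running, `print(predict("Fix typos in the following sentence: Exampel input"))` sends the same prompt as the requests-based client.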

More information and additional resources

tutorials/deploy: A full deployment tutorial and example

Converting LitGPT model weights to `safetensors` format

Sometimes, it can be useful to convert LitGPT model weights for third-party and external tools. For example, we can convert a LitGPT model to the Hugging Face format and save it via .safetensors files, which we can do as follows:

litgpt convert_from_litgpt microsoft/phi-2 out/converted_model/

Certain tools, like the .from_pretrained method in Hugging Face transformers, also require the config.json file that came with the downloaded model:

cp checkpoints/microsoft/phi-2/config.json out/converted_model/config.json

You can now load the model into a Hugging Face transformers model and save it in the .safetensors format as follows:

import torch
from transformers import AutoModel

# Load model
state_dict = torch.load('out/converted_model/model.pth')
model = AutoModel.from_pretrained(
    "microsoft/phi-2", state_dict=state_dict
)

# Save .safetensors files
model.save_pretrained("out/converted_model/")

ls -lh out/converted_model
total 16G
-rwxr--r-- 1 sebastian sebastian  891 Mar 20 17:08 config.json
-rw-r--r-- 1 sebastian sebastian 4.7G Mar 20 17:08 model-00001-of-00003.safetensors
-rw-r--r-- 1 sebastian sebastian 4.7G Mar 20 17:09 model-00002-of-00003.safetensors
-rw-r--r-- 1 sebastian sebastian 601M Mar 20 17:09 model-00003-of-00003.safetensors
-rw-r--r-- 1 sebastian sebastian 5.2G Mar 20 16:30 model.pth
-rw-r--r-- 1 sebastian sebastian  33K Mar 20 17:09 model.safetensors.index.json

You can then use the model with external tools, for example, Eleuther AI's LM Evaluation Harness (see the lm_eval installation instructions here).

The LM Evaluation Harness requires a tokenizer to be present in the model checkpoint folder, which we can copy from the original download checkpoint:

# Copy the tokenizer needed by the Eval Harness
cp checkpoints/microsoft/phi-2/tokenizer* out/converted_model
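
The same copy step can be scripted in Python, which is convenient when the conversion is part of a larger pipeline or when shell globbing is not available. This is a minimal sketch; `copy_tokenizer_files` is a hypothetical helper, and it assumes that every file whose name starts with "tokenizer" should be copied, mirroring the shell pattern above.

```python
import shutil
from pathlib import Path


def copy_tokenizer_files(src_dir, dst_dir):
    """Copy every file matching tokenizer* from src_dir to dst_dir; return the copied names."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.glob("tokenizer*")):
        if path.is_file():
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return copied
```

For the directories used in this section, the call would be `copy_tokenizer_files("checkpoints/microsoft/phi-2", "out/converted_model")`.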

Then, we can run the Evaluation Harness as follows:

lm_eval --model hf \
    --model_args pretrained="out/converted_model" \
    --tasks "hellaswag,gsm8k,truthfulqa_mc2,mmlu,winogrande,arc_challenge" \
    --device "cuda:0" \
    --batch_size 4

Tip

The Evaluation Harness tasks above are those used in the Open LLM Leaderboard. You can find a list of all supported tasks here.

More information and additional resources

tutorials/convert_lit_models: Tutorial on converting LitGPT weights

Get involved!

We appreciate your feedback and contributions. If you have feature requests, questions, or want to contribute code or config files, please don't hesitate to use the GitHub Issue tracker.

We welcome all individual contributors, regardless of their level of experience or hardware. Your contributions are valuable, and we are excited to see what you can accomplish in this collaborative and supportive environment.

Tip

Unsure about contributing? Check out our How to Contribute to LitGPT guide.

If you have general questions about building with LitGPT, please join our Discord.
