First pull request ever! Please be kind :)
I propose an implementation of the LoRA finetuning algorithm. I'm a basic user of PyTorch and a total newbie with the more advanced LLM libraries; I just wanted to practice a bit. Of course I missed the previous implementation #187, otherwise I would have worked on that.
I tried to modify the code as little as possible. I drew a lot of inspiration from huggingface's PEFT library, but I tried to implement LoRA with the smallest overhead I could. The implementation is basically a new child class `LoraTransformer` that introduces two new methods: one to freeze the existing weights, and one to add LoRA A/B layers on a specified set of linear layers. A new member `lora_layers` has been added to `ModelArgs` to specify the layers LoRA has to be applied to; `lora_layers` is a dict that stores `r` and `alpha` for each layer.
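For readers skimming the diff, here is a minimal sketch of the idea. This is not the PR's actual code: the `LoRALinear` wrapper, the function names, and the toy module below are my own illustration of the mechanism described above.

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen nn.Linear plus a trainable low-rank A/B update."""
    def __init__(self, base: nn.Linear, r: int, alpha: float):
        super().__init__()
        self.base = base  # pretrained layer, assumed already frozen
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.normal_(self.lora_A.weight, std=0.02)
        nn.init.zeros_(self.lora_B.weight)  # so the update starts at zero
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))

def freeze_weights(model: nn.Module):
    # first of the two new methods: freeze all pretrained parameters
    for p in model.parameters():
        p.requires_grad = False

def add_lora_layers(model: nn.Module, lora_layers: dict):
    # second method: wrap every nn.Linear whose attribute name appears in
    # lora_layers (e.g. "wq") in a LoRALinear with that layer's r/alpha.
    # Collect targets first, then mutate, so the module tree isn't
    # modified while we iterate over it.
    targets = []
    for parent in model.modules():
        for child_name, child in parent.named_children():
            cfg = lora_layers.get(child_name)
            if cfg is not None and isinstance(child, nn.Linear):
                targets.append((parent, child_name, child, cfg))
    for parent, child_name, child, cfg in targets:
        setattr(parent, child_name, LoRALinear(child, cfg["r"], cfg["alpha"]))
```

On a toy module with llama2.c-style attribute names, only the A/B parameters end up trainable:

```python
class ToyAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.wq = nn.Linear(8, 8, bias=False)
        self.wv = nn.Linear(8, 8, bias=False)

toy = ToyAttention()
freeze_weights(toy)
add_lora_layers(toy, {"wq": {"r": 2, "alpha": 1}, "wv": {"r": 2, "alpha": 1}})
print([n for n, p in toy.named_parameters() if p.requires_grad])
# -> only lora_A / lora_B weights
```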
Something to take into account:

- I did not add LoRA on `nn.Embedding`, but I couldn't figure out if it is really used or not. From what I get from the original LoRA paper, most of the speedup comes from using a low `r` with LoRA on as many linear layers as one can afford.
- `train.py` has been modified to account for the finetuning. I have no clue about learning rate schedules for finetuning, so I tweaked the `get_lr` function to restart the counter from 0 when finetuning is going on (see the sketch at the end of this description). I look forward to any hint on how to do it properly.
- I ran `pytest`, just for back-compatibility, but I did not add any tests for LoRA since I would first like feedback on whether the implementation is doing the right thing.

I'm short on GPUs, so I ran a quick test on an RTX A6000 (hours are not billed on the cluster's login node I have access to :) ), finetuning the 260K model from the checkpoints provided on huggingface, with LoRA on all linear layers and r=2, alpha=1 (I'm also not sure this choice of parameters isn't among the dumbest). I skipped the per-iteration print; warmup is 1000, I ran it for 2000 iterations, and the other hyperparameters are the default ones.
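For context, here is a sketch of the `get_lr` tweak. It assumes the warmup-plus-cosine schedule used by `train.py`; `finetune_start_iter` and the hyperparameter values are placeholders of mine, not names from the actual diff.

```python
import math

# assumed hyperparameters, in the spirit of train.py's defaults
learning_rate = 5e-4
min_lr = 0.0
warmup_iters = 1000
lr_decay_iters = 2000

# hypothetical: the iteration at which finetuning started
finetune_start_iter = 0

def get_lr(it):
    # restart the schedule counter from 0 when finetuning begins
    it = max(it - finetune_start_iter, 0)
    # 1) linear warmup for warmup_iters steps
    if it < warmup_iters:
        return learning_rate * it / warmup_iters
    # 2) after lr_decay_iters, hold at min_lr
    if it > lr_decay_iters:
        return min_lr
    # 3) cosine decay from learning_rate down to min_lr in between
    decay_ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))
    return min_lr + coeff * (learning_rate - min_lr)
```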
I look forward to any feedback on this. It has been fun!