[Feature] Add BAdam algorithm #3287
Merged
Conversation
hiyouga requested changes on Apr 15, 2024
Thanks for adding this brilliant algorithm to LLaMA Factory! Please take a look at my comments, especially the ones on the implementation of gradient checkpointing.
hiyouga approved these changes on Apr 16, 2024
Some necessary changes have been made; it can be merged now.
hiyouga added the solved label ("This problem has already been solved") and removed the pending label ("This problem is yet to be addressed") on Apr 16, 2024
How do I select BAdam in the web interface instead of the command line?
What does this PR do?
This PR incorporates the BAdam algorithm into the repository. One can now use BAdam by setting the argument `--use_badam`. It enables full-parameter finetuning of Llama 2-7B within 24 GB of GPU memory under mixed-precision training, while using only about half the training time of LoRA.
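As a rough illustration, a launch command along the following lines would enable BAdam. Only `--use_badam` comes from this PR; the remaining flags are the repository's standard `train_bash.py` training arguments, and the concrete model, dataset, and hyperparameter values are placeholder assumptions, not the settings of the benchmark below:

```bash
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --dataset alpaca_gpt4_en \
    --template default \
    --finetuning_type full \
    --use_badam \
    --output_dir saves/llama2-7b/badam-sft \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --learning_rate 1e-6 \
    --num_train_epochs 1.0 \
    --fp16
```

The actual script shipped with this PR is referenced just below.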
A sample running script is available at `examples/extras/badam/sft.sh`. The script attains an eval_loss of 0.8591 after 900 steps, within a training time of 3 hours (including evaluation time) on a single RTX 3090. Note that one needs to run `pip install badam` before running the script.

Notes
- `gradient_checkpointing_enable` in `llmtuner/model/patcher.py` is modified to check whether any parameter of the checkpointed layer is trainable. If so, it sets the input's `requires_grad` to `True`. This modification allows a checkpointed layer to be trainable even when the input's `requires_grad` is `False`. It is essential for the acceleration of BAdam (otherwise the backward pass would always run all the way to the first layer because of `model.enable_input_require_grads()`), and it reduces the training time by half.
- The BAdam optimizer itself is provided by the external `badam` package, hence the `pip install badam` requirement above.
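To make the checkpointing note concrete, here is a minimal Python sketch of the idea, assuming the Transformers-style convention where a bound `forward` method is handed to a custom checkpointing function. The function name is illustrative; this is a sketch of the mechanism, not necessarily the exact code in this PR:

```python
import torch
from torch.utils.checkpoint import checkpoint


def badam_friendly_checkpointing(func, *args, **kwargs):
    """Checkpointing wrapper that only forces gradients on the inputs of
    layers that actually contain trainable parameters."""
    # `func` is the bound forward method of the checkpointed module.
    module: torch.nn.Module = func.__self__
    # Force the floating-point inputs to require grad only when this
    # layer belongs to BAdam's currently active (trainable) block.
    if any(param.requires_grad for param in module.parameters()):
        for arg in args:
            if torch.is_tensor(arg) and torch.is_floating_point(arg):
                arg.requires_grad_(True)
    return checkpoint(func, *args, **kwargs)
```

Under this scheme, frozen blocks contribute nothing to the backward graph, so backpropagation can stop at the active block instead of always reaching the first layer, which is where the reported halving of training time comes from.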
Before submitting