v0.4.0: Mixtral-8x7B, DPO-ftx, AutoGPTQ Integration
🚨🚨 Core refactor
- Deprecate `checkpoint_dir` and use `adapter_name_or_path` instead (see the migration sketch below)
- Replace `resume_lora_training` with `create_new_adapter`
- Move the patches in model loading to `llmtuner.model.patcher`
- Bump to Transformers 4.36.1 to support the Mixtral models
- Broaden FlashAttention-2 support (LLaMA, Falcon, Mistral)
- Temporarily disable LongLoRA due to breaking changes; it will be supported again later
The above changes were made by @hiyouga in #1864
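As a rough migration sketch (not part of the release itself), the renamed arguments can be passed to the training script as shown below. The entry point, model, dataset, and paths are illustrative assumptions; only `adapter_name_or_path` and `create_new_adapter` come from this release.

```python
# Illustrative migration sketch: the entry point, model, dataset and paths
# are assumptions; only adapter_name_or_path and create_new_adapter are the
# arguments introduced in this release.
import subprocess

subprocess.run(
    [
        "python", "src/train_bash.py",                   # assumed training entry point
        "--stage", "sft",
        "--model_name_or_path", "mistralai/Mixtral-8x7B-v0.1",
        "--finetuning_type", "lora",
        "--dataset", "alpaca_gpt4_en",                   # assumed dataset name
        "--adapter_name_or_path", "saves/mixtral-lora",  # formerly: --checkpoint_dir
        "--create_new_adapter", "true",                  # replaces the removed resume_lora_training flag
        "--output_dir", "saves/mixtral-lora-stage2",
    ],
    check=True,
)
```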
New features
- Add DPO-ftx: mix fine-tuning gradients into DPO via the `dpo_ftx` argument, suggested by @lylcst in #1347 (comment); see the usage sketch after this list
- Integrate AutoGPTQ into the model export via the `export_quantization_bit` and `export_quantization_dataset` arguments
- Support loading datasets from the ModelScope Hub by @tastelikefeet and @wangxingjun778 in #1802
- Support resizing token embeddings with the noisy mean initialization by @hiyouga in a66186b
- Support system column in both alpaca and sharegpt dataset formats
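A hedged usage sketch for the new `dpo_ftx` and AutoGPTQ export arguments follows. The script paths, model names, datasets, and output directories are assumptions; only `dpo_ftx`, `export_quantization_bit`, and `export_quantization_dataset` are taken from this release.

```python
# Illustrative sketch: script paths, model and dataset names are assumptions;
# only dpo_ftx, export_quantization_bit and export_quantization_dataset are
# the new arguments described above.
import subprocess

# DPO training with fine-tuning gradients mixed in via dpo_ftx
subprocess.run(
    [
        "python", "src/train_bash.py",             # assumed training entry point
        "--stage", "dpo",
        "--model_name_or_path", "mistralai/Mistral-7B-v0.1",
        "--dataset", "comparison_gpt4_en",         # assumed preference dataset
        "--finetuning_type", "lora",
        "--dpo_ftx", "1.0",                        # weight of the mixed-in fine-tuning loss
        "--output_dir", "saves/mistral-dpo",
    ],
    check=True,
)

# Export a 4-bit GPTQ-quantized model via AutoGPTQ
subprocess.run(
    [
        "python", "src/export_model.py",           # assumed export entry point
        "--model_name_or_path", "mistralai/Mistral-7B-v0.1",
        "--adapter_name_or_path", "saves/mistral-dpo",
        "--export_dir", "models/mistral-dpo-gptq",
        "--export_quantization_bit", "4",
        "--export_quantization_dataset", "data/c4_demo.json",  # assumed calibration data
    ],
    check=True,
)
```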
New models
- Base models
  - Mixtral-8x7B-v0.1
- Instruct/Chat models
  - Mixtral-8x7B-Instruct-v0.1
  - Mistral-7B-Instruct-v0.2
  - XVERSE-65B-Chat
  - Yi-6B-Chat