The GPTQ-for-LLaMa project quantises weights to 4-bit, reducing VRAM usage for inference and allowing big weights (e.g. LLaMa 30b) to run on consumer GPUs.
However, it doesn't seem to be compatible with `peft`.
GPTQ converts (packs) all the layers from `Linear` to `QuantLinear`, so the `_find_and_replace` function (here in `lora.py`) crashes with the following error:
`UnboundLocalError: local variable 'new_module' referenced before assignment`
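To illustrate the failure pattern, here is a minimal, hypothetical sketch (the `QuantLinear` stub and `find_and_replace` below are simplified stand-ins, not the real peft or GPTQ-for-LLaMa code): because the packed layers are no longer `nn.Linear` instances, no `isinstance` branch assigns `new_module`, and the later reference raises the error above.

```python
import torch.nn as nn

class QuantLinear(nn.Module):
    """Hypothetical stand-in for GPTQ-for-LLaMa's packed 4-bit layer (not a subclass of nn.Linear)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features

def find_and_replace(parent, child_name, target):
    """Simplified sketch of the replacement logic: only recognised layer types get wrapped."""
    if isinstance(target, nn.Linear):
        # In peft this would be a LoRA-wrapped layer; a plain Linear is just a placeholder here.
        new_module = nn.Linear(target.in_features, target.out_features)
    # A QuantLinear matches no branch, so `new_module` is never assigned...
    setattr(parent, child_name, new_module)  # ...and this line raises UnboundLocalError

parent = nn.Module()
parent.proj = QuantLinear(16, 16)
find_and_replace(parent, "proj", parent.proj)
# UnboundLocalError: local variable 'new_module' referenced before assignment
```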
Could support for this be added, please?
I did look through the code in both projects but couldn't figure out a good approach, so no PR from me, sorry.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.