
[Feature request] Support for using LoRA with GPTQ 4bit weights #198

Closed
wywywywy opened this issue Mar 20, 2023 · 2 comments

Comments

@wywywywy

The GPTQ-for-LLaMa project quantises weights to 4-bit, reducing VRAM usage during inference and allowing large models (e.g. LLaMa 30b) to run on consumer GPUs.

However, it doesn't seem to be compatible with peft.

GPTQ converts (packs) all of the Linear layers to QuantLinear, so the _find_and_replace function in lora.py crashes with the following error:

UnboundLocalError: local variable 'new_module' referenced before assignment
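To make the failure mode concrete, here is a minimal, hypothetical reproduction (not peft's actual _find_and_replace, which is more involved): a replacement module is only constructed for layer types the branch chain recognises, so a packed QuantLinear falls through every branch and `new_module` is never assigned. The `QuantLinear` class below is a stand-in, not GPTQ-for-LLaMa's real implementation.

```python
import torch.nn as nn

# Hypothetical stand-in for GPTQ-for-LLaMa's packed 4-bit layer, used only
# to make this sketch self-contained; it is not the real implementation.
class QuantLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features

def find_and_replace_sketch(target):
    # Simplified branch chain: a replacement is only built for recognised types.
    if isinstance(target, nn.Linear):
        new_module = nn.Linear(target.in_features, target.out_features)  # placeholder for peft's lora.Linear
    elif isinstance(target, nn.Embedding):
        new_module = nn.Embedding(target.num_embeddings, target.embedding_dim)
    # QuantLinear matches neither branch, so new_module is never assigned
    # and the return below raises the UnboundLocalError quoted above.
    return new_module

try:
    find_and_replace_sketch(QuantLinear(4096, 4096))
except UnboundLocalError as e:
    print(e)  # local variable 'new_module' referenced before assignment
```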

Can support be added please?

I did look through the code of both projects but couldn't figure out a good approach, so no PR from me, sorry.
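For what it's worth, one possible direction, sketched only and with hypothetical names (neither project provides this today): because LoRA's update is purely additive, an adapter module could wrap the frozen quantized layer's forward pass instead of trying to replace its packed weights.

```python
import torch.nn as nn

# Sketch of a LoRA adapter that wraps, rather than replaces, a quantized layer.
# `quant_linear` is any frozen module mapping in_features -> out_features,
# e.g. the QuantLinear stand-in above; all names here are hypothetical.
class LoraAdapter(nn.Module):
    def __init__(self, quant_linear, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = quant_linear                      # frozen 4-bit layer, left untouched
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)            # start as a no-op, as in LoRA
        self.scaling = alpha / r

    def forward(self, x):
        # Additive low-rank update on top of the quantized forward pass.
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))
```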

@oobabooga

+1 for this

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
