The GPTQ-for-LLaMa project quantises weights to 4-bit, reducing VRAM usage for inference and allowing big weights (e.g. LLaMa 30b) to run on consumer GPUs.
However, it doesn't seem to be compatible with `peft`.
GPTQ converts (packs) all the layers from `Linear` to `QuantLinear`, so the `_find_and_replace` function (here in `lora.py`) crashes with the following error:
`UnboundLocalError: local variable 'new_module' referenced before assignment`
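To illustrate the failure pattern, here is a minimal, hypothetical sketch (the `QuantLinear` stub and `find_and_replace` below are simplified stand-ins, not the real peft or GPTQ-for-LLaMa code): because the packed layers are no longer `nn.Linear` instances, no `isinstance` branch assigns `new_module`, and the later reference raises the error above.

```python
import torch.nn as nn

class QuantLinear(nn.Module):
    """Hypothetical stand-in for GPTQ-for-LLaMa's packed 4-bit layer (not a subclass of nn.Linear)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features

def find_and_replace(parent, child_name, target):
    """Simplified sketch of the replacement logic: only recognised layer types get wrapped."""
    if isinstance(target, nn.Linear):
        # In peft this would be a LoRA-wrapped layer; a plain Linear is just a placeholder here.
        new_module = nn.Linear(target.in_features, target.out_features)
    # A QuantLinear matches no branch, so `new_module` is never assigned...
    setattr(parent, child_name, new_module)  # ...and this line raises UnboundLocalError

parent = nn.Module()
parent.proj = QuantLinear(16, 16)
find_and_replace(parent, "proj", parent.proj)
# UnboundLocalError: local variable 'new_module' referenced before assignment
```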
Could support for this be added, please?
I did look through the code in both projects but couldn't figure out a good approach, so no PR from me, sorry.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.