Does 4bit support offloading yet? #370
ye7iaserag asked this question in Q&A
I want to try LLaMA 30B in 4-bit, and there is no way to load that on a 3080 Ti without offloading. I noticed we just got 8-bit offloading support, but does that also work with 4-bit quantized models?
Answered by BetaDoggo, Mar 23, 2023
It's now possible as of 7618f3f, using the --gptq-pre-layer <number of layers> argument.
Answer selected by ye7iaserag
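
For reference, a minimal sketch of what a launch command using that flag might look like. Only --gptq-pre-layer comes from the answer above; the server.py entry point, the model folder name, and the --gptq-bits flag are assumptions that may differ between webui versions.

```sh
# Hypothetical text-generation-webui invocation; flag names other than
# --gptq-pre-layer are assumptions and vary between versions.
# --gptq-pre-layer <n> is understood to keep the first n transformer layers
# on the GPU and offload the rest to the CPU, trading generation speed for
# VRAM so a 30B 4-bit model can fit on a 12 GB card such as the 3080 Ti.
python server.py --model llama-30b-4bit --gptq-bits 4 --gptq-pre-layer 20
```

If you still run out of VRAM, lowering the pre-layer count should help: fewer layers on the GPU means lower memory use at the cost of slower generation.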