Does 4bit support offloading yet? #370
ye7iaserag asked this question in Q&A
I want to try LLaMA 30B in 4-bit, and there is no way to load that on a 3080 Ti without offloading. I noticed we just got 8-bit offloading support, but does that also work with 4-bit quantized models?
Answered by BetaDoggo, Mar 23, 2023
It's now possible as of 7618f3f, using the --gptq-pre-layer <number of layers> argument.
Answer selected by ye7iaserag
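
For reference, a minimal sketch of what a launch command using that flag might look like. Only --gptq-pre-layer comes from the answer above; the server.py entry point, the model folder name, and the --gptq-bits flag are assumptions that may differ between webui versions.

```sh
# Hypothetical text-generation-webui invocation; flag names other than
# --gptq-pre-layer are assumptions and vary between versions.
# --gptq-pre-layer <n> is understood to keep the first n transformer layers
# on the GPU and offload the rest to the CPU, trading generation speed for
# VRAM so a 30B 4-bit model can fit on a 12 GB card such as the 3080 Ti.
python server.py --model llama-30b-4bit --gptq-bits 4 --gptq-pre-layer 20
```

If you still run out of VRAM, lowering the pre-layer count should help: fewer layers on the GPU means lower memory use at the cost of slower generation.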