
ExLlamaV2: exl2 support #3203

Open
pabl-o-ce opened this issue Mar 5, 2024 · 29 comments · May be fixed by #11348
@pabl-o-ce

If possible, please add support for ExLlamaV2. It is a very fast, high-quality library for running LLMs.

ExLlamaV2 Repo

@mgoin
Member

mgoin commented Mar 5, 2024

@pabl-o-ce What specifically would you like to see? vLLM already integrates kernels from exllamav2 - see the GPTQ kernels, for example

Copied from https://github.com/turboderp/exllamav2
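
For anyone who wants to exercise that existing GPTQ path (the one the exllamav2-derived kernels currently power), here is a minimal sketch using vLLM's Python API; the checkpoint name is only an example placeholder, substitute any GPTQ-quantized model you actually have:

```python
# Minimal sketch: loading a GPTQ-quantized model with vLLM, which runs
# through the GPTQ kernels mentioned above. The checkpoint name is an
# example placeholder, not a recommendation.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Llama-2-7B-GPTQ",  # example GPTQ checkpoint
    quantization="gptq",               # select the GPTQ kernel path
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Summarize EXL2 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```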

@pabl-o-ce
Author

Hi @mgoin, thanks for the response.

So is it possible to use the exl2 format with vLLM, or does it only use the GPTQ format?
Sorry if this is a very n00b question.

@tobiajung

Hi,
it seems like vLLM 0.3.3 has exl2 support, but I'm not able to get a model up and running. I use the Docker environment with the following args: `--model LoneStriker/CodeFuse-DeepSeek-33B-4.0bpw-h6-exl2 --gpu-memory-utilization 0.65 --max-model-len 2048`.
But vLLM seems to allocate much more memory than the given 0.65 (of 48 GB) and fails with an error.

Is exl2 properly supported? How do I start the Docker container correctly to run exl2 models?

@wxupjack

Hi @mgoin, I think the feature submitted by @chu-tianxiang in #2330 and #916 just utilizes the shuffle and dequant functions from the exllamav2 repo for GPTQ. It does not mean that vLLM (main branch) is compatible with the dynamic-precision exl2 format.

So exl2 still isn't properly supported. @tobiajung
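
To make the distinction concrete: GPTQ applies one uniform bit-width across the model, while EXL2 mixes per-layer bit-widths so the average bits per weight can hit a fractional target. A rough illustration (the layer sizes and bit assignments below are invented for the example):

```python
# Illustrative only: how EXL2-style mixed per-layer bit-widths yield a
# fractional average bpw, versus a single uniform GPTQ width.
# All numbers below are made up for the example.
layers = {
    "attn_qkv": (50_000_000, 6),   # (parameter count, bits assigned)
    "attn_out": (20_000_000, 4),
    "mlp_up":   (90_000_000, 4),
    "mlp_down": (90_000_000, 5),
}

total_bits = sum(n * b for n, b in layers.values())
total_params = sum(n for n, _ in layers.values())
print(f"EXL2-style average bpw: {total_bits / total_params:.2f}")  # ~4.76

gptq_bits = 4  # GPTQ uses one width everywhere
print(f"GPTQ bpw: {gptq_bits}")
```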

@sapountzis

We are also interested in the exl2 dynamic-precision format. What steps would be needed to support it?

@nkeilar

nkeilar commented Apr 5, 2024

Support for this would allow loading exl2-quantized weights for the DBRX model, enabling inference on a dual 24 GB GPU system.

@zminer123

Hello there! Has any more thought/attention been given to the idea of exl2 support? The newest derivatives of Llama 3 (such as Dolphin 70B) use it, and it seems no one is quantizing them to AWQ or GPTQ. I love vLLM regardless! Thank you guys for all the work you put in.

@saucebing

Hi, we are also interested in the EXL2 format, which is quite flexible and fast. As for flexibility, you can quantize a model to, e.g., 3.2, 4.5, or 8.5 bpw (bits per weight). And EXL2 inference is much faster than GPTQ at 8-bit precision.
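
As a rough back-of-the-envelope on what those bpw values buy you (weights only, ignoring KV cache and runtime overhead; the 33B parameter count is just an example):

```python
# Back-of-the-envelope weight memory at the bpw values mentioned above,
# for a hypothetical 33B-parameter model (weights only; no KV cache or overhead).
params = 33e9

for bpw in (3.2, 4.5, 8.5):
    gib = params * bpw / 8 / 2**30
    print(f"{bpw} bpw -> ~{gib:.1f} GiB of weights")
```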

@houmie

houmie commented Apr 28, 2024

Fully agree. Supporting exl2 would be perfect!

@belladoreai

I would also love to see exl2 support

@mku-wedoai

I would love to see EXL2 support in vLLM!

@DenisSergeevitch

exl2 is a needed feature, 100%

@sparsh35

I support it too

@chopin1998

Any update?
I really hope to see full support for the exl2 format ASAP.

@meditans

meditans commented Jun 9, 2024

Also voicing my support!

@kulievvitaly

vLLM is great. Would like to see exl2 support too!

@Respaired

yeah this would be great to have

@paulb-seldon

Would love to see this!

@rjmehta1993

+1. EXL2 quants are unbeatable

@fablerq

fablerq commented Sep 2, 2024

Is there any chance to see exl2? 👀

@DaBossCoda

Can we get this? No one is making AWQ and GPTQ quants anymore :(

@javAlborz

exl2 would be nice 😃

@alkeryn

alkeryn commented Dec 9, 2024

@javAlborz it would !

@SlapDrone

+1!

@alkeryn

alkeryn commented Dec 13, 2024

vLLM's CLI is my favorite so far because it just works, and the API is better than Tabby's.
But god, exl2 is better than AWQ.

@Originalimoc

+1. Most new models are in GGUF and EXL2.

@drexample

Give exl2 support pls

@rsxdalv

rsxdalv commented Dec 19, 2024

Although I am waiting for exl2 support myself, the stream of +1 messages really doesn't help.
If you truly wish to add a +1, state your reasons so that the developers or forkers have a real incentive.
Yes, for inactive issues a +1 might amplify your voice and is better than nothing, but a stream of +1s can be worse than nothing. The probability that the devs have unsubscribed from this issue is fairly high.

@AlpinDale linked a pull request Dec 20, 2024 that will close this issue
@JohnConnor123

we are waiting for exl2 support!
