ExLlamaV2: exl2 support #3203
@pabl-o-ce What specifically would you like to see? vLLM already integrates kernels from exllamav2; see, for example, the GPTQ kernels in `vllm/csrc/quantization/gptq/qdq_4.cuh` (line 2 at commit 05af6da).
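For reference, here is a minimal sketch of running a GPTQ checkpoint on those kernels through vLLM's offline API (the model ID is illustrative; substitute any GPTQ-quantized checkpoint):

```python
# Minimal sketch: vLLM's offline API running a GPTQ checkpoint, which
# already uses the exllamav2-derived GPTQ kernels referenced above.
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Llama-2-7B-Chat-GPTQ", quantization="gptq")
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Explain EXL2 in one sentence."], params)
print(outputs[0].outputs[0].text)
```

What this thread is asking for is the same flow but with an exl2 checkpoint, which vLLM's `quantization` option does not currently accept.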
Hi @mgoin, thanks for the response. So is it possible to use exl2 quantized models with vLLM?
Hi, is exl2 properly supported? How do I start the Docker container correctly to serve exl2 models?
Hi @mgoin, I think the feature submitted by @chu-tianxiang in #2330 and #916 just utilizes the exllamav2 GPTQ kernels, so it still doesn't support exl2 properly. @tobiajung
We are also interested in the exl2 dynamic-precision format. What steps would be needed to support it?
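For anyone wanting to pick this up, here is a purely hypothetical sketch of the shape an exl2 quantization config could take on vLLM's side. Every name and signature below is an assumption for illustration, not vLLM's actual interface (the real base class that GPTQ and AWQ plug into lives under `vllm/model_executor/layers/quantization`):

```python
# Hypothetical sketch only -- class and method names are assumptions,
# not vLLM's actual quantization API.
from typing import Any, Dict, List


class Exl2Config:
    """Hypothetical config describing an EXL2 checkpoint.

    Unlike GPTQ/AWQ, EXL2 mixes bit widths within one model to hit an
    average bits-per-weight target, so per-tensor quantization metadata
    would have to come from the checkpoint itself rather than a single
    global `bits` value.
    """

    def __init__(self, avg_bpw: float) -> None:
        self.avg_bpw = avg_bpw  # e.g. 3.2, 4.5 or 8.5 bits per weight

    @classmethod
    def get_name(cls) -> str:
        return "exl2"

    @classmethod
    def get_config_filenames(cls) -> List[str]:
        # Assumption: exl2 conversions store their metadata in config.json.
        return ["config.json"]

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "Exl2Config":
        return cls(avg_bpw=float(config.get("bits", 4.0)))
```

The config is the easy part; the real work would be a linear-layer method that dispatches to exl2's variable-bit dequant/matmul kernels.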
Support for this would allow running the exl2-quantized weights of the DBRX model, enabling inference on a dual 24 GB GPU system.
Hello there! Has any more thought/attention been given to the idea of exl2 support? The newest derivatives of Llama 3 (such as Dolphin 70B) use it, and it seems no one else is quantizing them to AWQ or GPTQ. I love vLLM regardless! Thank you guys for all the work you put in.
Hi, we are also interested in the EXL2 format, which is quite flexible and fast. As for flexibility, you can quantize a model at 3.2, 4.5, or 8.5 bpw (bits per weight). And EXL2 inference is much faster than GPTQ at 8-bit precision.
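For context, this is roughly how an exl2 checkpoint is loaded with the ExLlamaV2 library itself, following the inference example in its repo (the model path is illustrative, and the API may have shifted since this was written):

```python
# Sketch of EXL2 inference with the ExLlamaV2 library, adapted from its
# repo's examples. Requires an exl2-converted model directory.
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/Llama-2-7B-exl2-4.5bpw"  # any exl2 conversion
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # split layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("EXL2 lets you pick ", settings, 64))
```

The 3.2/4.5/8.5 bpw figures above come from the conversion step, where the target bits per weight is a free parameter rather than a fixed 4 or 8.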
Fully agree. Supporting exl2 would be perfect!
I would also love to see exl2 support.
I would love to see EXL2 support in vLLM!
exl2 is a needed feature, 100%.
I support it too.
Any update?
Also voicing my support!
vLLM is great. I'd like to see exl2 support too!
Yeah, this would be great to have.
Would love to see this!
+1. EXL2 quants are unbeatable.
Is there any chance to see exl2? 👀
Can we get this? No one is making AWQ and GPTQ quants anymore :(
exl2 would be nice 😃
@javAlborz it would!
+1!
vLLM's CLI is my favorite so far because it just works; the API is also better than Tabby's.
+1. Most new models are in GGUF and EXL2.
Give exl2 support, pls.
Although I am waiting for exl2 support myself, the number of +1 messages really doesn't help.
We are waiting for exl2 support!
If it is possible: ExLlamaV2 is a very fast and good library for running LLMs.
ExLlamaV2 Repo