Add Huggingface model zoo from community #1674

matichon-vultureprime · 2024-05-25T09:47:45Z

Following up on [Feature Request: "Model Zoo" for quantization #1591],
this is our initial effort to create the Model Zoo.
The first model uploaded is Llama3-70b, quantized with AWQ.

I have identified several opportunities within the Model Zoo. The configuration space is large: PP_size, TP_size, KV_cache_type (fp16, fp8, int8), Group_size (64, 128), and quantization algorithm (AWQ, SQ, FP8).
I will try to work out a "proper" base configuration.

For now, I have decided to use the smallest Group_size (to limit the accuracy degradation caused by quantization) and to set PP_size to 1.
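To make the size of that configuration space concrete, here is a minimal Python sketch that enumerates the options listed above and then filters them by the choice just described (smallest Group_size, PP_size fixed to 1). The `ZooConfig` class, the helper functions, and the `TP_SIZES` candidates are hypothetical and exist only for illustration; none of them are part of TensorRT-LLM. The sketch also treats the options as a plain cross product, which may not hold if some options do not apply to every quantization algorithm.

```python
from dataclasses import dataclass
from itertools import product

# Option values named in the description above.
KV_CACHE_TYPES = ("fp16", "fp8", "int8")
GROUP_SIZES = (64, 128)
QUANT_ALGOS = ("AWQ", "SQ", "FP8")

# Assumption: the description does not list TP_size candidates;
# these values are placeholders for illustration only.
TP_SIZES = (1, 2, 4, 8)


@dataclass(frozen=True)
class ZooConfig:
    """One hypothetical Model Zoo entry (not a TensorRT-LLM type)."""
    quant_algo: str
    group_size: int
    kv_cache_type: str
    tp_size: int
    pp_size: int = 1  # fixed to 1, as stated in the description


def enumerate_configs():
    """Return every combination of the options as a flat list."""
    return [
        ZooConfig(algo, group, kv, tp)
        for algo, group, kv, tp in product(
            QUANT_ALGOS, GROUP_SIZES, KV_CACHE_TYPES, TP_SIZES
        )
    ]


def base_configs(configs):
    """Keep only the entries that use the smallest group size."""
    smallest = min(c.group_size for c in configs)
    return [c for c in configs if c.group_size == smallest]


if __name__ == "__main__":
    all_configs = enumerate_configs()
    print(len(all_configs), "combinations in total")
    print(len(base_configs(all_configs)), "after fixing the smallest Group_size")
```

Even with PP_size fixed, the remaining dimensions multiply quickly, which is why agreeing on a single base configuration matters.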

Let's discuss if we can determine the "proper" configurations.

byshiue · 2024-05-28T01:01:53Z

Thank you for the PR. We will merge it soon.

nv-guomingz · 2024-06-03T12:04:05Z

Hi @matichon-vultureprime, thanks for your contribution. We've merged it into the code base and will add you to the contributor list.

Add Huggingface model zoo from community

b8b6240

byshiue self-requested a review May 28, 2024 01:07

byshiue self-assigned this May 28, 2024

byshiue added the triaged and Community want to contribute labels May 28, 2024

nv-guomingz closed this Jun 3, 2024

nv-guomingz added the Merged label Jun 3, 2024

kaiyux mentioned this pull request Jun 4, 2024

Update TensorRT-LLM #1725

Merged

kaiyux mentioned this pull request Jul 17, 2024

TensorRT-LLM v0.11 Update #1969

Merged
