Add Huggingface model zoo from community #1674
Closed
Following up on [Feature Request: "Model Zoo" for quantization #1591],
this PR is our initial effort to create the Model Zoo.
The first model uploaded is Llama3-70b, quantized with AWQ.
While preparing the Model Zoo, I encountered many possible configuration options, including PP_size, TP_size, KV_cache_type (fp16, fp8, int8), Group_size (64, 128), and quantization algorithm (AWQ, SQ, FP8).
I will try to determine a "proper" base configuration.
For now, I have chosen the smallest available Group_size (to minimize accuracy loss from quantization) and set PP_size to 1.
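To make the size of the configuration space concrete, here is a minimal Python sketch that enumerates the options listed above and applies the base-config rule (smallest Group_size, PP_size = 1). The quantization algorithms, KV cache types, and group sizes come from this description; the TP/PP size values, the `ZooConfig` record, and the helper functions are hypothetical placeholders, not part of any existing tool. Not every combination is necessarily a valid build (for example, some algorithms may ignore Group_size), so the counts only show how quickly the grid grows.

```python
from dataclasses import dataclass
from itertools import product

# Options listed in the PR description.
QUANT_ALGOS = ["AWQ", "SQ", "FP8"]
KV_CACHE_TYPES = ["fp16", "fp8", "int8"]
GROUP_SIZES = [64, 128]
# Hypothetical placeholders: the description does not list TP/PP values.
TP_SIZES = [1, 2, 4, 8]
PP_SIZES = [1, 2]


@dataclass(frozen=True)
class ZooConfig:
    """Hypothetical record describing one Model Zoo build configuration."""
    quant_algo: str
    kv_cache_type: str
    group_size: int
    tp_size: int
    pp_size: int


def all_configs():
    """Enumerate the full Cartesian product of the listed options.

    Validity of each combination is not checked here.
    """
    return [
        ZooConfig(q, kv, g, tp, pp)
        for q, kv, g, tp, pp in product(
            QUANT_ALGOS, KV_CACHE_TYPES, GROUP_SIZES, TP_SIZES, PP_SIZES
        )
    ]


def base_configs(configs):
    """Keep only configs with the smallest group size and PP_size == 1."""
    smallest_group = min(c.group_size for c in configs)
    return [c for c in configs if c.group_size == smallest_group and c.pp_size == 1]


if __name__ == "__main__":
    configs = all_configs()
    base = base_configs(configs)
    print(len(configs))  # 3 * 3 * 2 * 4 * 2 = 144
    print(len(base))     # 3 * 3 * 1 * 4 * 1 = 36
```

Even after fixing Group_size and PP_size, the remaining quantization algorithm, KV cache type, and TP_size choices still multiply, which is why agreeing on a single base configuration matters.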
Let's discuss whether we can agree on the "proper" configurations.