[WIP, Kernel] (2/N) Machete - Integrate into GPTQMarlinLinearMethod and CompressedTensorsWNA16 #5
base: main
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
8b21235 to 2868f5d
11a9dec to 529c59e
e9c70f8 to 6955a93
529c59e to 0bcd9c1
60021df to e92b26e
df80f72 to 57a8011
a280110 to ad5771a
57a8011 to d2431a7
1361be1 to d5ee5b8
d5ee5b8 to 735259b
953973d to 90f8bb6
move heuristic into C++ code fix unit tests + format update for 3.5.1 remove custom scheduler codespell cleanup comment cleanup diff review comments review comments review comment changes review comments fix codespell cleanup util logic make dim names for prepack layout more canonical missed refactor wip interleaving + recasting tweak tolerances comments plus interleaving format codespell review comments end2end first pass separate out kernels, format add machete as a gptq backend update to use ModelWeightParameter formatting update parameter.py refactor permute layout wip
5a38aac to 7bc8316
7bc8316 to 096dd4a
ac45f19 to a98f691
End-to-end integration of Machete into GPTQMarlinLinearMethod and CompressedTensorsWNA16.
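To make the idea of adding Machete as one of several backends behind a quantized linear method more concrete, here is a minimal sketch of a priority-ordered backend chooser. It is not the code in this PR: `LayerConfig`, `KernelBackend`, `choose_backend`, and the eligibility rules (compute capability thresholds, activation reordering) are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class LayerConfig:
    """Hypothetical description of a weight-only quantized linear layer."""
    weight_bits: int
    group_size: int           # -1 stands for per-channel scales
    has_act_order: bool
    compute_capability: int   # e.g. 90; placeholder for a device query


# A check returns (ok, reason); reason explains a rejection.
Check = Callable[[LayerConfig], Tuple[bool, Optional[str]]]


@dataclass(frozen=True)
class KernelBackend:
    """Hypothetical backend entry: a name plus an eligibility check."""
    name: str
    can_implement: Check


def _machete_ok(cfg: LayerConfig) -> Tuple[bool, Optional[str]]:
    # Assumed rules, for illustration only.
    if cfg.compute_capability < 90:
        return False, "needs compute capability >= 90 (assumed)"
    if cfg.has_act_order:
        return False, "activation reordering not handled (assumed)"
    return True, None


def _marlin_ok(cfg: LayerConfig) -> Tuple[bool, Optional[str]]:
    # Assumed rule, for illustration only.
    if cfg.compute_capability < 80:
        return False, "needs compute capability >= 80 (assumed)"
    return True, None


# Earlier entries are preferred when several backends qualify.
BACKENDS: List[KernelBackend] = [
    KernelBackend("machete", _machete_ok),
    KernelBackend("marlin", _marlin_ok),
]


def choose_backend(cfg: LayerConfig,
                   backends: List[KernelBackend] = BACKENDS) -> KernelBackend:
    """Return the first backend whose check accepts cfg."""
    reasons = []
    for backend in backends:
        ok, reason = backend.can_implement(cfg)
        if ok:
            return backend
        reasons.append(f"{backend.name}: {reason}")
    raise ValueError("no backend can implement this layer: " + "; ".join(reasons))


if __name__ == "__main__":
    cfg = LayerConfig(weight_bits=4, group_size=128,
                      has_act_order=False, compute_capability=90)
    print(choose_backend(cfg).name)
```

Keeping the eligibility check next to each backend means the linear method only needs to ask for "a kernel that can run this layer", and a rejected configuration reports why every candidate was skipped.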