Releases: casper-hansen/AutoAWQ
v0.2.7.post2
What's Changed
- Post release 2 - All additional packages go into extras by @casper-hansen in #653
Full Changelog: v0.2.7.post1...v0.2.7.post2
v0.2.7.post1
What's Changed
- Minimum of torch 2.2.0 during build by @casper-hansen in #649
- Post release 1 by @casper-hansen in #650
Full Changelog: v0.2.7...v0.2.7.post1
v0.2.7
What's Changed
- fix: pass rope_theta argument when initializing LlamaLikeBlock for models like qwen2, mistral, etc. by @Shuai-Xie in #568
- Add Gemma2 support. by @radi-cho in #562
- ignore onnx in ignore_patterns by @casper-hansen in #570
- Add Internlm2 support by @Crystalcareai in #576
- quantization fails with old `datasets` by @stas00 in #593
- doc: replace a broken example with a working one by @stas00 in #595
- Implement NO_KERNELS flag and update torch requirement by @devin-ai-integration in #582
- AWQ Triton kernels. Make `autoawq-kernels` optional. by @casper-hansen in #608
- device_map defaults to auto by @casper-hansen in #607 (see the loading sketch after this list)
- Let installed PyTorch decide required version number by @wasertech in #573
- Replace itrex qbits with ipex woq linear by @jiqing-feng in #549
- enable awq ipex linear in transformers by @jiqing-feng in #610
- fix for "two devices" issue due to RoPE changes by @davedgd in #630
- add qwen2vl support by @kq-chen in #599
- Add support for Phi-3-vision series model by @Isotr0py in #596
- support minicpm3.0 by @LDLINGLINGLING in #605
- Enable Intel GPU path and lora finetune and change examples to support different devices by @jiqing-feng in #631
- Replace custom sharding with save_torch_state_dict from huggingface_hub by @casper-hansen in #644
- New release (0.2.7) + Fix build by @casper-hansen in #647
- Only build once by @casper-hansen in #648
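With #607 and #608 in this release, a quantized checkpoint can be loaded without `autoawq-kernels` installed and without an explicit device map. A minimal sketch, assuming a pre-quantized AWQ checkpoint on the Hub (the model path below is only an example):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "TheBloke/Mistral-7B-Instruct-v0.2-AWQ"  # example pre-quantized checkpoint

# device_map now defaults to "auto" (#607), so no explicit placement is needed;
# if autoawq-kernels is not installed, the Triton kernels from #608 are the fallback.
model = AutoAWQForCausalLM.from_quantized(quant_path)
tokenizer = AutoTokenizer.from_pretrained(quant_path)
```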
New Contributors
- @Shuai-Xie made their first contribution in #568
- @radi-cho made their first contribution in #562
- @Crystalcareai made their first contribution in #576
- @stas00 made their first contribution in #593
- @wasertech made their first contribution in #573
- @jiqing-feng made their first contribution in #549
- @davedgd made their first contribution in #630
- @kq-chen made their first contribution in #599
Full Changelog: v0.2.6...v0.2.7
v0.2.6
What's Changed
- Cohere Support by @TechxGenus in #457
- Add phi3 support by @pprp in #481
- Support Weight-Only quantization on CPU device with QBits backend by @PenghuiCheng in #437
- Fix typo by @wanyaworld in #486
- Add updates + sponsorship by @casper-hansen in #495
- Update README.md by @casper-hansen in #497
- Update doc by @imba-tjd in #499
- add support for Openbmb/MiniCPM by @LDLINGLINGLING in #504
- Update RunPod support by @casper-hansen in #514
- add deepseek v2 support by @TechxGenus in #508
- Fix NaN problem in Qwen2-72B quantization by @baoyf4244 in #519
- Qwen nan fix by @baoyf4244 in #522
- fix deepseek v2 input feat by @TechxGenus in #524
- Batched quantization by @casper-hansen in #516 (see the sketch after this list)
- Fix step size when computing clipping by @casper-hansen in #531
- Pin torch version to 2.3.1 by @devin-ai-integration in #542
- Revert "Pin torch version to 2.3.1 (#542)" by @casper-hansen in #547
- CLI example + Runpod launch script by @casper-hansen in #548
- Print warning if AutoAWQ cannot load extensions by @casper-hansen in #515
- Remove progress bars by @casper-hansen in #550
- Add test for chunked methods by @casper-hansen in #551
- Llama with inputs_embeds only (LLaVA-v1.5 bug fixed) and LLaVA-v1.6 support by @WanBenLe in #471
- Better CLI + RunPod Script by @casper-hansen in #552
- Release 026 by @casper-hansen in #546
- pin torch==2.3.1 by @casper-hansen in #554
- Remove ROCm build and only build for PyPi by @casper-hansen in #555
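Batched quantization (#516) chunks the calibration forward passes so large models can be quantized without holding every sample's activations at once. A minimal sketch, assuming the chunk size is exposed on `quantize()` as `n_parallel_calib_samples` (check the signature in your installed version):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen/Qwen2-72B-Instruct"  # example large checkpoint
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run calibration in chunks of 32 samples to bound peak activation memory.
model.quantize(tokenizer, quant_config=quant_config, n_parallel_calib_samples=32)
model.save_quantized("qwen2-72b-awq")
```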
New Contributors
- @pprp made their first contribution in #481
- @PenghuiCheng made their first contribution in #437
- @wanyaworld made their first contribution in #486
- @imba-tjd made their first contribution in #499
- @LDLINGLINGLING made their first contribution in #504
- @baoyf4244 made their first contribution in #519
- @devin-ai-integration made their first contribution in #542
- @WanBenLe made their first contribution in #471
Full Changelog: v0.2.5...v0.2.6
v0.2.5
What's Changed
- Fix fused models for transformers >= 4.39 by @TechxGenus in #418
- FIX: Add safe guards for static cache + llama on transformers latest by @younesbelkada in #401
- Pin: lm_eval==0.4.1 by @casper-hansen in #426
- Implement `apply_clip` argument to `quantize()` by @casper-hansen in #427 (see the sketch after this list)
- Workaround: illegal memory access by @casper-hansen in #421
- Add download_kwargs for load model (#302) by @Roshiago in #399
- add starcoder2 support by @shaonianyr in #406
- Add StableLM support by @Isotr0py in #410
- Fix starcoder2 fused norm by @TechxGenus in #442
- Update generate example to llama 3 by @casper-hansen in #448
- [BUG] Fix github action documentation build by @suparious in #449
- Fix path by @casper-hansen in #451
- FIX: 'awq_ext' is not defined error by @younesbelkada in #465
- FIX: Fix multiple generations for new HF cache format by @younesbelkada in #444
- support max_memory to specify mem usage for each GPU by @laoda513 in #460
- Bump to 0.2.5 by @casper-hansen in #468
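A minimal sketch combining two of this release's additions; keyword names follow the PR titles (#427, #460) and the accelerate convention, so verify them against your installed version:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Meta-Llama-3-8B-Instruct"  # example checkpoint
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# apply_clip=False (#427) skips the weight-clipping search during quantization.
model.quantize(tokenizer, quant_config=quant_config, apply_clip=False)
model.save_quantized("llama-3-8b-awq")

# max_memory (#460) caps usage per device when loading the quantized weights.
quantized = AutoAWQForCausalLM.from_quantized(
    "llama-3-8b-awq",
    max_memory={0: "12GiB", 1: "12GiB", "cpu": "64GiB"},
)
```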
New Contributors
- @Roshiago made their first contribution in #399
- @shaonianyr made their first contribution in #406
- @Isotr0py made their first contribution in #410
- @suparious made their first contribution in #449
- @laoda513 made their first contribution in #460
Full Changelog: v0.2.4...v0.2.5
v0.2.4
What's Changed
- Add Gemma Support by @TechxGenus in #393
- Pin transformers>=4.35.0,<=4.38.2 by @casper-hansen in #408
- Bump to v0.2.4 by @casper-hansen in #409
New Contributors
- @TechxGenus made their first contribution in #393
Full Changelog: v0.2.3...v0.2.4
v0.2.3
What's Changed
- New optimized kernels by @casper-hansen in #365
- Fix double bias by @casper-hansen in #383
- Rename x_max -> x_mean and w_max -> w_mean, plus some comments by @OscarSavolainenDR in #378
New Contributors
- @OscarSavolainenDR made their first contribution in #378
Full Changelog: v0.2.2...v0.2.3
v0.2.2
What's Changed
- Support Fused Mixtral on multi-GPU by @casper-hansen in #352
- Add multi-GPU benchmark of Mixtral by @casper-hansen in #353
- Remove MoE Triton kernels by @casper-hansen in #355
- Bump to 0.2.2 by @casper-hansen in #356
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- Avoid downloading ROCm by @casper-hansen in #347
- ENH / FIX: Few enhancements and fix for mixed-precision training by @younesbelkada in #348
- Fix triton dependency by @casper-hansen in #350
- Bump to 0.2.1 by @casper-hansen in #351
Full Changelog: v0.2.0...v0.2.1
v0.2.0
What's Changed
- AWQ: Move the AWQ kernels to a separate repository by @casper-hansen in #279
- Add CPU-loaded multi-GPU quantization by @xNul in #289
- GGUF compatible quantization (2, 3, 4 bit / any bit) by @casper-hansen in #285
- Exllama kernels support by @IlyasMoutawwakil in #313
- Cleanup requirements by @casper-hansen in #295
- Torch only inference + any-device quantization by @casper-hansen in #319
- Up to 60% faster context processing by @casper-hansen in #316
- Evaluation: Add more evals by @casper-hansen in #283
- Fixes a breaking change in autoawq by @younesbelkada in #325
- AMD ROCM Support by @IlyasMoutawwakil in #315
- Marlin symmetric quantization and inference by @IlyasMoutawwakil in #320 (see the sketch after this list)
- Add qwen2 by @JustinLin610 in #321
- Fix n_samples by @casper-hansen in #326
- PEFT compatible GEMM by @casper-hansen in #324
- [`PEFT`] Fix PEFT batch size > 1 by @younesbelkada in #338
- v0.2.0 by @casper-hansen in #330
- Fix ROCm build by @casper-hansen in #342
- Fix dependency by @casper-hansen in #343
- Fix importlib by @casper-hansen in #344
- Fix workflow by @casper-hansen in #345
- Fix typo in setup.py by @casper-hansen in #346
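The Marlin kernels from #320 require symmetric quantization, so the zero point is disabled in the quant config. A minimal sketch, assuming "Marlin" is the accepted version string in quant_config (verify against this release; the model path is only an example of the qwen2 architecture added in #321):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen/Qwen1.5-7B-Chat"  # example qwen2-architecture model, per #321
quant_config = {"zero_point": False, "q_group_size": 128, "w_bit": 4, "version": "Marlin"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Symmetric (zero_point=False) quantization targeting the Marlin inference kernels.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("qwen1.5-7b-awq-marlin")
```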
New Contributors
- @xNul made their first contribution in #289
- @IlyasMoutawwakil made their first contribution in #313
- @JustinLin610 made their first contribution in #321
Full Changelog: v0.1.8...v0.2.0