Releases: casper-hansen/AutoAWQ
v0.2.7.post2
What's Changed
- Post release 2 - All additional packages go into extras by @casper-hansen in #653
Full Changelog: v0.2.7.post1...v0.2.7.post2
v0.2.7.post1
What's Changed
- Minimum of torch 2.2.0 during build by @casper-hansen in #649
- Post release 1 by @casper-hansen in #650
Full Changelog: v0.2.7...v0.2.7.post1
v0.2.7
What's Changed
- fix: pass rope_theta argument when initializing LlamaLikeBlock for models like qwen2, mistral, etc. by @Shuai-Xie in #568
- Add Gemma2 support. by @radi-cho in #562
- ignore onnx in ignore_patterns by @casper-hansen in #570
- Add Internlm2 support by @Crystalcareai in #576
- quantization fails with old `datasets` by @stas00 in #593
- doc: replace a broken example with a working one by @stas00 in #595
- Implement NO_KERNELS flag and update torch requirement by @devin-ai-integration in #582
- AWQ Triton kernels. Make `autoawq-kernels` optional. by @casper-hansen in #608
- device_map defaults to auto by @casper-hansen in #607 (see the loading sketch after this list)
- Let installed PyTorch decide required version number by @wasertech in #573
- Replace itrex qbits with ipex woq linear by @jiqing-feng in #549
- enable awq ipex linear in transformers by @jiqing-feng in #610
- fix for "two devices" issue due to RoPE changes by @davedgd in #630
- add qwen2vl support by @kq-chen in #599
- Add support for Phi-3-vision series model by @Isotr0py in #596
- support minicpm3.0 by @LDLINGLINGLING in #605
- Enable Intel GPU path and lora finetune and change examples to support different devices by @jiqing-feng in #631
- Replace custom sharding with save_torch_state_dict from huggingface_hub by @casper-hansen in #644
- New release (0.2.7) + Fix build by @casper-hansen in #647
- Only build once by @casper-hansen in #648
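With #607 and #608 in this release, a quantized checkpoint can be loaded without `autoawq-kernels` installed and without an explicit device map. A minimal sketch, assuming a pre-quantized AWQ checkpoint on the Hub (the model path below is only an example):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "TheBloke/Mistral-7B-Instruct-v0.2-AWQ"  # example pre-quantized checkpoint

# device_map now defaults to "auto" (#607), so no explicit placement is needed;
# if autoawq-kernels is not installed, the Triton kernels from #608 are the fallback.
model = AutoAWQForCausalLM.from_quantized(quant_path)
tokenizer = AutoTokenizer.from_pretrained(quant_path)
```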
New Contributors
- @Shuai-Xie made their first contribution in #568
- @radi-cho made their first contribution in #562
- @Crystalcareai made their first contribution in #576
- @stas00 made their first contribution in #593
- @wasertech made their first contribution in #573
- @jiqing-feng made their first contribution in #549
- @davedgd made their first contribution in #630
- @kq-chen made their first contribution in #599
Full Changelog: v0.2.6...v0.2.7
v0.2.6
What's Changed
- Cohere Support by @TechxGenus in #457
- Add phi3 support by @pprp in #481
- Support Weight-Only quantization on CPU device with QBits backend by @PenghuiCheng in #437
- Fix typo by @wanyaworld in #486
- Add updates + sponsorship by @casper-hansen in #495
- Update README.md by @casper-hansen in #497
- Update doc by @imba-tjd in #499
- add support for Openbmb/MiniCPM by @LDLINGLINGLING in #504
- Update RunPod support by @casper-hansen in #514
- add deepseek v2 support by @TechxGenus in #508
- Fix NaN problem in Qwen2-72B quantization by @baoyf4244 in #519
- Qwen nan fix by @baoyf4244 in #522
- fix deepseek v2 input feat by @TechxGenus in #524
- Batched quantization by @casper-hansen in #516 (see the sketch after this list)
- Fix step size when computing clipping by @casper-hansen in #531
- Pin torch version to 2.3.1 by @devin-ai-integration in #542
- Revert "Pin torch version to 2.3.1 (#542)" by @casper-hansen in #547
- CLI example + Runpod launch script by @casper-hansen in #548
- Print warning if AutoAWQ cannot load extensions by @casper-hansen in #515
- Remove progress bars by @casper-hansen in #550
- Add test for chunked methods by @casper-hansen in #551
- Llama with inputs_embeds only (LLaVA-v1.5 bug fixed) and LLaVA-v1.6 support by @WanBenLe in #471
- Better CLI + RunPod Script by @casper-hansen in #552
- Release 026 by @casper-hansen in #546
- pin torch==2.3.1 by @casper-hansen in #554
- Remove ROCm build and only build for PyPi by @casper-hansen in #555
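Batched quantization (#516) chunks the calibration forward passes so large models can be quantized without holding every sample's activations at once. A minimal sketch, assuming the chunk size is exposed on `quantize()` as `n_parallel_calib_samples` (check the signature in your installed version):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen/Qwen2-72B-Instruct"  # example large checkpoint
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run calibration in chunks of 32 samples to bound peak activation memory.
model.quantize(tokenizer, quant_config=quant_config, n_parallel_calib_samples=32)
model.save_quantized("qwen2-72b-awq")
```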
New Contributors
- @pprp made their first contribution in #481
- @PenghuiCheng made their first contribution in #437
- @wanyaworld made their first contribution in #486
- @imba-tjd made their first contribution in #499
- @LDLINGLINGLING made their first contribution in #504
- @baoyf4244 made their first contribution in #519
- @devin-ai-integration made their first contribution in #542
- @WanBenLe made their first contribution in #471
Full Changelog: v0.2.5...v0.2.6
v0.2.5
What's Changed
- Fix fused models for transformers >= 4.39 by @TechxGenus in #418
- FIX: Add safe guards for static cache + llama on transformers latest by @younesbelkada in #401
- Pin: lm_eval==0.4.1 by @casper-hansen in #426
- Implement `apply_clip` argument to `quantize()` by @casper-hansen in #427 (see the sketch after this list)
- Workaround: illegal memory access by @casper-hansen in #421
- Add download_kwargs for load model (#302) by @Roshiago in #399
- add starcoder2 support by @shaonianyr in #406
- Add StableLM support by @Isotr0py in #410
- Fix starcoder2 fused norm by @TechxGenus in #442
- Update generate example to llama 3 by @casper-hansen in #448
- [BUG] Fix github action documentation build by @suparious in #449
- Fix path by @casper-hansen in #451
- FIX: 'awq_ext' is not defined error by @younesbelkada in #465
- FIX: Fix multiple generations for new HF cache format by @younesbelkada in #444
- support max_memory to specify mem usage for each GPU by @laoda513 in #460
- Bump to 0.2.5 by @casper-hansen in #468
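A minimal sketch combining two of this release's additions; keyword names follow the PR titles (#427, #460) and the accelerate convention, so verify them against your installed version:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Meta-Llama-3-8B-Instruct"  # example checkpoint
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# apply_clip=False (#427) skips the weight-clipping search during quantization.
model.quantize(tokenizer, quant_config=quant_config, apply_clip=False)
model.save_quantized("llama-3-8b-awq")

# max_memory (#460) caps usage per device when loading the quantized weights.
quantized = AutoAWQForCausalLM.from_quantized(
    "llama-3-8b-awq",
    max_memory={0: "12GiB", 1: "12GiB", "cpu": "64GiB"},
)
```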
New Contributors
- @Roshiago made their first contribution in #399
- @shaonianyr made their first contribution in #406
- @Isotr0py made their first contribution in #410
- @suparious made their first contribution in #449
- @laoda513 made their first contribution in #460
Full Changelog: v0.2.4...v0.2.5
v0.2.4
What's Changed
- Add Gemma Support by @TechxGenus in #393
- Pin transformers>=4.35.0,<=4.38.2 by @casper-hansen in #408
- Bump to v0.2.4 by @casper-hansen in #409
New Contributors
- @TechxGenus made their first contribution in #393
Full Changelog: v0.2.3...v0.2.4
v0.2.3
What's Changed
- New optimized kernels by @casper-hansen in #365
- Fix double bias by @casper-hansen in #383
- Rename x_max -> x_mean and w_max -> w_mean, plus some comments by @OscarSavolainenDR in #378
New Contributors
- @OscarSavolainenDR made their first contribution in #378
Full Changelog: v0.2.2...v0.2.3
v0.2.2
What's Changed
- Support Fused Mixtral on multi-GPU by @casper-hansen in #352
- Add multi-GPU benchmark of Mixtral by @casper-hansen in #353
- Remove MoE Triton kernels by @casper-hansen in #355
- Bump to 0.2.2 by @casper-hansen in #356
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- Avoid downloading ROCm by @casper-hansen in #347
- ENH / FIX: Few enhancements and fix for mixed-precision training by @younesbelkada in #348
- Fix triton dependency by @casper-hansen in #350
- Bump to 0.2.1 by @casper-hansen in #351
Full Changelog: v0.2.0...v0.2.1
v0.2.0
What's Changed
- AWQ: Move the AWQ kernels to a separate repository by @casper-hansen in #279
- Add CPU-loaded multi-GPU quantization by @xNul in #289
- GGUF compatible quantization (2, 3, 4 bit / any bit) by @casper-hansen in #285
- Exllama kernels support by @IlyasMoutawwakil in #313
- Cleanup requirements by @casper-hansen in #295
- Torch only inference + any-device quantization by @casper-hansen in #319
- Up to 60% faster context processing by @casper-hansen in #316
- Evaluation: Add more evals by @casper-hansen in #283
- Fixes a breaking change in autoawq by @younesbelkada in #325
- AMD ROCM Support by @IlyasMoutawwakil in #315
- Marlin symmetric quantization and inference by @IlyasMoutawwakil in #320 (see the sketch after this list)
- Add qwen2 by @JustinLin610 in #321
- Fix n_samples by @casper-hansen in #326
- PEFT compatible GEMM by @casper-hansen in #324
- [`PEFT`] Fix PEFT batch size > 1 by @younesbelkada in #338
- v0.2.0 by @casper-hansen in #330
- Fix ROCm build by @casper-hansen in #342
- Fix dependency by @casper-hansen in #343
- Fix importlib by @casper-hansen in #344
- Fix workflow by @casper-hansen in #345
- Fix typo in setup.py by @casper-hansen in #346
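The Marlin kernels from #320 require symmetric quantization, so the zero point is disabled in the quant config. A minimal sketch, assuming "Marlin" is the accepted version string in quant_config (verify against this release; the model path is only an example of the qwen2 architecture added in #321):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen/Qwen1.5-7B-Chat"  # example qwen2-architecture model, per #321
quant_config = {"zero_point": False, "q_group_size": 128, "w_bit": 4, "version": "Marlin"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Symmetric (zero_point=False) quantization targeting the Marlin inference kernels.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("qwen1.5-7b-awq-marlin")
```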
New Contributors
- @xNul made their first contribution in #289
- @IlyasMoutawwakil made their first contribution in #313
- @JustinLin610 made their first contribution in #321
Full Changelog: v0.1.8...v0.2.0