## Highlights
- [Experimental Feature] We provide API support for vision-language models (VLMs); see the quantization sketch below.
- [Kernel] We add IPEX support for Intel CPUs; see the CPU inference sketch below.
- [Bug fix] We fix a tuning bug in the GLM-4 model.
- [Enhancement] We better align `gradient_accumulate_steps` behavior for varied-length inputs.
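
As a quick orientation for the experimental VLM support, here is a minimal quantization sketch. It assumes the `AutoRoundMLLM` entry point introduced by #276 and refined in #334, and uses `Qwen/Qwen2-VL-2B-Instruct` purely as an example; argument names and defaults may differ in your installed version.

```python
# Minimal sketch of the experimental multimodal quantization API (#276, #334).
# AutoRoundMLLM and the arguments below are assumptions based on this release's
# documentation; verify against your installed version.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor, AutoTokenizer
from auto_round import AutoRoundMLLM

model_name = "Qwen/Qwen2-VL-2B-Instruct"  # example model, not prescribed by this release
model = Qwen2VLForConditionalGeneration.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

autoround = AutoRoundMLLM(
    model,
    tokenizer,
    processor,
    bits=4,
    group_size=128,
    batch_size=1,
    gradient_accumulate_steps=4,  # accumulation is now aligned for varied-length inputs (#309)
)
autoround.quantize()
autoround.save_quantized("./qwen2-vl-w4g128", format="auto_round", inplace=True)
```

For the new IPEX kernel, CPU inference of an AutoRound-format checkpoint looks roughly like the following. `AutoRoundConfig` and the `backend` override are taken from the project README and may differ in your version; a plain `from_pretrained` should also work, since the kernel is selected automatically when intel-extension-for-pytorch is installed.

```python
# Rough CPU-inference sketch for the new IPEX path (#292). AutoRoundConfig and
# the "cpu" backend string are assumptions taken from the project README.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  # registers the auto-round quantization backend

quantized_path = "./opt-125m-w4g128"  # hypothetical path to an AutoRound-format checkpoint
model = AutoModelForCausalLM.from_pretrained(
    quantized_path,
    device_map="cpu",
    quantization_config=AutoRoundConfig(backend="cpu"),  # IPEX kernel is used when available
)
tokenizer = AutoTokenizer.from_pretrained(quantized_path)
inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```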
## What's Changed
- refine AutoRound format and support marlin repacking by @wenhuach21 in #280
- update readme for v0.3.1 release by @wenhuach21 in #283
- update readme for cpu inference by @wenhuach21 in #284
- avoid deterministic algorithm warning in inference by @wenhuach21 in #285
- fix mx_fp issues by @wenhuach21 in #286
- update torch ao integration information by @wenhuach21 in #287
- Refine code by @wenhuach21 in #291
- Add ipex support for intel cpu by @wenhuach21 in #292
- fix ipex tqdm mismatch issue by @wenhuach21 in #293
- fix bug of backend by @wenhuach21 in #294
- [Experimental Feature] support for common HF multimodal models by @n1ck-guo in #276
- use torch.compile by default for PyTorch versions 2.6 and above by @wenhuach21 in #295
- refine forward hook by @WeiweiZhang1 in #290
- eval for MLLMs by @n1ck-guo in #296
- mllm eval bug fix by @n1ck-guo in #297
- Port Numba-based packing from INC by @yiliu30 in #301
- refine model config file for mixed precision quantization by @wenhuach21 in #300
- fix glm4-9b batch dim issue by @wenhuach21 in #304
- better align gradient_accumulate_steps for varied length input by @wenhuach21 in #309
- Enable torch.compile on HPU by @yiliu30 in #307
- Update autogptq exporting by @wenhuach21 in #310
- fix typo by @wenhuach21 in #311
- qwen2 vision quantization bugfix by @WeiweiZhang1 in #313
- multiple gpu evaluation/calibration refine by @wenhuach21 in #312
- HPU only release binary by @yiliu30 in #302
- patch 1 for mllm by @n1ck-guo in #298
- add torch compile arg by @wenhuach21 in #314
- fix merge error by @n1ck-guo in #316
- Update the check for HPU by @yiliu30 in #318
- fix eval device issue by @wenhuach21 in #319
- fix multiple device bug by @wenhuach21 in #321
- add warning for no gptq exllamav2 kernel by @wenhuach21 in #324
- add pile calib, rename quant_block_list to to_quant_block_names by @WeiweiZhang1 in #322
- fix autogptq version error by @wenhuach21 in #325
- new mllm eval by @n1ck-guo in #317
- Add cpu only version by @XuehaoSun in #315
- set default mllm dataset by @n1ck-guo in #327
- fix fp_layers issue and force to FP16 on cuda for autoround format inference by @wenhuach21 in #326
- fix the bug of test model support for test-only by @n1ck-guo in #328
- Increase unit test timeout to 120 minutes by @XuehaoSun in #330
- fix mllm dataset config bug and add gptq cuda backend by @wenhuach21 in #329
- add tips and tricks for LLM & MLLM quantization by @wenhuach21 in #333
- fix eval_bs in fake format and reset auto-gptq exporting max_shard_size by @wenhuach21 in #332
- fix model_dtype issue and reformat mllm code by @wenhuach21 in #335
- Exclude markdown files from unit test pipelines by @XuehaoSun in #337
- refine mllm docs by @WeiweiZhang1 in #336
- cogvlm doc by @n1ck-guo in #339
- add qwen2.5 recipe and refine readme by @WeiweiZhang1 in #338
- add cogvlm recipe and refine readme by @WeiweiZhang1 in #340
- refine mllm API and add help info by @n1ck-guo in #334
**Full Changelog**: v0.3.1...v0.4