## Highlights
- [Experimental Feature] We provide API support for vision-language models (VLMs); see the quantization sketch below.
- [Kernel] We add IPEX support for Intel CPUs; see the CPU inference sketch below.
- [Bug fix] We fix a tuning bug in the GLM-4 model.
- [Enhancement] We better align `gradient_accumulate_steps` behavior for varied-length inputs.
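
As a quick orientation for the experimental VLM support, here is a minimal quantization sketch. It assumes the `AutoRoundMLLM` entry point introduced by #276 and refined in #334, and uses `Qwen/Qwen2-VL-2B-Instruct` purely as an example; argument names and defaults may differ in your installed version.

```python
# Minimal sketch of the experimental multimodal quantization API (#276, #334).
# AutoRoundMLLM and the arguments below are assumptions based on this release's
# documentation; verify against your installed version.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor, AutoTokenizer
from auto_round import AutoRoundMLLM

model_name = "Qwen/Qwen2-VL-2B-Instruct"  # example model, not prescribed by this release
model = Qwen2VLForConditionalGeneration.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

autoround = AutoRoundMLLM(
    model,
    tokenizer,
    processor,
    bits=4,
    group_size=128,
    batch_size=1,
    gradient_accumulate_steps=4,  # accumulation is now aligned for varied-length inputs (#309)
)
autoround.quantize()
autoround.save_quantized("./qwen2-vl-w4g128", format="auto_round", inplace=True)
```

For the new IPEX kernel, CPU inference of an AutoRound-format checkpoint looks roughly like the following. `AutoRoundConfig` and the `backend` override are taken from the project README and may differ in your version; a plain `from_pretrained` should also work, since the kernel is selected automatically when intel-extension-for-pytorch is installed.

```python
# Rough CPU-inference sketch for the new IPEX path (#292). AutoRoundConfig and
# the "cpu" backend string are assumptions taken from the project README.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  # registers the auto-round quantization backend

quantized_path = "./opt-125m-w4g128"  # hypothetical path to an AutoRound-format checkpoint
model = AutoModelForCausalLM.from_pretrained(
    quantized_path,
    device_map="cpu",
    quantization_config=AutoRoundConfig(backend="cpu"),  # IPEX kernel is used when available
)
tokenizer = AutoTokenizer.from_pretrained(quantized_path)
inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```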
## What's Changed
- refine AutoRound format and support marlin repacking by @wenhuach21 in #280
- update readme for v0.3.1 release by @wenhuach21 in #283
- update readme for cpu inference by @wenhuach21 in #284
- avoid deterministic algorithm warning in inference by @wenhuach21 in #285
- fix mx_fp issues by @wenhuach21 in #286
- update torch ao integration information by @wenhuach21 in #287
- Refine code by @wenhuach21 in #291
- Add ipex support for intel cpu by @wenhuach21 in #292
- fix ipex tqdm mismatch issue by @wenhuach21 in #293
- fix bug of backend by @wenhuach21 in #294
- [Experimental Feature] support for common HF multimodal models by @n1ck-guo in #276
- use torch.compile by default for PyTorch versions 2.6 and above by @wenhuach21 in #295
- refine forward hook by @WeiweiZhang1 in #290
- eval for MLLMs by @n1ck-guo in #296
- mllm eval bug fix by @n1ck-guo in #297
- Port Numba-based packing from INC by @yiliu30 in #301
- refine model config file for mixed precision quantization by @wenhuach21 in #300
- fix glm4-9b batch dim issue by @wenhuach21 in #304
- better align gradient_accumulate_steps for varied length input by @wenhuach21 in #309
- Enable torch.compile on HPU by @yiliu30 in #307
- Update autogptq exporting by @wenhuach21 in #310
- fix typo by @wenhuach21 in #311
- qwen2 vision quantization bugfix by @WeiweiZhang1 in #313
- multiple gpu evaluation/calibration refine by @wenhuach21 in #312
- HPU only release binary by @yiliu30 in #302
- patch 1 for mllm by @n1ck-guo in #298
- add torch compile arg by @wenhuach21 in #314
- fix merge error by @n1ck-guo in #316
- Update the check for HPU by @yiliu30 in #318
- fix eval device issue by @wenhuach21 in #319
- fix multiple device bug by @wenhuach21 in #321
- add warning for no gptq exllamav2 kernel by @wenhuach21 in #324
- add pile calib, rename quant_block_list to to_quant_block_names by @WeiweiZhang1 in #322
- fix autogptq version error by @wenhuach21 in #325
- new mllm eval by @n1ck-guo in #317
- Add cpu only version by @XuehaoSun in #315
- set default mllm dataset by @n1ck-guo in #327
- fix fp_layers issue and force to FP16 on cuda for autoround format inference by @wenhuach21 in #326
- fix the bug of test model support for test-only by @n1ck-guo in #328
- Increase unit test timeout to 120 minutes by @XuehaoSun in #330
- fix mllm dataset config bug and add gptq cuda backend by @wenhuach21 in #329
- add tips and tricks for LLM & MLLM quantization by @wenhuach21 in #333
- fix eval_bs in fake format and reset auto-gptq exporting max_shard_size by @wenhuach21 in #332
- fix model_dtype issue and reformat mllm code by @wenhuach21 in #335
- Exclude markdown files from unit test pipelines by @XuehaoSun in #337
- refine mllm docs by @WeiweiZhang1 in #336
- cogvlm doc by @n1ck-guo in #339
- add qwen2.5 recipe and refine readme by @WeiweiZhang1 in #338
- add cogvlm recipe and refine readme by @WeiweiZhang1 in #340
- refine mllm API and add help info by @n1ck-guo in #334
**Full Changelog**: v0.3.1...v0.4