
v0.1.7

@dongjiyingdjy released this on 19 Mar, 02:53

features

  • Support INT4 weights (experimental) for GPTQ-quantized Qwen models
  • Support FMHA (fused multi-head attention) on V100 GPUs
  • Support BERT
  • Optimize the ViT engine with TensorRT
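For readers new to INT4 GPTQ weights, the sketch below shows the general idea of 4-bit packing: eight 4-bit values share one 32-bit word, and each value is dequantized as `scale * (q - zero)`. It is a minimal, generic illustration under the assumption of lowest-nibble-first packing; the function names are hypothetical and this is not the kernel shipped in this release.

```python
def pack_int4(values):
    """Pack eight unsigned 4-bit ints (0..15) into one 32-bit word, lowest nibble first."""
    if len(values) != 8:
        raise ValueError("expected exactly 8 values")
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= 15:
            raise ValueError("int4 value out of range")
        word |= v << (4 * i)  # value i occupies bits [4*i, 4*i + 4)
    return word


def unpack_int4(word):
    """Recover the eight 4-bit values from a packed 32-bit word."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


def dequantize(word, scale, zero):
    """Hypothetical per-group dequantization: w = scale * (q - zero)."""
    return [scale * (q - zero) for q in unpack_int4(word)]


if __name__ == "__main__":
    packed = pack_int4([1, 2, 3, 4, 5, 6, 7, 15])
    print(hex(packed))
    print(unpack_int4(packed))
    print(dequantize(pack_int4([10] * 8), scale=0.5, zero=8))
```

Real GPTQ checkpoints also store per-group scales and zero points in their own packed layouts, so a production loader has to follow the exact format of the checkpoint it reads.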

refactor

  • Refactor the scheduling strategy: the KV cache is now allocated when a new stream is scheduled
  • Refactor MoE
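The scheduling change can be pictured with the minimal sketch below. It assumes a block-based KV cache and a FIFO waiting queue: nothing is reserved when a stream is enqueued, and KV cache blocks are taken from a free pool only when the scheduler admits the stream. All class and method names (`Scheduler`, `Stream`, `schedule`, `finish`) are hypothetical and do not describe the project's actual scheduler.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Stream:
    stream_id: int
    num_tokens: int
    kv_blocks: list = field(default_factory=list)  # KV cache block ids held by this stream


class Scheduler:
    """Hypothetical scheduler that allocates KV cache only when admitting a new stream."""

    def __init__(self, total_blocks, block_size):
        self.free_blocks = deque(range(total_blocks))
        self.block_size = block_size
        self.waiting = deque()
        self.running = []

    def enqueue(self, stream):
        # No KV cache is reserved at enqueue time.
        self.waiting.append(stream)

    def schedule(self):
        """Admit waiting streams in FIFO order while enough KV cache blocks are free."""
        admitted = []
        while self.waiting:
            stream = self.waiting[0]
            needed = math.ceil(stream.num_tokens / self.block_size)
            if needed > len(self.free_blocks):
                break  # not enough KV cache; the stream stays in the queue
            self.waiting.popleft()
            stream.kv_blocks = [self.free_blocks.popleft() for _ in range(needed)]
            self.running.append(stream)
            admitted.append(stream.stream_id)
        return admitted

    def finish(self, stream):
        """Return a finished stream's blocks to the free pool."""
        self.running.remove(stream)
        self.free_blocks.extend(stream.kv_blocks)
        stream.kv_blocks = []


if __name__ == "__main__":
    sched = Scheduler(total_blocks=4, block_size=16)
    a, b = Stream(0, 40), Stream(1, 20)  # need 3 and 2 blocks respectively
    sched.enqueue(a)
    sched.enqueue(b)
    print(sched.schedule())  # only stream 0 fits at first
    sched.finish(a)
    print(sched.schedule())  # stream 1 is admitted once blocks are freed
```

Deferring allocation to admission time means queued requests hold no memory, so the cache is shared only among streams that are actually running.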

docs

  • Update the list of supported models