Releases: open-compass/opencompass
OpenCompass v0.1.2
This release continues the evolution of OpenCompass, bringing a mix of new features, optimizations, documentation improvements, and bug fixes.
🆕 Highlights
🏆 Leaderboard: The evaluation results of Qwen-7B, XVERSE-13B, LLaMA-2, and GPT-4 have been posted to our leaderboard. You can now also compare models online. We hope this feature offers deeper insights!
📊 Datasets: Added the Xiezhi, SQuAD2.0, ANLI, and LEval datasets, among others. (#101, #192) Safety-related datasets have been added to the collections. (#185)
🎭 New modality: MMBench is now supported, and evaluation of multi-modal models is on the way! (#56, #161) Support for the Intern language model has also been added. (#51)
⚙️ Enhancements: Several enhancements to OpenAI models, including key deprecation and temperature setting. (#121, #128) Also added support for running multiple tasks on one GPU, filtering messages by level, and more. (#148, #187)
📝 Documentation: Comprehensive updates and fixes across READMEs, issue templates, prompt docs, metric documentation, and more.
🛠️ Bug Fixes: Includes a seed fix in HFEvaluator, a fix for AGIEval multiple-choice questions, and more. (#122, #137)
🎉 New Contributors
Thank you to all our contributors for this release, with a special shoutout to our new contributors:
@go-with-me000 (First Contribution)
@anakin-skywalker-Joseph (First Contribution)
@zhouzaida (First Contribution)
@dependabot (First Contribution)
Changelog
- [Feat] add auto assignee bot by @yingfhu in #105
- [Doc] Update Readme and Fix failed links by @Ezra-Yu in #108
- Doc: add twitter link by @vansin in #111
- Support Intern language model by @go-with-me000 in #51
- [Docs] Update issue templates for proper guidance to discussions by @gaotongxiao in #116
- [Feature] Allow explicitly setting the temperature for API model by @kennymckormick in #121
- [Fix] Fix seed in HFEvaluator by @kennymckormick in #122
- [Feature] Update SC by @Leymore in #126
- Modify documentation title by @anakin-skywalker-Joseph in #125
- [Docs] Update prompt docs by @Leymore in #46
- [Enhancement] Update README.md by @tonysy in #119
- [DOC] Add metric doc by @Ezra-Yu in #118
- [Feature] Evaluating acc based on minimum edit distance, update SIQA by @gaotongxiao in #130
- [Feature] Several enhancements by @gaotongxiao in #142
- [Doc] update acknowledgements by @Leymore in #147
- Fix typo in readme by @zhouzaida in #152
- [Feature]: Use multimodal by @YuanLiuuuuuu in #73
- [Refine] Refine PR #122 by @kennymckormick in #123
- [Enhancement] Optimize OpenAI models by @gaotongxiao in #128
- Update pre-commit ignore-word list by @gaotongxiao in #162
- [Script] Add scripts to evaluate MMBench by @kennymckormick in #161
- [Doc] Update Readme by @tonysy in #165
- [Feature]: Add mm support for local runner by @YuanLiuuuuuu in #169
- Calculate max_out_len without hard code for OpenAI model by @zhouzaida in #158
- [API] Refine OpenAI by @kennymckormick in #175
- [Fix] Use a copy of the config object in Task by @gaotongxiao in #174
- Bump requests from 2.28.1 to 2.31.0 by @dependabot in #178
- [Fix] Fix AGIEval multiple choice by @Leymore in #137
- [Feature]: Refactor input and output by @YuanLiuuuuuu in #176
- [Feature] Add Xiezhi SQuAD2.0 ANLI by @Leymore in #101
- [Feature] Support turbomind by @tonysy in #166
- [Enhancement] Add humaneval postprocessor for GPT models & eval config for GPT4, enhance the original humaneval postprocessor by @gaotongxiao in #129
- [Fix] Fix some sc errors by @liushz in #177
- Fix meta template & unit tests by @gaotongxiao in #170
- [Feature] Support CUDA_VISIBLE_DEVICES and multiple tasks on one GPU by @mzr1996 in #148
- [Docs] Enhance issue template by @gaotongxiao in #183
- Skip invalid keys to avoid requesting API by @zhouzaida in #184
- [Feature] update news by @tonysy in #186
- [Feature] Support filtering specified levels message by @zhouzaida in #187
- [Feat] add safety to collections by @yingfhu in #185
- [Docs] Update contribution guide & toc, improve user experience by @gaotongxiao in #188
- [Feature] add llama-oriented dataset configs by @Leymore in #82
- [Feat] update postprocessor to get first option more accurately by @yingfhu in #193
- [Feature] Add LEval datasets by @gaotongxiao in #192
- Bump version to 0.1.2 by @gaotongxiao in #190
- [Fix] fix bug for postprocessor by @yingfhu in #195
- [Doc] update readme by @Leymore in #196
Full Changelog: 0.1.1...0.1.2