Releases · open-compass/opencompass

This release continues the evolution of OpenCompass, bringing a mix of new features, optimizations, documentation improvements, and bug fixes.

🆕 Highlights

🏆 Leaderboard: The evaluation results of Qwen-7B, XVERSE-13B, LLaMA-2, and GPT-4 have been posted to our leaderboard. You can now also compare models online. We hope this feature offers deeper insights!

📊 Datasets: Added the Xiezhi, SQuAD2.0, ANLI, and LEval datasets, and more, for diverse applications (#101, #192). Safety-related datasets have also been added to the collections (#185).

🎭 New modality: Support for MMBench has been introduced, and evaluation of multimodal models is on the way! (#56, #161) In addition, the Intern language model is now supported (#51).

⚙️ Enhancements: Several improvements to the OpenAI models, including key deprecation and explicit temperature setting (#121, #128). Also new: support for running multiple tasks on one GPU, filtering messages by level, and more (#148, #187).

📝 Documentation: Comprehensive updates and fixes across READMEs, issue templates, prompt docs, metric documentation, and more.

🛠️ Bug Fixes: Fixes include the seed in HFEvaluator, issues with AGIEval multiple-choice questions, and more (#122, #137).

🎉 New Contributors

Thank you to all our contributors for this release, with a special shoutout to our new contributors:

@go-with-me000 (First Contribution)
@anakin-skywalker-Joseph (First Contribution)
@zhouzaida (First Contribution)
@dependabot (First Contribution)

Changelog

[Feat] add auto assignee bot by @yingfhu in #105
[Doc] Update Readme and Fix failed links by @Ezra-Yu in #108
Doc: add twitter link by @vansin in #111
Support intern language model by @go-with-me000 in #51
[Docs] Update issue templates for proper guidance to discussions by @gaotongxiao in #116
[Feature] Allow explicitly setting the temperature for API model by @kennymckormick in #121
[Fix] Fix seed in HFEvaluator by @kennymckormick in #122
[Feature] Update SC by @Leymore in #126
Update documentation titles by @anakin-skywalker-Joseph in #125
[Docs] Update prompt docs by @Leymore in #46
[Enhancement] Update README.md by @tonysy in #119
[DOC] Add metric doc by @Ezra-Yu in #118
[Feature] Evaluating acc based on minimum edit distance, update SIQA by @gaotongxiao in #130
[Feature] Several enhancements by @gaotongxiao in #142
[Doc] update acknowledgements by @Leymore in #147
Fix typo in readme by @zhouzaida in #152
[Feature]: Use multimodal by @YuanLiuuuuuu in #73
[Refine] Refine PR #122 by @kennymckormick in #123
[Enhancement] Optimize OpenAI models by @gaotongxiao in #128
Update pre-commit ignore-word list by @gaotongxiao in #162
[Script] Add scripts to evaluate MMBench by @kennymckormick in #161
[Doc] Update Readme by @tonysy in #165
[Feature]: Add mm support for local runner by @YuanLiuuuuuu in #169
Calculate max_out_len without hard code for OpenAI model by @zhouzaida in #158
[API] Refine OpenAI by @kennymckormick in #175
[Fix] Use a copy of the config object in Task by @gaotongxiao in #174
Bump requests from 2.28.1 to 2.31.0 by @dependabot in #178
[Fix] Fix AGIEval multiple choice by @Leymore in #137
[Feature]: Refactor input and output by @YuanLiuuuuuu in #176
[Feature] Add Xiezhi SQuAD2.0 ANLI by @Leymore in #101
[Feature] Support turbomind by @tonysy in #166
[Enhancement] Add humaneval postprocessor for GPT models & eval config for GPT4, enhance the original humaneval postprocessor by @gaotongxiao in #129
[Fix] Fix some sc errors by @liushz in #177
Fix meta template & unit tests by @gaotongxiao in #170
[Feature] Support CUDA_VISIBLE_DEVICES and multiple tasks on one GPU by @mzr1996 in #148
[Docs] Enhance issue template by @gaotongxiao in #183
Skip invalid keys to avoid requesting API by @zhouzaida in #184
[Feature] update news by @tonysy in #186
[Feature] Support filtering messages by specified levels by @zhouzaida in #187
[Feat] add safety to collections by @yingfhu in #185
[Docs] Update contribution guide & toc, improve user experience by @gaotongxiao in #188
[Feature] add llama-oriented dataset configs by @Leymore in #82
[Feat] update postprocessor to get first option more accurately by @yingfhu in #193
[Feature] Add LEval datasets by @gaotongxiao in #192
Bump version to 0.1.2 by @gaotongxiao in #190
[Fix] fix bug for postprocessor by @yingfhu in #195
[Doc] update readme by @Leymore in #196