Releases: open-compass/opencompass

OpenCompass v0.1.2

11 Aug 10:45
4fc1701

This release continues the evolution of OpenCompass, bringing a mix of new features, optimizations, documentation improvements, and bug fixes.

🆕 Highlights

🏆 Leaderboard: The evaluation results of Qwen-7B, XVERSE-13B, LLaMA-2, and GPT-4 have been posted to our leaderboard. It is now also possible to compare models online. We hope this feature offers deeper insights!

📊 Datasets: Added the Xiezhi, SQuAD2.0, ANLI, and LEval datasets, among others, for diverse applications. (#101, #192) Safety-related datasets have been added to collections. (#185)

🎭 New modality: Support for MMBench has been introduced, and evaluation of multi-modal models is on the way! (#56, #161) In addition, the Intern language model has been introduced. (#51)

⚙️ Enhancement: Several enhancements to OpenAI models, including key deprecation, temperature setting, etc. [#121] [#128] This release also supports running multiple tasks on one GPU, filtering messages by level, and more. [#148] [#187]

📝 Documentation: Comprehensive updates and fixes across READMEs, issue templates, prompt docs, metric documentation, and more.

🛠️ Bug Fixes: These include seed fixes in HFEvaluator, fixes for AGIEval multiple-choice questions, and more. [#122] [#137]

🎉 New Contributors

Thank you to all our contributors for this release, with a special shoutout to our new contributors:

@go-with-me000 (First Contribution)
@anakin-skywalker-Joseph (First Contribution)
@zhouzaida (First Contribution)
@dependabot (First Contribution)

Changelog

Full Changelog: 0.1.1...0.1.2

v0.1.1

26 Jul 07:11
b7184e9

Add some more datasets.

  • AGIEval
  • anli
  • cmmlu
  • jigsawmultilingual
  • realtoxicprompts
  • SQuAD2.0
  • TheoremQA
  • triviaqa
  • xiezhi
  • Xsum

v0.1.0

06 Jul 07:23

First release with some datasets.

  • ARC
  • BBH
  • ceval
  • CLUE
  • FewCLUE
  • GAOKAO-BENCH
  • LCSTS
  • math
  • mbpp
  • mmlu
  • nq
  • summedits
  • SuperGLUE