feat: Add mmlu-redux and its Spanish translation as generative task definitions #2705

Open: wants to merge 3 commits into base: main

luiscosio

This PR adds generative task definitions for two MMLU-Redux datasets: the English MMLU-Redux and its Spanish translation.

The task definitions follow the same structure and evaluation metrics as existing MMLU tasks, using exact_match for scoring with weight_by_size enabled. Both datasets are organized into 4 main groups:

  • STEM
  • Other
  • Social Sciences
  • Humanities

Each group maintains consistent evaluation metrics and aggregation methods across both language versions.
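As a rough illustration of what weight_by_size implies for a group score, the sketch below compares a size-weighted mean of per-subtask exact_match scores with a plain mean. The subtask names, scores, and example counts are hypothetical and are not taken from this PR; the helper functions are illustrative only and are not the harness's own aggregation code.

```python
from typing import Dict, Tuple

# Maps a subtask name to (exact_match score, number of examples).
SubtaskResults = Dict[str, Tuple[float, int]]


def size_weighted_mean(results: SubtaskResults) -> float:
    """Average the scores, weighting each subtask by its example count."""
    total = sum(n for _, n in results.values())
    if total == 0:
        raise ValueError("no examples to aggregate")
    return sum(score * n for score, n in results.values()) / total


def unweighted_mean(results: SubtaskResults) -> float:
    """Average the scores, giving every subtask equal weight."""
    if not results:
        raise ValueError("no subtasks to aggregate")
    return sum(score for score, _ in results.values()) / len(results)


# Hypothetical results for one group (names and numbers are made up).
stem = {
    "subtask_a": (0.80, 100),
    "subtask_b": (0.50, 300),
}

weighted = size_weighted_mean(stem)   # the larger subtask dominates
unweighted = unweighted_mean(stem)    # both subtasks count equally
print(f"weighted={weighted:.3f} unweighted={unweighted:.3f}")
```

With weighting enabled, a group score reflects per-example accuracy across the whole group rather than the average of per-subtask accuracies, so the two numbers diverge whenever subtasks differ in size.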

Changes include:

  • Added task definitions for generative format evaluation
  • Consistent group structure between English and Spanish versions
  • Kept weight_by_size set to true for all metrics
  • Version 3 metadata tag for compatibility

This enhancement allows for direct comparison of model performance between English and Spanish versions of MMLU-Redux in a generative setting.

@baberabb
Contributor

Hi! Thanks for the PR. Just some minor issues:

  1. The test is failing because it can't find one of the subtask configs on the HF Hub (probably a typo).
  2. Could you add the readme from template/new_yaml_task, and also add an entry in lm_eval/tasks/README.md?
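
One way to track down a mistyped subtask config is to diff the names used in the task YAMLs against the configs the dataset actually exposes. The sketch below is a minimal helper under stated assumptions: find_missing_configs is a hypothetical function, and the config names are illustrative, not the ones from this PR. In practice the list of available names could come from datasets.get_dataset_config_names for the dataset in question.

```python
import difflib
from typing import Dict, Iterable, List


def find_missing_configs(
    expected: Iterable[str], available: Iterable[str]
) -> Dict[str, List[str]]:
    """Return each expected config that is not available, with close matches.

    The suggestions are fuzzy string matches, which helps spot typos.
    """
    available = list(available)
    known = set(available)
    missing: Dict[str, List[str]] = {}
    for name in expected:
        if name not in known:
            # Best-scoring candidates come first; the list may be empty.
            missing[name] = difflib.get_close_matches(
                name, available, n=3, cutoff=0.8
            )
    return missing


# Illustrative names only: "astronmy" stands in for a typo in a task YAML.
expected = ["anatomy", "astronmy", "virology"]
available = ["anatomy", "astronomy", "virology"]

report = find_missing_configs(expected, available)
for name, suggestions in report.items():
    print(f"missing config {name!r}; did you mean {suggestions}?")
```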
