Add global mmlu lite sensitivity cards #1568

Merged
merged 10 commits into from
Feb 2, 2025

Conversation

eliyahabba
Collaborator

feat: add Global-MMLU-Lite CS/CA task cards

Add two task cards for Global-MMLU-Lite dataset:

  • CS card for culturally sensitive questions
  • CA card for culturally agnostic questions

Both cards include:

  • Support for 14 languages
  • Multiple choice QA format
  • Topic mapping and preprocessing steps

Member

@elronbandel elronbandel left a comment


Is the filtering lambda the only difference between the three Global-MMLU files?
If so, can you put them in one Python file that loops over the lambdas:
for func in [None, "lambda x: x['cultural_sensitivity_label'] == 'CA'", ...
Also, can you run make pre-commit before committing to fix the code style? (After you run it once, it keeps applying to your code before each new commit.)
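
A minimal sketch of the loop suggested above, in plain Python rather than the project's card API. The field name `cultural_sensitivity_label` and the `CS`/`CA` labels come from this thread; the sample rows, the `filters` mapping, and the `build_subsets` helper are hypothetical, and real callables are used here in place of the lambda strings shown in the comment.

```python
# Hypothetical sample rows; only the field name and the CS/CA labels
# are taken from the PR discussion.
rows = [
    {"question": "q1", "cultural_sensitivity_label": "CS"},
    {"question": "q2", "cultural_sensitivity_label": "CA"},
    {"question": "q3", "cultural_sensitivity_label": "CA"},
]

# None means "no filtering"; each other entry keeps a single label.
filters = {
    "all": None,
    "cs": lambda x: x["cultural_sensitivity_label"] == "CS",
    "ca": lambda x: x["cultural_sensitivity_label"] == "CA",
}


def build_subsets(rows, filters):
    """Apply every filter in one loop instead of one file per filter."""
    subsets = {}
    for name, func in filters.items():
        subsets[name] = [r for r in rows if func is None or func(r)]
    return subsets


subsets = build_subsets(rows, filters)
for name, subset in subsets.items():
    print(name, len(subset))
```

The point of the design is that adding a new variant means adding one entry to `filters` rather than copying a file.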

@eliyahabba
Collaborator Author

Not exactly. I combined the two cultural_sensitivity_label files into one, but they differ from the global_mmlu file in two important ways:

  1. The datasets are different: Global-MMLU-Lite vs. Global-MMLU.
  2. The processing approach is different: the Global-MMLU-Lite files create one card per language covering all subjects, while the Global-MMLU file creates a separate card for each language-subject combination.
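
To illustrate the second difference, here is a rough sketch of the two enumeration strategies. The language codes, subject names, and card-name patterns are hypothetical placeholders, not the names used in the actual files.

```python
from itertools import product

# Hypothetical placeholders, not the real language or subject lists.
languages = ["en", "fr", "he"]
subjects = ["anatomy", "law"]


def cards_per_language(languages):
    """One card per language covering all subjects (the Lite approach)."""
    return [f"global_mmlu_lite.{lang}" for lang in languages]


def cards_per_language_subject(languages, subjects):
    """One card per language-subject pair (the Global-MMLU approach)."""
    return [
        f"global_mmlu.{lang}.{subject}"
        for lang, subject in product(languages, subjects)
    ]


lite_cards = cards_per_language(languages)
full_cards = cards_per_language_subject(languages, subjects)
print(len(lite_cards), len(full_cards))
```

The first strategy grows with the number of languages only, while the second grows with the number of languages times the number of subjects, which is why the two cannot share a single loop over filters.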

@elronbandel elronbandel merged commit f9f9c5d into main Feb 2, 2025
16 of 18 checks passed
@elronbandel elronbandel deleted the add-global-mmlu-lite-sensitivity-cards branch February 2, 2025 20:24
2 participants