Replies: 1 comment
- We really do need a dropdown in the web UI for selecting the dataset to load for evaluation.
I want to run an evaluation on the MMLU, C-Eval, and CMMLU benchmarks, as stated in the Changelog:

[23/09/23] We integrated MMLU, C-Eval and CMMLU benchmarks in this repo. See this example to evaluate your models.

Evaluation

CUDA_VISIBLE_DEVICES=0 python src/evaluate.py \
    --model_name_or_path path_to_llama_model \
    --adapter_name_or_path path_to_checkpoint \
    --template vanilla \
    --finetuning_type lora \
    --task mmlu \
    --split test \
    --lang en \
    --n_shot 5 \
    --batch_size 4
But in the Web UI, can we have a dropdown for mmlu, ceval, and cmmlu? Less urgent, given that those are the "official" benchmarks in the repo, but I would also suggest adding more, starting with the popular ones: HumanEval, HellaSwag, ARC, SuperGLUE, GSM8K, TruthfulQA, BIG-Bench, PIQA, Natural Questions, AGIEval, RealToxicity.
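For illustration, here is a minimal sketch of what such a dropdown could look like, assuming the web UI is built with Gradio. The run_evaluation callback, the language choice per task, and the way it shells out to src/evaluate.py are my own assumptions, not the project's actual code:

import subprocess

import gradio as gr

# Benchmarks currently wired into src/evaluate.py, per the changelog.
EVAL_TASKS = ["mmlu", "ceval", "cmmlu"]


def run_evaluation(task, model_path, adapter_path):
    # Reuse the existing CLI entry point rather than re-implementing it.
    cmd = [
        "python", "src/evaluate.py",
        "--model_name_or_path", model_path,
        "--adapter_name_or_path", adapter_path,
        "--template", "vanilla",
        "--finetuning_type", "lora",
        "--task", task,
        "--split", "test",
        # Assumption: MMLU is English; C-Eval and CMMLU are Chinese.
        "--lang", "en" if task == "mmlu" else "zh",
        "--n_shot", "5",
        "--batch_size", "4",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout or result.stderr


with gr.Blocks() as demo:
    task = gr.Dropdown(choices=EVAL_TASKS, value="mmlu", label="Benchmark")
    model_path = gr.Textbox(label="Model path")
    adapter_path = gr.Textbox(label="Adapter / checkpoint path")
    output = gr.Textbox(label="Evaluation log", lines=10)
    gr.Button("Evaluate").click(
        run_evaluation,
        inputs=[task, model_path, adapter_path],
        outputs=output,
    )

demo.launch()

With something like this in place, adding the other suggested benchmarks later would mostly be a matter of extending EVAL_TASKS once the backend supports them.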