llama-7b model performs poorly when evaluated directly on MMLU #2961

12aka12 · 2024-03-25T09:31:14Z

Reminder

I have read the README and searched the existing issues.

Reproduction

python src/evaluate.py \
    --model_name_or_path llama-7b-hf \
    --template vanilla \
    --task mmlu \
    --split test \
    --lang en \
    --n_shot 5 \
    --batch_size 2
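For context on what `--task mmlu --n_shot 5` measures: the common MMLU protocol prepends k solved examples to each test question, ends the prompt with "Answer:", and counts the question as correct when the highest-scoring answer letter is the gold one. The following is a generic, pure-Python sketch under that assumption. It is not the code in `src/evaluate.py`. The helper names (`format_example`, `build_k_shot_prompt`, `pick_answer`, `accuracy`) and the header wording are hypothetical, and the model call is left out: the logits are supplied by hand.

```python
from typing import Dict, List, Sequence, Tuple

# Answer letters used by the four MMLU options.
CHOICES = ("A", "B", "C", "D")


def format_example(question: str, options: Sequence[str], answer: str = "") -> str:
    """Render one multiple-choice item; leave `answer` empty for the test question."""
    lines = [question]
    for letter, option in zip(CHOICES, options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer:" + (f" {answer}" if answer else ""))
    return "\n".join(lines)


def build_k_shot_prompt(
    subject: str,
    shots: Sequence[Tuple[str, Sequence[str], str]],
    query: Tuple[str, Sequence[str]],
) -> str:
    """Prepend k solved demonstrations (k = len(shots)) to the unanswered query."""
    # Hypothetical header wording, modeled on the common MMLU prompt style.
    header = f"The following are multiple choice questions (with answers) about {subject}.\n\n"
    demos = "".join(format_example(q, opts, ans) + "\n\n" for q, opts, ans in shots)
    question, options = query
    return header + demos + format_example(question, options)


def pick_answer(choice_logits: Dict[str, float]) -> str:
    """Predict the letter whose answer token got the highest next-token logit."""
    return max(CHOICES, key=lambda letter: choice_logits[letter])


def accuracy(predictions: List[str], labels: List[str]) -> float:
    """Fraction of questions where the predicted letter matches the gold letter."""
    return sum(p == g for p, g in zip(predictions, labels)) / len(labels)


if __name__ == "__main__":
    shots = [("What is 2 + 2?", ["3", "4", "5", "6"], "B")]
    prompt = build_k_shot_prompt(
        "elementary mathematics", shots, ("What is 3 + 3?", ["6", "7", "8", "9"])
    )
    print(prompt)
    # Hand-written logits standing in for a real model's output after "Answer:".
    print(pick_answer({"A": 2.1, "B": 0.3, "C": -1.0, "D": 0.5}))
    print(accuracy(["A", "B"], ["A", "C"]))
```

With k = 5, as in `--n_shot 5`, five solved examples would come before the query. A real harness would take `choice_logits` from the model's logits at the last prompt position, restricted to the token ids of the four letters. This is why tokenizer or template differences between models can shift the score.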

Expected behavior

Hello, I evaluated the llama-7b model directly on MMLU. Why are my results so poor?

System Info

Others

No response

hiyouga · 2024-03-25T12:51:13Z

I can reproduce this issue, but llama2 seems to work normally. The cause still needs to be investigated.

12aka12 · 2024-03-26T01:39:02Z

python src/evaluate.py \
    --model_name_or_path llama2-7b \
    --template vanilla \
    --task mmlu \
    --split test \
    --lang en \
    --n_shot 5 \
    --batch_size 2

Hello, here are my results from evaluating llama2 directly. What could be causing this?

hiyouga · 2024-03-26T09:26:49Z

The LLaMA-7B results are now mostly normal, though they may still deviate from the official results by two or three points.

hiyouga added the "pending" label (This problem is yet to be addressed) on Mar 25, 2024

hiyouga closed this as completed in 511f675 on Mar 26, 2024

hiyouga added the "solved" label (This problem has been already solved) and removed the "pending" label on Mar 26, 2024
