Error when training the GLM4-9B-chat model via the web UI: works with a small dataset, fails with a larger one #4928

Closed
1 task done
15670173761 opened this issue Jul 23, 2024 · 1 comment
Labels
solved This problem has been already solved

Comments

@15670173761

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.8.3.dev0
  • Platform: Windows-10-10.0.19045-SP0
  • Python version: 3.10.8
  • PyTorch version: 2.3.1+cu121 (GPU)
  • Transformers version: 4.42.3
  • Datasets version: 2.20.0
  • Accelerate version: 0.32.1
  • PEFT version: 0.11.1
  • TRL version: 0.9.6
  • GPU type: NVIDIA GeForce RTX 2080 Ti

Reproduction

Traceback (most recent call last):
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\venv\lib\site-packages\multiprocess\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\venv\lib\site-packages\datasets\utils\py_utils.py", line 678, in _write_generator_to_queue
for i, result in enumerate(func(**kwargs)):
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\venv\lib\site-packages\datasets\arrow_dataset.py", line 3552, in _map_single
batch = apply_function_on_filtered_inputs(
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\venv\lib\site-packages\datasets\arrow_dataset.py", line 3421, in apply_function_on_filtered_inputs
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\src\llamafactory\data\aligner.py", line 123, in convert_sharegpt
if dataset_attr.system_tag and messages[0][dataset_attr.role_tag] == dataset_attr.system_tag:
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\AIchatgpt\venv\Scripts\llamafactory-cli.exe\__main__.py", line 7, in <module>
sys.exit(main())
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\src\llamafactory\cli.py", line 111, in main
run_exp()
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\src\llamafactory\train\tuner.py", line 50, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\src\llamafactory\train\sft\workflow.py", line 46, in run_sft
dataset = get_dataset(model_args, data_args, training_args, stage="sft", **tokenizer_module)
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\src\llamafactory\data\loader.py", line 174, in get_dataset
all_datasets.append(load_single_dataset(dataset_attr, model_args, data_args, training_args))
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\src\llamafactory\data\loader.py", line 140, in load_single_dataset
return align_dataset(dataset, dataset_attr, data_args, training_args)
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\src\llamafactory\data\aligner.py", line 233, in align_dataset
return dataset.map(
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\venv\lib\site-packages\datasets\arrow_dataset.py", line 602, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\venv\lib\site-packages\datasets\arrow_dataset.py", line 567, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\venv\lib\site-packages\datasets\arrow_dataset.py", line 3253, in map
for rank, done, content in iflatmap_unordered(
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\venv\lib\site-packages\datasets\utils\py_utils.py", line 718, in iflatmap_unordered
[async_result.get(timeout=0.05) for async_result in async_results]
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\venv\lib\site-packages\datasets\utils\py_utils.py", line 718, in <listcomp>
[async_result.get(timeout=0.05) for async_result in async_results]
File "C:\Users\Administrator\PycharmProjects\LLaMA-Factory\venv\lib\site-packages\multiprocess\pool.py", line 774, in get
raise self._value
IndexError: list index out of range
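
The failing frame, `messages[0][dataset_attr.role_tag]` in `convert_sharegpt`, raises `IndexError` when a record's message list is empty. That would be consistent with the report that only the larger dataset fails: it may contain at least one such record. Below is a minimal sketch for locating those records, not part of LLaMA-Factory. It assumes the dataset is a JSON array of sharegpt-style records whose message list is stored under the key `conversations`; the helper names and the example file name are hypothetical, so adjust them to match the dataset's entry in `dataset_info.json`.

```python
import json
from typing import Any, Dict, List


def load_records(path: str) -> List[Dict[str, Any]]:
    """Load a dataset stored as a single JSON array (assumed layout)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def find_empty_conversations(
    records: List[Dict[str, Any]],
    messages_key: str = "conversations",  # assumption: key holding the message list
) -> List[int]:
    """Return indices of records whose message list is missing, not a list, or empty."""
    bad = []
    for idx, record in enumerate(records):
        messages = record.get(messages_key)
        if not isinstance(messages, list) or len(messages) == 0:
            bad.append(idx)
    return bad


# Hypothetical usage:
# records = load_records("my_dataset.json")
# for idx in find_empty_conversations(records):
#     print(f"record {idx} has no messages")
```

Any index it reports points at a record that would trigger the `messages[0]` lookup on an empty list; removing or repairing those records before training is one way to check whether they are the cause.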

Expected behavior

No response

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Jul 23, 2024
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jul 24, 2024
@HelloWorld506

I am using the latest llamafactory version currently available for download and ran into the same problem.
[rank0]: Traceback (most recent call last):
[rank0]: File "/mnt/gemininjceph2/geminicephfs/wx-mm-spr-xxxx/zhaochang/MEIJU/Qwen2-VL/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
[rank0]: launch()
[rank0]: File "/mnt/gemininjceph2/geminicephfs/wx-mm-spr-xxxx/zhaochang/MEIJU/Qwen2-VL/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
[rank0]: run_exp()
[rank0]: File "/mnt/gemininjceph2/geminicephfs/wx-mm-spr-xxxx/zhaochang/MEIJU/Qwen2-VL/LLaMA-Factory/src/llamafactory/train/tuner.py", line 50, in run_exp
[rank0]: run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]: File "/mnt/gemininjceph2/geminicephfs/wx-mm-spr-xxxx/zhaochang/MEIJU/Qwen2-VL/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 51, in run_sft
[rank0]: dataset_module = get_dataset(template, model_args, data_args, training_args, stage="sft", **tokenizer_module)
[rank0]: File "/mnt/gemininjceph2/geminicephfs/wx-mm-spr-xxxx/zhaochang/MEIJU/Qwen2-VL/LLaMA-Factory/src/llamafactory/data/loader.py", line 266, in get_dataset
[rank0]: dataset = _get_preprocessed_dataset(
[rank0]: File "/mnt/gemininjceph2/geminicephfs/wx-mm-spr-xxxx/zhaochang/MEIJU/Qwen2-VL/LLaMA-Factory/src/llamafactory/data/loader.py", line 203, in _get_preprocessed_dataset
[rank0]: dataset = dataset.map(
[rank0]: File "/mnt/gemininjceph2/geminicephfs/wx-mm-spr-xxxx/zhaochang/conda_envs/qwen2_vl/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
[rank0]: out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
[rank0]: File "/mnt/gemininjceph2/geminicephfs/wx-mm-spr-xxxx/zhaochang/conda_envs/qwen2_vl/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
[rank0]: out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
[rank0]: File "/mnt/gemininjceph2/geminicephfs/wx-mm-spr-xxxx/zhaochang/conda_envs/qwen2_vl/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3259, in map
[rank0]: for rank, done, content in iflatmap_unordered(
[rank0]: File "/mnt/gemininjceph2/geminicephfs/wx-mm-spr-xxxx/zhaochang/conda_envs/qwen2_vl/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 718, in iflatmap_unordered
[rank0]: [async_result.get(timeout=0.05) for async_result in async_results]
[rank0]: File "/mnt/gemininjceph2/geminicephfs/wx-mm-spr-xxxx/zhaochang/conda_envs/qwen2_vl/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 718, in <listcomp>
[rank0]: [async_result.get(timeout=0.05) for async_result in async_results]
[rank0]: File "/mnt/gemininjceph2/geminicephfs/wx-mm-spr-xxxx/zhaochang/conda_envs/qwen2_vl/lib/python3.10/site-packages/multiprocess/pool.py", line 774, in get
[rank0]: raise self._value
[rank0]: IndexError: list index out of range
I also checked my dataset, and all the videos in it exist and are valid.
What is causing this, and how should I fix it?
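
The traceback in this comment only contains the parent-process frames, so it does not show which line in the worker raised the `IndexError`, and the thread does not identify a cause. Two things worth ruling out, both offered as guesses rather than a diagnosis, are records with an empty message list and records where the number of video placeholders in the text does not match the number of listed videos. The sketch below checks for both. The key names `messages`, `content`, and `videos`, the `<video>` placeholder string, and the function name are assumptions about the dataset layout; change them to match the actual data.

```python
from typing import Any, Dict, List, Tuple


def check_video_records(
    records: List[Dict[str, Any]],
    messages_key: str = "messages",  # assumption: key holding the message list
    content_key: str = "content",    # assumption: key holding each message's text
    videos_key: str = "videos",      # assumption: key holding the list of video paths
    placeholder: str = "<video>",    # assumption: placeholder token used in the text
) -> List[Tuple[int, str]]:
    """Return (index, reason) pairs for records that look malformed."""
    problems = []
    for idx, record in enumerate(records):
        messages = record.get(messages_key) or []
        videos = record.get(videos_key) or []
        if not messages:
            problems.append((idx, "empty message list"))
            continue
        # Count placeholder occurrences across every message in the record.
        n_placeholders = sum(
            str(message.get(content_key, "")).count(placeholder) for message in messages
        )
        if n_placeholders != len(videos):
            problems.append(
                (idx, f"{n_placeholders} placeholder(s) but {len(videos)} video(s)")
            )
    return problems
```

Verifying that the video files exist does not cover either of these cases, since both concern the text side of each record rather than the files themselves.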


3 participants