Hi,
Firstly, can I just say: awesome project! Thank you for your efforts.
I'm currently facing an issue when running training from the web UI.
```
Loading cached processed dataset at /home/david/.cache/huggingface/datasets/text/default-c1c19be682713dfa/0.0.0/c4a140d10f020282918b5dd1b8a49f0104729c6177f60a6b49ec2a365ec69f34/cache-0aebf50c61b7948a.arrow
Running tokenizer on dataset:   0%|          | 0/50 [00:00<?, ? examples/s]
Exception in thread Thread-10 (run_exp):
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/home/david/Code/github/LLaMA-Efficient-Tuning/src/llmtuner/tuner/tune.py", line 24, in run_exp
    run_pt(model_args, data_args, training_args, finetuning_args, callbacks)
  File "/home/david/Code/github/LLaMA-Efficient-Tuning/src/llmtuner/tuner/pt/workflow.py", line 26, in run_pt
    dataset = preprocess_dataset(dataset, tokenizer, data_args, training_args, stage="pt")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/david/Code/github/LLaMA-Efficient-Tuning/src/llmtuner/dsets/preprocess.py", line 165, in preprocess_dataset
    dataset = dataset.map(
              ^^^^^^^^^^^^
  File "/home/david/Code/github/LLaMA-Efficient-Tuning/venv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 592, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/david/Code/github/LLaMA-Efficient-Tuning/venv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 557, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/david/Code/github/LLaMA-Efficient-Tuning/venv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3097, in map
    for rank, done, content in Dataset._map_single(**dataset_kwargs):
  File "/home/david/Code/github/LLaMA-Efficient-Tuning/venv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3474, in _map_single
    batch = apply_function_on_filtered_inputs(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/david/Code/github/LLaMA-Efficient-Tuning/venv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3353, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/david/Code/github/LLaMA-Efficient-Tuning/src/llmtuner/dsets/preprocess.py", line 42, in preprocess_pretrain_dataset
    tokenized_examples = tokenizer(examples["prompt"], **kwargs)
                                                         ^^^^^^
UnboundLocalError: cannot access local variable 'kwargs' where it is not associated with a value
```
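From the last frame, my (unverified) guess is that `kwargs` is only assigned inside a conditional branch in `preprocess_pretrain_dataset`, so with my tokenizer that branch never runs and the later `**kwargs` reference hits an unbound name. A minimal sketch of the pattern I mean is below; the condition and values are hypothetical, not the actual source:

```python
# Hypothetical sketch of the suspected bug pattern (not the repo's exact code):
# `kwargs` is only bound inside the special-case branch, so any tokenizer that
# does not take that branch reaches the `**kwargs` line with the name unbound.
def preprocess_pretrain_dataset(examples, tokenizer):
    if hasattr(tokenizer, "some_special_case"):   # hypothetical condition
        kwargs = dict(allowed_special="all")
    # no fallback such as: else: kwargs = dict(add_special_tokens=True)
    tokenized_examples = tokenizer(examples["prompt"], **kwargs)  # UnboundLocalError here
    return tokenized_examples
```

If that's what is happening, giving `kwargs` a default value before the conditional (or adding an `else` branch) would presumably avoid the crash, but I haven't confirmed this against the current source.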
And the model config:
```
[INFO|configuration_utils.py:775] 2023-09-01 15:25:44,828 >> Model config LlamaConfig {
  "_name_or_path": "meta-llama/Llama-2-7b-hf",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.32.1",
  "use_cache": true,
  "vocab_size": 32000
}
```
Any hints on what may be causing it?
Thanks again.
Edit:
```
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage pt \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --do_train \
    --dataset wiki_demo \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir path_to_pt_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16
```
Produces the same error.