TypeError: 'type' object is not subscriptable #23472
Comments
Hi @flckv, thanks for raising this error. I'm unable to reproduce it when I run locally on the main branch. Could you share the running environment being used? You can get it by running transformers-cli env.
Hi @amyeroberts, thanks for the quick reply. The output of transformers-cli env:
I am running on a cluster with resources:
The .sh file that I run on this cluster has these commands to reproduce the demo: transformers-cli env. Is this what you are asking for?
My guess: I think the error is coming from the fact that the dataset preprocessing (line 473) requires an audio column name argument. (See the sketch after this list for how that argument is used.)
1. I tried specifying --audio_column_name=[]
2. I tried specifying --audio_column_name=["audio", "duration_ms", "text"]
3. I tried specifying --audio_column_name=["audio"], which is the default setting; same issue as in 1.
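For context, here is a minimal sketch (not the example script itself; the "audio" column name below is the LibriSpeech default and an assumption here) of how a single audio column name is used with the datasets library, which is why the argument should be one column name string rather than a list:

```python
from datasets import Audio, load_dataset

# Minimal sketch: the column name must be a single string, because it is used
# to index exactly one column of the dataset.
audio_column_name = "audio"  # assumption: the LibriSpeech default audio column

raw_dataset = load_dataset("librispeech_asr", "clean", split="validation")

# Resample the chosen audio column to the expected sampling rate.
raw_dataset = raw_dataset.cast_column(audio_column_name, Audio(sampling_rate=16_000))

def prepare(batch):
    # Passing a list such as ["audio"] as the column name would break this lookup.
    sample = batch[audio_column_name]
    return {"input_values": sample["array"]}

vectorized = raw_dataset.map(prepare, remove_columns=raw_dataset.column_names)
```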
Any ideas? @amyeroberts @sanchit-gandhi @pacman100 @sgugger
Here is a more detailed output log:
Hey @flckv! Could you try first updating all your packages to the latest versions?
The error looks like it's happening when we decode the soundfile (i.e. as we read the soundfile with librosa). There was recently a big change to how we load audio samples with datasets that might fix this for you: huggingface/datasets#5573
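As a quick sanity check (a sketch, not part of the example script; the dataset and config names are just examples), you can verify which datasets version is installed and that a single audio sample decodes:

```python
# Sketch: check the installed datasets version and that one audio sample decodes.
import datasets
from datasets import Audio, load_dataset

print(datasets.__version__)  # should be a recent release that includes the linked change

ds = load_dataset("librispeech_asr", "clean", split="validation[:1]")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

sample = ds[0]["audio"]  # the soundfile is decoded at this access
print(sample["sampling_rate"], len(sample["array"]))
```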
@sanchit-gandhi Thanks, but now the command is not working:
ERROR:
Traceback (most recent call last):
/var/lib/slurm-llnl/slurmd/job153086/slurm_script: line 45: EOL: command not found

WHEN I specify this in the command args:
accelerate launch wav2vec/run_wav2vec2_pretraining_no_trainer.py --cache_dir="/dev/shm/" --dataset_name="librispeech_asr" --dataset_config_names test --dataset_split_names test --model_name_or_path="patrickvonplaten/wav2vec2-base-v2" --output_dir="./wav2vec2-pretrained-demo"
then the error is:

WHEN I only add "id":
Traceback (most recent call last):
-- schema metadata --
/var/lib/slurm-llnl/slurmd/job153092/slurm_script: line 50: EOL:
Hey @flckv - great! Glad updating to the latest packages fixed the previous error. Can you try setting:
Here we just need to pick out the correct column name for the audio inputs (which in this case is "audio").
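If it helps, one quick way to see which column names the dataset actually has (a sketch; the dataset, config, and split names just mirror this thread) is:

```python
from datasets import load_dataset

# Load a small slice just to inspect the schema.
ds = load_dataset("librispeech_asr", "clean", split="validation[:1]")
print(ds.column_names)  # the audio inputs live in one of these columns
print(ds.features)      # shows which column carries the Audio feature
```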
Hey @sanchit-gandhi, yes, it is great! But the column name is still not interpreted. I added what you said:
accelerate launch wav2vec/run_wav2vec2_pretraining_no_trainer.py --cache_dir="/dev/shm/" --dataset_name="librispeech_asr" --dataset_config_names test --dataset_split_names test --model_name_or_path="patrickvonplaten/wav2vec2-base-v2" --output_dir="./wav2vec2-pretrained-demo"
I also tried --audio_column_name='audio', BUT:
Traceback (most recent call last):
Can you double check you haven't changed the parser args in transformers/examples/pytorch/speech-pretraining/run_wav2vec2_pretraining_no_trainer.py (lines 112 to 117 at 50a56be)?
I can't see the check that is erroring out for you in the example script. Your error is occurring on line 513. If I check line 513 in the example, I get something completely different from the audio column name check: transformers/examples/pytorch/speech-pretraining/run_wav2vec2_pretraining_no_trainer.py (line 513 at 3d7baef).
Could you make sure you are using the latest version of the script? You can just copy it from main.
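For reference, the --audio_column_name parser argument in question looks roughly like this (paraphrased, not an exact copy; see the linked lines on main for the authoritative version):

```python
import argparse

# Approximate shape of the --audio_column_name argument in the example script
# (paraphrased; check the linked lines on main for the exact code).
parser = argparse.ArgumentParser()
parser.add_argument(
    "--audio_column_name",
    type=str,
    default="audio",
    help="Column in the dataset that contains the audio data. Defaults to 'audio'.",
)
args = parser.parse_args(["--audio_column_name", "audio"])
print(args.audio_column_name)
```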
@sanchit-gandhi thanks, you were right. It works now.
System Info
Pre-training wav2vec demo
Running the demo from https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-pretraining/README.md gives this error:
File "./run_wav2vec2_pretraining_no_trainer.py", line 783, in <module>
main()
File "./run_wav2vec2_pretraining_no_trainer.py", line 510, in main
vectorized_datasets = raw_datasets.map(
TypeError: 'type' object is not subscriptable
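For anyone searching for this message, here is a minimal, unrelated illustration of what Python means by it (not necessarily the cause in the example script, which the thread above resolves by updating packages and using the latest script):

```python
# Minimal illustration of the error message (unrelated to the example script):
# subscripting a plain class object that does not define __class_getitem__.
class Foo:
    pass

try:
    Foo["bar"]
except TypeError as err:
    print(err)  # 'type' object is not subscriptable

# The same message appears when subscripting built-in types such as list[int]
# on Python versions older than 3.9.
```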
Who can help?
@sanchit-gandhi
@pacman100
@sgugger
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Just reproducing the demo example with the provided script and dataset:
https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-pretraining/README.md#demo
Expected behavior
The output should be a pre-trained wav2vec model on the LibriSpeech dataset.
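If pretraining completes, loading the checkpoint from --output_dir should work roughly like this (a sketch; it assumes the run saved the model with save_pretrained, and the path mirrors the commands used in this thread):

```python
from transformers import Wav2Vec2Model

# Load the checkpoint written to --output_dir by the demo run.
# "./wav2vec2-pretrained-demo" is the output_dir used in the commands above.
model = Wav2Vec2Model.from_pretrained("./wav2vec2-pretrained-demo")
print(model.config.hidden_size)
```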