Describe the bug
While importing an audiofolder dataset in which the audio file names don't match the file names listed in metadata.csv, we get an unclear error message that doesn't help with debugging:
ValueError: Instruction "train" corresponds to no data!
Steps to reproduce the bug
Assume an audiofolder containing the audio files filename1.mp3, filename2.mp3, etc., and a metadata.csv file with the columns file_name and sentence. The file_name values are formatted like filename1.mp3, filename2.mp3, etc.
Load the dataset:
from datasets import load_dataset
load_dataset("audiofolder", data_dir='/path/to/audiofolder')
When the file_name values in the CSV do not match the file names in the audiofolder, the following error is raised:
File /opt/conda/lib/python3.12/site-packages/datasets/arrow_reader.py:251, in BaseReader.read(self, name, instructions, split_infos, in_memory)
249 if not files:
250 msg = f'Instruction "{instructions}" corresponds to no data!'
--> 251 raise ValueError(msg)
252 return self.read_files(files=files, original_instructions=instructions, in_memory=in_memory)
ValueError: Instruction "train" corresponds to no data!
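Note that "train" in the message is just the default split name: the error appears even though no split argument was passed to load_dataset, and it says nothing about the actual cause (the mismatched file names).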
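The reproduction above can be scripted. The following is a minimal sketch, assuming a flat folder layout with metadata.csv at the top of the data directory. make_mismatched_audiofolder is a hypothetical helper: the .mp3 files it writes are empty placeholders rather than real audio, and the file{i}.mp3 names it puts in metadata.csv are an arbitrary choice that deliberately does not match the files on disk. Whether empty placeholders trigger exactly the same error has not been verified, so real audio files may be needed.

```python
import csv
import tempfile
from pathlib import Path


def make_mismatched_audiofolder(root: Path, n: int = 3) -> Path:
    """Hypothetical helper: create placeholder audio files plus a metadata.csv
    whose file_name column does NOT match them."""
    root.mkdir(parents=True, exist_ok=True)
    # Files on disk: filename1.mp3, filename2.mp3, ...
    for i in range(1, n + 1):
        (root / f"filename{i}.mp3").write_bytes(b"")  # empty placeholder, not real audio
    # metadata.csv references file1.mp3, file2.mp3, ... which do not exist on disk
    with open(root / "metadata.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "sentence"])
        for i in range(1, n + 1):
            writer.writerow([f"file{i}.mp3", f"sentence {i}"])
    return root


if __name__ == "__main__":
    folder = make_mismatched_audiofolder(Path(tempfile.mkdtemp()) / "audiofolder")
    print(sorted(p.name for p in folder.iterdir()))
    # Requires the datasets library; expected to fail with the unhelpful ValueError:
    # from datasets import load_dataset
    # load_dataset("audiofolder", data_dir=str(folder))
```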
Expected behavior
A more helpful error message would be something like:
The metadata.csv file has different file names than the files in the data directory.
I'd prefer even more verbose errors, for example: "file123.mp3" is referenced in metadata.csv but not found in the data directory '/path/to/audiofolder' (and 100+ more missing files). Or something along those lines.
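The check behind the suggested message can be sketched as a pre-flight step run before load_dataset. This is a minimal sketch, not part of the datasets library: check_metadata_files and its parameters are hypothetical, and it assumes a single metadata.csv at the top of data_dir whose file_name values are paths relative to that directory.

```python
import csv
from pathlib import Path


def check_metadata_files(data_dir: str, metadata_name: str = "metadata.csv",
                         max_listed: int = 1) -> None:
    """Hypothetical pre-flight check: raise a descriptive error if metadata.csv
    references files that are not present in data_dir."""
    root = Path(data_dir)
    # All files under data_dir, as forward-slash paths relative to data_dir
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    with open(root / metadata_name, newline="", encoding="utf-8") as f:
        referenced = [row["file_name"] for row in csv.DictReader(f)]
    missing = [name for name in referenced if name not in on_disk]
    if missing:
        # Name the first few missing files, then summarize the rest
        shown = ", ".join(f'"{m}"' for m in missing[:max_listed])
        extra = len(missing) - max_listed
        more = f" (and {extra} more missing file{'s' if extra != 1 else ''})" if extra > 0 else ""
        raise FileNotFoundError(
            f"{shown} referenced in {metadata_name}, but not found in "
            f"the data directory '{root}'{more}"
        )
```

Calling check_metadata_files('/path/to/audiofolder') before load_dataset would surface the mismatch directly instead of the "corresponds to no data" error. Raising FileNotFoundError is a design choice for the sketch; a library-side fix might prefer a different exception type.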
svencornetsdegroot changed the title on Jan 14, 2025, from "Importing dataset gives bad error message when filename's in metadata.csv are not found in the directory" to "Importing dataset gives unhelpful error message when filenames in metadata.csv are not found in the directory".
A clearer error message would have saved me 4 hours of debugging.
Environment info
datasets version: 3.2.0
huggingface_hub version: 0.27.0
fsspec version: 2024.9.0