Error when loading a dataset from Hugging Face: 'NoneType' object is not callable #7360

nanu23333 opened this issue Jan 7, 2025 · 1 comment

Describe the bug

I hit an error when running a notebook provided by Hugging Face:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 5
      3 # Load the enhancers dataset from the InstaDeep Hugging Face ressources
      4 dataset_name = "enhancers_types"
----> 5 train_dataset_enhancers = load_dataset(
      6         "InstaDeepAI/nucleotide_transformer_downstream_tasks_revised",
      7         dataset_name,
      8         split="train",
      9         streaming= False,
     10     )
     11 test_dataset_enhancers = load_dataset(
     12         "InstaDeepAI/nucleotide_transformer_downstream_tasks_revised",
     13         dataset_name,
     14         split="test",
     15         streaming= False,
     16     )

File /public/home/hhl/miniconda3/envs/transformer/lib/python3.9/site-packages/datasets/load.py:2129, in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, keep_in_memory, save_infos, revision, token, streaming, num_proc, storage_options, trust_remote_code, **config_kwargs)
   2124 verification_mode = VerificationMode(
   2125     (verification_mode or VerificationMode.BASIC_CHECKS) if not save_infos else VerificationMode.ALL_CHECKS
   2126 )
   2128 # Create a dataset builder
-> 2129 builder_instance = load_dataset_builder(
   2130     path=path,
   2131     name=name,
   2132     data_dir=data_dir,
   2133     data_files=data_files,
   2134     cache_dir=cache_dir,
   2135     features=features,
   2136     download_config=download_config,
   2137     download_mode=download_mode,
   2138     revision=revision,
   2139     token=token,
   2140     storage_options=storage_options,
   2141     trust_remote_code=trust_remote_code,
   2142     _require_default_config_name=name is None,
   2143     **config_kwargs,
   2144 )
   2146 # Return iterable dataset in case of streaming
   2147 if streaming:

File /public/home/hhl/miniconda3/envs/transformer/lib/python3.9/site-packages/datasets/load.py:1886, in load_dataset_builder(path, name, data_dir, data_files, cache_dir, features, download_config, download_mode, revision, token, storage_options, trust_remote_code, _require_default_config_name, **config_kwargs)
   1884 builder_cls = get_dataset_builder_class(dataset_module, dataset_name=dataset_name)
   1885 # Instantiate the dataset builder
-> 1886 builder_instance: DatasetBuilder = builder_cls(
   1887     cache_dir=cache_dir,
   1888     dataset_name=dataset_name,
   1889     config_name=config_name,
   1890     data_dir=data_dir,
   1891     data_files=data_files,
   1892     hash=dataset_module.hash,
   1893     info=info,
   1894     features=features,
   1895     token=token,
   1896     storage_options=storage_options,
   1897     **builder_kwargs,
   1898     **config_kwargs,
   1899 )
   1900 builder_instance._use_legacy_cache_dir_if_possible(dataset_module)
   1902 return builder_instance

TypeError: 'NoneType' object is not callable

I have checked my internet connection and it works well, and the dataset name was copied directly from Hugging Face.
I have no idea what is wrong!

Steps to reproduce the bug

To reproduce the bug, run:

from datasets import load_dataset, Dataset

# Load the enhancers dataset from the InstaDeep Hugging Face resources
dataset_name = "enhancers_types"
train_dataset_enhancers = load_dataset(
    "InstaDeepAI/nucleotide_transformer_downstream_tasks_revised",
    dataset_name,
    split="train",
    streaming=False,
)
test_dataset_enhancers = load_dataset(
    "InstaDeepAI/nucleotide_transformer_downstream_tasks_revised",
    dataset_name,
    split="test",
    streaming=False,
)

Expected behavior

  1. What may be the reasons for this error?
  2. How can I find which reason leads to the error? (See the diagnostic sketch below.)
  3. How can I solve the problem?
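
For the second question, a minimal diagnostic sketch: the traceback shows the TypeError is raised while load_dataset_builder instantiates the builder class, so calling load_dataset_builder directly isolates the builder-resolution step from the rest of load_dataset.

from datasets import load_dataset_builder

# The original traceback fails inside load_dataset_builder, so calling it
# directly isolates the builder-resolution step. If the same TypeError is
# raised here, the problem is in resolving the DatasetBuilder class from
# the dataset's loading script, not in the download or split logic.
builder = load_dataset_builder(
    "InstaDeepAI/nucleotide_transformer_downstream_tasks_revised",
    "enhancers_types",
)
print(type(builder))  # on success: a DatasetBuilder subclass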

Environment info

- `datasets` version: 3.2.0
- Platform: Linux-5.15.0-117-generic-x86_64-with-glibc2.31
- Python version: 3.9.21
- `huggingface_hub` version: 0.27.0
- PyArrow version: 18.1.0
- Pandas version: 2.2.3
- `fsspec` version: 2024.9.0
lhoestq (Member) commented Jan 10, 2025

Hi! I couldn't reproduce this on my side. Can you try deleting your cache at ~/.cache/huggingface/modules/datasets_modules/datasets/InstaDeepAI--nucleotide_transformer_downstream_tasks_revised and trying again? For some reason, datasets wasn't able to find the DatasetBuilder class in the Python script of this dataset.
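
A minimal sketch of that cache cleanup, assuming the default cache location (adjust the path if HF_HOME or HF_MODULES_CACHE points elsewhere on your machine):

import shutil
from pathlib import Path

# Default location of cached dataset loading scripts; adjust if
# HF_HOME or HF_MODULES_CACHE is set differently.
stale_module = (
    Path.home()
    / ".cache/huggingface/modules/datasets_modules/datasets"
    / "InstaDeepAI--nucleotide_transformer_downstream_tasks_revised"
)
if stale_module.exists():
    shutil.rmtree(stale_module)  # force datasets to re-fetch the loading script

# Then retry the load
from datasets import load_dataset
train_dataset_enhancers = load_dataset(
    "InstaDeepAI/nucleotide_transformer_downstream_tasks_revised",
    "enhancers_types",
    split="train",
)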
