
One of the three datasets returned by Multi30k seems to be bugged. #2001

Closed
raaaaaymond opened this issue Dec 7, 2022 · 2 comments


@raaaaaymond

🐛 Bug

Describe the bug

The test split returned by Multi30k fails its SHA256 integrity check: the downloaded archive mmt16_task1_test.tar.gz does not match the expected hash. The precise error is:

RuntimeError: The computed hash 0681be16a532912288a91ddd573594fbdd57c0fbb81486eff7c55247e35326c2 of C:\Users\raaaa/.cache\torch\text\datasets\Multi30k\mmt16_task1_test.tar.gz does not match the expectedhash 6d1ca1dba99e2c5dd54cae1226ff11c2551e6ce63527ebb072a1f70f72a5cd36. Delete the file manually and retry.
This exception is thrown by __iter__ of HashCheckerIterDataPipe(hash_dict={'C:\\Users\\raaaa/.cache\\torch\\text\\datasets\\Multi30k\\mmt16_task1_test.tar.gz': '6d1ca1dba99e2c5dd54cae1226ff11c2551e6ce63527ebb072a1f70f72a5cd36'}, hash_type='sha256', rewind=True, source_datapipe=MapperIterDataPipe)

I did what the message suggests (deleted the file manually and ran the script again), but the same error occurs.
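One way to check whether each fresh download produces the same bytes is to hash the cached archive directly and compare it with the value torchtext expects. If the computed digest is identical after every re-download, that suggests the file on the server is consistent and the expected hash (or URL) is stale, rather than the download being corrupted. This is a minimal sketch: the path and expected hash are copied from the error message above, and the cache location is assumed to be torchtext's default (it may differ if `TORCH_HOME` is set).

```python
import hashlib
from pathlib import Path

# Assumed default cache location, as it appears in the error message above;
# it may differ on other machines or if TORCH_HOME is set.
ARCHIVE = (
    Path.home() / ".cache" / "torch" / "text" / "datasets"
    / "Multi30k" / "mmt16_task1_test.tar.gz"
)
# Expected hash copied verbatim from the error message above.
EXPECTED = "6d1ca1dba99e2c5dd54cae1226ff11c2551e6ce63527ebb072a1f70f72a5cd36"


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    if ARCHIVE.exists():
        actual = file_sha256(ARCHIVE)
        print("computed:", actual)
        print("expected:", EXPECTED)
        print("match" if actual == EXPECTED else "MISMATCH")
    else:
        print(f"{ARCHIVE} not found; run the reproduction script first.")
```

Running it after each delete-and-retry cycle shows whether the computed value stays the same.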

To Reproduce

Paste the following into a new Python file and run it.

import torchtext

def _main():
    train, val, test = torchtext.datasets.Multi30k(language_pair=("de", "en"))
    # Iterating over `val` (or `train`) works fine; only the test split fails.
    # for thing in val:
    #     print(thing)
    #     break
    # Iterating over `test` triggers the hash-mismatch error.
    for thing in test:
        print(thing)
        break


if __name__ == "__main__":
    _main()

You should see the error I pasted above.
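Note that the `Multi30k(...)` call itself returns without complaint; the failure only appears inside the `for` loop. A plausible reading, based on the `HashCheckerIterDataPipe` named in the traceback, is that the datapipes are lazy and the hash check runs when iteration starts. The class below is a hypothetical, stdlib-only sketch of that pattern (`HashCheckedLines` is an invented name, and the error text is paraphrased); it is not torchdata's implementation.

```python
import hashlib


class HashCheckedLines:
    """Hypothetical, simplified stand-in for a lazy hash-checking pipeline.

    Construction does no I/O. Because __iter__ is a generator, even calling
    iter() does not touch the file; the hash is checked on the first next().
    """

    def __init__(self, path, expected_sha256):
        self.path = path
        self.expected = expected_sha256

    def __iter__(self):
        with open(self.path, "rb") as f:
            data = f.read()
        computed = hashlib.sha256(data).hexdigest()
        if computed != self.expected:
            # Paraphrase of the error shown in the report.
            raise RuntimeError(
                f"The computed hash {computed} of {self.path} does not match "
                f"the expected hash {self.expected}. "
                "Delete the file manually and retry."
            )
        # Records are produced only after the check passes.
        yield from data.decode("utf-8").splitlines()
```

Under this model, the train and validation archives are checked separately, so they can iterate fine while the test archive fails, which matches what the reproduction shows.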

Expected behavior

I expect the test split to iterate without a hash-mismatch error, just like the train and validation splits.

Environment

PyTorch version: 1.13.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Pro
GCC version: (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0
Clang version: Could not collect
CMake version: version 3.20.2
Libc version: N/A

Python version: 3.10.1 (tags/v3.10.1:2cd268a, Dec 6 2021, 19:10:37) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22000-SP0
Is CUDA available: True
CUDA runtime version: 11.7.64
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080
Nvidia driver version: 526.86
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy==0.950
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.4
[pip3] torch==1.13.0+cu117
[pip3] torchaudio==0.13.0+cu117
[pip3] torchdata==0.5.0
[pip3] torchtext==0.14.0
[pip3] torchvision==0.14.0+cu117
[conda] Could not collect

@rshraga
Contributor

rshraga commented Dec 7, 2022

Thanks for the report, @raaaaaymond. I created a PR to fix this: #2003

rshraga closed this as completed Dec 7, 2022
@raaaaaymond
Author

Thank you, @rshraga.
