
Hashcode for test fold of Multi30k corrupt #2154

Closed
fmohr opened this issue Apr 18, 2023 · 1 comment

Comments

@fmohr

fmohr commented Apr 18, 2023

🐛 Bug

Bug Description
When loading the test data via

from torchtext.datasets import Multi30k
multi_datapipe = Multi30k(split="test")

I get the following error (it only occurs on the test split). It seems that the hash currently associated with the tar file does not match that of the actual tar file on the server.

RuntimeError: The computed hash 0681be16a532912288a91ddd573594fbdd57c0fbb81486eff7c55247e35326c2 of ~/.cache/torch/text/datasets/Multi30k/mmt16_task1_test.tar.gz does not match the expectedhash 6d1ca1dba99e2c5dd54cae1226ff11c2551e6ce63527ebb072a1f70f72a5cd36. Delete the file manually and retry.
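For context, the loader recomputes a digest of the cached archive and compares it against a value bundled with torchtext. The sketch below reproduces that comparison locally with `hashlib`. It assumes the digest is SHA-256 (both hashes in the message are 64 hex characters, which is consistent with that). `sha256_of_file` and `hash_matches` are hypothetical helpers for illustration, not torchtext APIs.

```python
import hashlib
from pathlib import Path

# Expected digest copied from the error message above (assumed to be SHA-256).
EXPECTED_SHA256 = "6d1ca1dba99e2c5dd54cae1226ff11c2551e6ce63527ebb072a1f70f72a5cd36"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_matches(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA-256 equals the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

Calling `hash_matches(Path("~/.cache/torch/text/datasets/Multi30k/mmt16_task1_test.tar.gz").expanduser())` shows whether the cached archive agrees with the expected value, independent of the dataset loader.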

Needless to say, I deleted the file manually (in fact, it was deleted automatically by a script).

Expected Behavior
I would expect this to work just as it does for split = "train" or split = "valid".

Environment
torchtext version: 0.14.1 (the environment collection script linked in the template returns a 404).

@Nayef211
Contributor

Hey @fmohr. In #2003 we updated both the location the file is downloaded from and its expected hash. So the behavior you're seeing is actually correct, since you had an outdated copy of the file in your cache. The expected resolution would be to delete the cached file manually; does that work for you?
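Deleting the stale cached archive can be scripted so the next load downloads a fresh copy. This is a minimal sketch: the path is the one reported in the error message above, and `remove_stale_archive` is a hypothetical helper, not part of torchtext.

```python
from pathlib import Path

# Cached archive path as reported in the error message; adjust if your
# torchtext cache root differs.
CACHED_TEST_ARCHIVE = Path(
    "~/.cache/torch/text/datasets/Multi30k/mmt16_task1_test.tar.gz"
).expanduser()


def remove_stale_archive(path: Path = CACHED_TEST_ARCHIVE) -> bool:
    """Delete a cached archive so the next dataset load re-downloads it.

    Returns True if a file was removed, False if nothing was cached.
    """
    try:
        path.unlink()
    except FileNotFoundError:
        return False
    return True
```

After calling `remove_stale_archive()`, re-running `Multi30k(split="test")` triggers a new download. A fresh download only passes the check if the hash bundled with the installed torchtext matches the file currently served at the download location.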
