Hashcode for test fold of Multi30k corrupt #2154

fmohr · 2023-04-18T16:35:16Z

🐛 Bug

Bug Description
When loading the test data via

from torchtext.datasets import Multi30k

multi_datapipe = Multi30k(split="test")

I get the following error (it only occurs on the test split). It seems that the hash currently associated with the tar file does not match the hash of the actual tar file on the server.

RuntimeError: The computed hash 0681be16a532912288a91ddd573594fbdd57c0fbb81486eff7c55247e35326c2 of ~/.cache/torch/text/datasets/Multi30k/mmt16_task1_test.tar.gz does not match the expectedhash 6d1ca1dba99e2c5dd54cae1226ff11c2551e6ce63527ebb072a1f70f72a5cd36. Delete the file manually and retry.

Needless to say, I deleted the file manually (in fact, it was deleted automatically by a script).
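
To see which of the two hashes the cached archive really has, the digest can be recomputed locally. The following is a minimal sketch, assuming the check uses SHA-256 (the 64-hex-character digests in the error message suggest this, but the thread does not say so). The helper names `sha256_of_file` and `matches_expected` are hypothetical and not part of torchtext.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hash):
    """Compare the file's digest with an expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hash.lower()


if __name__ == "__main__":
    # Path and expected hash are copied from the error message above.
    cached = Path("~/.cache/torch/text/datasets/Multi30k/mmt16_task1_test.tar.gz").expanduser()
    expected = "6d1ca1dba99e2c5dd54cae1226ff11c2551e6ce63527ebb072a1f70f72a5cd36"
    if cached.is_file():
        print("computed:", sha256_of_file(cached))
        print("matches expected:", matches_expected(cached, expected))
    else:
        print("no cached archive at", cached)
```

If the computed digest equals the first hash in the error, the file on disk is the one being rejected; whether it is stale or freshly downloaded then depends on when it was written to the cache.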

Expected Behavior
I would expect this to work just as it does for split="train" or split="valid".

Environment
The torchtext version is 0.14.1 (the link to the environment collection script in the template returns a 404).


Nayef211 · 2023-04-18T20:38:39Z

Hey @fmohr. We actually updated the expected hash of the file alongside where the file is downloaded from in #2003. So the behavior you notice is actually correct since you had an outdated copy of the file downloaded in your cache. The expected resolution would be to delete the cached file manually?
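
The suggested resolution can be sketched as follows. This is only an illustration, assuming the cache location shown in the error message; `remove_cached_archive` is a hypothetical helper, not a torchtext function.

```python
from pathlib import Path


def remove_cached_archive(cache_dir, filename):
    """Delete a cached archive if it exists; return True if a file was removed."""
    target = Path(cache_dir).expanduser() / filename
    if target.is_file():
        target.unlink()
        return True
    return False


if __name__ == "__main__":
    # Directory and file name are taken from the error message in the report.
    removed = remove_cached_archive(
        "~/.cache/torch/text/datasets/Multi30k",
        "mmt16_task1_test.tar.gz",
    )
    print("removed stale archive:", removed)
```

After the file is gone, calling Multi30k(split="test") again should trigger a fresh download, which is then checked against the expected hash.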

Nayef211 closed this as completed Apr 18, 2023
