7zip changes the file names when unpacking #134

Open
Djoop opened this issue Feb 9, 2021 · 3 comments

Djoop commented Feb 9, 2021

I have some code using the unpack function which fails with DataDeps 7.7 (apparently unpacking was changed to use 7zip on all platforms; I am not sure exactly when the breaking change happened). I have a wrapper for the following dataset: http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz (I am not sure whether this particular file is the cause or whether this is a generic error). The archive contains a file called "kddcup.data_10_percent" (as can be seen e.g. using gunzip -l …), yet unpack creates a file called kddcup.data_10_percent_corrected (with some other files the name ended in .corrected).

Unpacking runs without error, but this is inconvenient: I was expecting it to respect the file names (as previous versions of DataDeps did). Or is there a special function for obtaining the paths of the unpacked files?

Owner

oxinabox commented Feb 9, 2021

Weird. I have never seen that happen before.
I think this is an upstream bug in 7zip.
Can you see if you can reproduce with 7zip alone?

As a workaround you can add to the registration block:

post_fetch_method = compressed_filename -> run(`gunzip -l ...`)
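
Note that `gunzip -l` only lists an archive's contents, so the command run by the hook would need to decompress instead. A minimal shell sketch of such a command follows, assuming a single-file gzip archive; `data.gz` and the variable names are stand-ins for illustration, not names from this thread. Writing to an explicit target makes the output name depend only on the archive's own file name.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in archive (hypothetical sample data, not the KDD file).
printf 'sample,row\n' | gzip -c > data.gz

archive=data.gz
target=${archive%.gz}            # archive name minus the .gz suffix

# Decompress to stdout and choose the output name ourselves.
gzip -dc "$archive" > "$target"
```

The original `.gz` file is left in place, since `-c` writes to standard output instead of replacing the archive.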

Author

Djoop commented Feb 10, 2021

Indeed, it seems to be an upstream bug (actually, I don't know whether the bug is in 7zip or in gunzip…). With the 7zip packaged with my distribution, the same archive yields two different file names under gunzip and 7z:

$ 7z l kddcup.data_10_percent.gz

7-Zip [64] 17.03 : Copyright (c) 1999-2020 Igor Pavlov : 2017-08-28
p7zip Version 17.03 (locale=fr_FR.UTF-8,Utf16=on,HugeFiles=on,64 bits,12 CPUs x64)

Scanning the drive for archives:
1 file, 2144903 bytes (2095 KiB)

Listing archive: kddcup.data_10_percent.gz

--
Path = kddcup.data_10_percent.gz
Type = gzip
Headers Size = 43

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2007-06-08 04:35:37 .....     74889749      2144903  kddcup.data_10_percent_corrected
------------------- ----- ------------ ------------  ------------------------
2007-06-08 04:35:37           74889749      2144903  1 files
------------------------------------------------------------------------------------------------

$ gunzip -l kddcup.data_10_percent.gz
         compressed        uncompressed  ratio uncompressed_name
            2144903            74889749  97.1% kddcup.data_10_percent

I don't know if there is anything special about this archive, as I did not create it, but this is surprising. Thanks for the workaround, I guess it works only if there is a single file in the archive?
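
One possible explanation, which this thread does not confirm: the gzip format can store the original file name in its header (the FNAME field of RFC 1952). Under GNU gzip, `gzip -l` by default reports a name derived from the archive's file name and reports the stored name only when `-N` is added. The `Headers Size = 43` in the 7z listing would be consistent with a 10-byte fixed header plus the 32-character name `kddcup.data_10_percent_corrected` and its terminating NUL byte. The sketch below reproduces that situation with stand-in files (all names are hypothetical) by compressing under one name and then renaming the archive.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Compress under one name, then rename the archive. The gzip header
# keeps the name the file had at compression time.
printf 'sample,row\n' > data_corrected
gzip data_corrected                   # creates data_corrected.gz
mv data_corrected.gz data.gz

# Default listing: name derived from the archive's file name.
derived=$(gzip -l data.gz | awk 'NR==2 {print $NF}')

# With -N: name stored in the gzip header, if present (GNU gzip assumed).
stored=$(gzip -lN data.gz | awk 'NR==2 {print $NF}')

echo "derived: $derived"
echo "stored:  $stored"
```

If the two names differ for the real archive as well, that would suggest neither tool is misreading the file and that they simply use different sources for the name.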

Owner

oxinabox commented Feb 10, 2021

> Thanks for the workaround, I guess it works only if there is a single file in the archive?

Well, you can run whatever you want.
E.g. tar -xzf ... will do gzipped tarballs.
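
For archives holding several files, the `tar -xzf` route can be sketched as follows. The file and directory names are stand-ins for illustration only. With a gzipped tarball, member names come from the tar headers inside the compressed stream.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a stand-in gzipped tarball with two members.
mkdir src
printf 'a\n' > src/first.txt
printf 'b\n' > src/second.txt
tar -czf bundle.tar.gz -C src first.txt second.txt

# Extract every member into a separate output directory.
mkdir out
tar -xzf bundle.tar.gz -C out
ls out
```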
