-
Notifications
You must be signed in to change notification settings - Fork 1
Tool to find out compressed sizes of files in a .tar.gz archive.
License
liori/targz-sizes
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
targz_sizes =========== targz_sizes is a small and simple tool to find out how much space do specific files take inside a .tar.gz file, after compression. It does that by reading and decompressing the archive file in memory, then dumping raw information on the standard output. Archive file is processed in small chunks, so that memory usage is minimal. This program has been written to check which directories take the most space in my `duplicity` backups. This is different than checking space usage on a live file system because of two factors: compression (text files often take very little space in archives compared to live file system) and incremental backups (files that change often are stored multiple times in backups). targz_sizes performs only basic data gathering, you will need something to provide proper statistics though. Usage ----- No options are implemented. The archive file is taken on standard input, file sizes are printed on standard output. E.g.: targz_sizes <my_archive.tar.gz >file_sizes.txt Output consists of lines, where each line shows size in bytes and name of file, e.g.: 145 targz_sizes-1.0/ 63 targz_sizes-1.0/AUTHORS 3525 targz_sizes-1.0/missing 42103 targz_sizes-1.0/configure 93 targz_sizes-1.0/NEWS 12936 targz_sizes-1.0/config.guess 10025 targz_sizes-1.0/config.sub Note that directory entries also take up space in the archive! Also note that technically a file can be encoded by any number of bits, not necessarily being round number of bytes. In such cases, amount of space shown in the output is approximate. Known problems -------------- To compile you need to pass `--with-zlib` to ./configure. `zlib` is required, even if the configure script goes well without this switch. E.g.: ./configure --with-zlib Ustar-formatted file sizes are not supported, the output will be garbage. Ustar format allows storing files bigger than 8GB in tar archives. GNU tar does not use this format unless it is necessary, so if you used GNU tar and you know there are no files bigger than 8GB, you're safe. Standard input is not reopened in binary mode, as technically there is no way to perform this operation with standard C primitives. This will be a problem on platforms that by default provide standard input as a text stream (e.g. Windows). In such cases, the output will be garbage. The above problems should be easily fixable. You are free to fork the project if you want to fix them. Archive file names are not re-encoded to current locale. You can use `iconv` for that. Archive file names are truncated to 100 characters (the limit for original tar specification). There are format extensions which can cope with longer names, but those are not supported by targz_sizes. The output will contain the names as they appear in the 100-byte field in the header, most probably simply truncated.
About
Tool to find out compressed sizes of files in a .tar.gz archive.
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published