Releases: nla/httrack2warc
v0.6.1: Release 0.6.1
v0.6.0: Release 0.6.0
New features:
- Added --redirect-file option to output synthetic redirects to a separate file
- Added --verbose and --quiet options to control logging verbosity
- Reduced default log level
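A minimal sketch of how --verbose and --quiet might map onto logging levels. The WARNING default, the DEBUG/ERROR levels, treating the two flags as mutually exclusive, and the helper name log_level_from_args are all assumptions for illustration, not the tool's documented behaviour.

```python
import argparse
import logging
import sys

# Assumed default; the release note only says the default level was reduced.
DEFAULT_LEVEL = logging.WARNING


def log_level_from_args(argv):
    """Map the --verbose/--quiet flags onto a logging level (hypothetical helper)."""
    parser = argparse.ArgumentParser(add_help=False)
    # Assumed: passing both flags at once is an error.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--verbose", action="store_true")
    group.add_argument("--quiet", action="store_true")
    # parse_known_args leaves any other converter options untouched in this sketch.
    args, _rest = parser.parse_known_args(argv)
    if args.verbose:
        return logging.DEBUG  # assumed: most detail
    if args.quiet:
        return logging.ERROR  # assumed: errors only
    return DEFAULT_LEVEL


if __name__ == "__main__":
    logging.basicConfig(level=log_level_from_args(sys.argv[1:]))
    logging.getLogger("sketch").info("shown only with --verbose")
```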
Bugs fixed:
- Fixed reading of non-ASCII filenames from new.txt
- Handle new.txt lines with blank mime and etag
- We now try to fix up paths ending in .delayed when possible
- Fixed 'is a directory' exception on some crawls
- Handle filenames containing spaces when using --rewrite-links
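The two new.txt fixes come down to splitting each line on tabs without collapsing empty columns and decoding it with a Unicode-aware charset. A sketch, assuming HTTrack's hts-cache/new.txt column order (date, size, flags, status code, status message, MIME, ETag, URL, local file, ...) and UTF-8 encoding; parse_new_txt_line and CacheEntry are hypothetical names.

```python
from typing import NamedTuple, Optional


class CacheEntry(NamedTuple):
    status: int
    mime: Optional[str]
    etag: Optional[str]
    url: str
    local_file: str


def parse_new_txt_line(raw: bytes) -> CacheEntry:
    """Parse one new.txt line (assumed column layout, see lead-in)."""
    # Assumed UTF-8, so non-ASCII filenames are not mangled.
    line = raw.decode("utf-8").rstrip("\r\n")
    # str.split("\t") keeps empty fields, so a blank MIME or ETag stays in its
    # column instead of shifting the URL and local file columns left.
    fields = line.split("\t")
    if len(fields) < 9:
        raise ValueError(f"expected at least 9 columns, got {len(fields)}")
    return CacheEntry(
        status=int(fields[3]),
        mime=fields[5] or None,   # blank MIME becomes None
        etag=fields[6] or None,   # blank ETag becomes None
        url=fields[7],
        local_file=fields[8],
    )
```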
v0.5.0
v0.4.0
New features:
- --exclude REGEX excludes matching URLs from conversion
- Redirects in hts-err.txt and logs/warn are now interpreted
- CLI now automatically unpacks tarballs
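--exclude can be pictured as a regex filter applied to each URL before conversion. This sketch assumes the pattern may match anywhere in the URL (Python's re.search); whether the real option requires a full match is not stated in these notes, and apply_exclude is a hypothetical name.

```python
import re
from typing import Iterable, Iterator, Optional


def apply_exclude(urls: Iterable[str], exclude: Optional[str]) -> Iterator[str]:
    """Yield only URLs that do not match the exclude regex (hypothetical helper)."""
    if exclude is None:
        yield from urls
        return
    pattern = re.compile(exclude)
    for url in urls:
        # Assumption: a match anywhere in the URL excludes it.
        if pattern.search(url) is None:
            yield url
```

For example, an exclude pattern of `\.jpg$` would keep `http://a.example/index.html` and drop `http://a.example/x.jpg`.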
Bugs fixed:
- HTTP headers for crawls that included both http and https versions of a URL
- CDX files were hardcoded to status 200
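With the CDX fix, each index line should carry the capture's actual HTTP status rather than a constant 200. A sketch of one line in the common 11-field CDX layout (N b a m s k r M S V g), assuming a precomputed SURT-style URL key and "-" for the unused redirect and meta-tag fields; Capture and cdx_line are hypothetical names, and httrack2warc's exact CDX columns are not confirmed here.

```python
from typing import NamedTuple

# Header line for the 11-field layout this sketch assumes.
CDX_HEADER = " CDX N b a m s k r M S V g"


class Capture(NamedTuple):
    urlkey: str     # massaged (SURT-style) URL, assumed precomputed
    timestamp: str  # 14-digit YYYYMMDDhhmmss
    url: str
    mime: str
    status: int     # the real HTTP status of this capture
    digest: str
    length: int     # compressed record length in the WARC
    offset: int     # record offset in the WARC
    filename: str


def cdx_line(c: Capture) -> str:
    """Format one CDX line; the status column comes from the capture."""
    fields = [
        c.urlkey, c.timestamp, c.url, c.mime or "-", str(c.status),
        c.digest or "-", "-", "-", str(c.length), str(c.offset), c.filename,
    ]
    return " ".join(fields)
```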
v0.2.1
Bugs fixed:
- URLs are now canonicalized before looking up in the zip cache (issue #6).
- Records with HTTP error status codes whose file is missing on disk are now skipped with a warning. This is a workaround for HTTrack writing them under a different filename from the one it records in the cache (issue #7). The workaround can be disabled with the new --strict option.
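Both v0.2.1 fixes can be sketched together: normalize the URL before the cache lookup, and skip error-status records whose file is missing unless --strict is set. The canonicalization rules here (lowercase scheme and host, drop the default port and fragment, empty path becomes "/") are a simplified assumption, not httrack2warc's exact rules; canonicalize, lookup, resolve_body and the entry keys are hypothetical.

```python
import logging
import os
from typing import Dict, Optional
from urllib.parse import urlsplit, urlunsplit

log = logging.getLogger("httrack2warc-sketch")

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Simplified canonical form (assumed rules, see lead-in)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))  # fragment dropped


def lookup(cache: Dict[str, dict], url: str) -> Optional[dict]:
    """Look up a cache entry by canonical URL; keys are assumed canonical too."""
    return cache.get(canonicalize(url))


def resolve_body(entry: dict, root: str, strict: bool) -> Optional[str]:
    """Return the path of the entry's file, or None if it should be skipped."""
    path = os.path.join(root, entry["local_file"])
    if os.path.exists(path):
        return path
    if entry["status"] >= 400 and not strict:
        # Workaround: warn and skip instead of failing the whole conversion.
        log.warning("skipping HTTP %d record, file missing: %s", entry["status"], path)
        return None
    raise FileNotFoundError(path)
```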
v0.2.0
v0.1.0: First release
Add httrack2warc version to warcinfo