Releases: nla/httrack2warc

v0.6.1: Release 0.6.1

19 Apr 06:54
@ato ato

Bugs fixed:

  • Improved compatibility with WinHTTrack 3.48-22
  • Improved compatibility with HTTrack 3.21-4
  • Fixed handling of log lines missing filename

v0.6.0: Release 0.6.0

16 Apr 03:19
@ato ato

New features:

  • Added --redirect-file option to output synthetic redirects to a separate file
  • Added --verbose and --quiet options to control logging verbosity
  • Reduced default log level

Bugs fixed:

  • Fixed reading of non-ASCII filenames from new.txt
  • Handle new.txt lines with blank mime and etag
  • We now try to fix up paths ending in .delayed when possible
  • Fixed 'is a directory' exception on some crawls
  • Handle filenames containing spaces when using --rewrite-links

v0.5.0

05 Mar 08:35
@ato ato

New features:

  • --redirect-prefix option will generate synthetic redirect records

Bugs fixed:

  • WARC files were being written with incorrect record trailers (CRLF not CRLFCRLF) #8
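
To illustrate the trailer fix above: in the WARC format each record ends with two CRLF sequences after the content block, not one. The following is a minimal Python sketch of that layout, not httrack2warc's actual code; the function name `serialize_warc_record` and the example header are hypothetical.

```python
CRLF = b"\r\n"

def serialize_warc_record(headers, payload):
    """Serialize one WARC record (hypothetical helper, for illustration only).

    Layout: version line, header fields, blank line, content block,
    then the record trailer, which is CRLF CRLF.
    """
    fields = dict(headers)
    # Content-Length counts only the bytes of the content block.
    fields["Content-Length"] = str(len(payload))
    lines = [b"WARC/1.0"]
    for name, value in fields.items():
        lines.append(f"{name}: {value}".encode("utf-8"))
    head = CRLF.join(lines) + CRLF + CRLF
    # The trailer is two CRLFs; writing only one was the bug described above.
    return head + payload + CRLF + CRLF

record = serialize_warc_record({"WARC-Type": "resource"}, b"hello")
```

A reader that relies on Content-Length alone may tolerate a short trailer, but stricter parsers expect both CRLFs before the next record starts.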

v0.4.0

13 Jan 05:38
@ato ato

New features:

  • --exclude REGEX excludes matching URLs from conversion
  • Redirects in hts-err.txt and logs/warn are now interpreted.
  • CLI now automatically unpacks tarballs
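
The effect of the --exclude option can be pictured as a regex filter over the crawled URLs. The Python sketch below is only an illustration: the release note does not say whether the pattern must match the whole URL or just part of it, so the partial-match behaviour used here is an assumption, and `filter_excluded` is a hypothetical name rather than anything in the tool.

```python
import re

def filter_excluded(urls, exclude_pattern):
    """Return the URLs that do NOT match the exclude pattern.

    Assumption: a partial match anywhere in the URL excludes it.
    The real tool may use full-match semantics instead.
    """
    regex = re.compile(exclude_pattern)
    return [url for url in urls if not regex.search(url)]

kept = filter_excluded(
    ["http://example.org/index.html", "http://example.org/ads/banner.gif"],
    r"/ads/",
)
```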

Bugs fixed:

  • Fixed HTTP headers for crawls that included both http and https versions of a URL
  • CDX files were hardcoded to status 200

v0.2.1

02 Feb 07:38
@ato ato
  • Bugs fixed
    • URLs are now canonicalized before looking up in the zip cache (issue #6).
    • HTTP error status codes that are missing their file on disk are now skipped with a warning. This is a workaround for HTTrack writing them with a different filename to what it records in the cache (issue #7). The workaround can be disabled using a new --strict option.

v0.2.0

08 Jan 04:36
@ato ato
  • Features added
    • HTTrack 3.0.1 support (#2)
    • Added --cdx option to write CDX files on the fly
  • Bugs fixed
    • Crawls without hts-ioinfo.txt are now accepted (#1)
    • The Transfer-Encoding HTTP header is stripped from converted response headers (#5)
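
Stripping the Transfer-Encoding header, as described above, amounts to dropping one field from the stored response headers while leaving the rest untouched. A minimal Python sketch follows, assuming headers are held as a list of raw lines; `strip_transfer_encoding` is a hypothetical name, not the project's API.

```python
def strip_transfer_encoding(header_lines):
    """Drop any Transfer-Encoding field from a list of raw header lines.

    Header names are case-insensitive, so the comparison is lowercased.
    The status line has no matching field name and passes through unchanged.
    """
    return [
        line
        for line in header_lines
        if line.split(":", 1)[0].strip().lower() != "transfer-encoding"
    ]

cleaned = strip_transfer_encoding([
    "HTTP/1.1 200 OK",
    "Content-Type: text/html",
    "Transfer-Encoding: chunked",
])
```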

v0.1.0 First release

27 Oct 04:56
@ato ato
Add httrack2warc version to warcinfo