
Replace Zstandard wrapper with native Go implementation #396

Merged: 4 commits merged into segmentio:master on Jan 26, 2020
Conversation

pascaldekloe (Contributor)

Follow-up on stale #303 with full use of the streaming API from @klauspost.

Decompression is slower, while the default compression level went down from 5 to 3.

name                           old time/op    new time/op    delta
Compression/zstd/compress-4      5.64ms ± 2%    3.02ms ± 2%   -46.42%  (p=0.000 n=10+10)
Compression/zstd/decompress-4     911µs ± 2%    2386µs ± 2%  +162.10%  (p=0.000 n=10+10)

name                           old speed      new speed      delta
Compression/zstd/compress-4    21.0MB/s ± 2%  44.3MB/s ± 2%  +111.39%  (p=0.000 n=10+10)
Compression/zstd/decompress-4  2.13GB/s ± 2%  0.81GB/s ± 2%   -61.85%  (p=0.000 n=10+10)
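
For context, deltas in this form are what benchstat (golang.org/x/perf/cmd/benchstat) prints when comparing runs of the same benchmark before and after a change; a sketch of that workflow, with arbitrary file names:

go test -run '^$' -bench Compression/zstd -count 10 > old.txt
(switch to the native Go implementation)
go test -run '^$' -bench Compression/zstd -count 10 > new.txt
benchstat old.txt new.txt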

pascaldekloe requested review from Pryz and achille-roussel and removed the request for Pryz on Jan 15, 2020
@achille-roussel (Contributor)

Thanks a lot for this contribution @pascaldekloe !

What do you think about keeping the compression level at 5? I'm concerned that programs relying on this package will suddenly see a change in behavior that could lead to exhausting disk space on Kafka brokers. The default compression level in kafka-go would then differ from what the zstd dependency sets, but we can document why we're making this decision.

Could you share a CPU and memory profile of the decompression benchmark? I'm curious to see where the difference comes from and if there's anything we can do about it.

Let me know!
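
For reference, CPU and memory profiles of just the zstd decompression benchmark can be captured with the standard go test flags and inspected with pprof; a minimal sketch (output file names are arbitrary):

go test -run '^$' -bench 'Compression/zstd/decompress' -cpuprofile cpu.out -memprofile mem.out .
go tool pprof -top cpu.out
go tool pprof -top mem.out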

achille-roussel self-assigned this on Jan 15, 2020
@klauspost (Contributor)

The benchmark numbers look a bit weird to me. Could you share the benchmark if it is not included?

(Review thread on go.mod: outdated, resolved)
@achille-roussel (Contributor) commented on Jan 15, 2020

@klauspost you can run the benchmark with go test -v -run '^$' -bench Compress/zstd in the root directory.

I took a closer look at the zstd package; it looks like there are lots of unguarded println calls (some only run when the debug constant is true, some don't check it at all). Do these show up in practice? I'm concerned about printing a bunch of random output to stdout in every program that depends on kafka-go (e.g. https://github.com/klauspost/compress/blob/master/zstd/decoder.go#L216)

@klauspost (Contributor) commented on Jan 15, 2020

Where do I find this?

--- FAIL: BenchmarkCompression
    compression_test.go:291: open src\encoding\json\testdata\code.json.gz: The system cannot find the path specified.

Edit... Ahhh... From the Go source... I don't have $GOROOT :)

@klauspost (Contributor) commented on Jan 15, 2020

There is something wonky with your benchmarks.

If you only run the compression, the numbers are drastically different, e.g.:

func benchmarkCompression(b *testing.B, codec kafka.CompressionCodec, buf *bytes.Buffer, payload []byte) float64 {
	// In case only the decompression benchmarks are run, these flags are used to
	// detect whether the payload has to be compressed before the decompression
	// benchmarks.

	b.Run("compress", func(b *testing.B) {
		r := bytes.NewReader(payload)
		b.SetBytes(int64(len(payload)))
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			buf.Reset()
			r.Reset(payload)
			w := codec.NewWriter(buf)

			_, err := io.Copy(w, r)
			if err != nil {
				b.Fatal(err)
			}
			if err := w.Close(); err != nil {
				b.Fatal(err)
			}
		}
	})

	return 1 - (float64(buf.Len()) / float64(len(payload)))
}
λ go test -v -run '^$' -bench Compress
goos: windows
goarch: amd64
pkg: github.com/segmentio/kafka-go
BenchmarkCompression/none/compress-32              14197             80739 ns/op        24033.78 MB/s
BenchmarkCompression/gzip/compress-32                 51          21985308 ns/op          88.26 MB/s
BenchmarkCompression/snappy/compress-32             1110           1064202 ns/op        1823.41 MB/s
BenchmarkCompression/lz4/compress-32                 514           2313747 ns/op         838.67 MB/s
BenchmarkCompression/zstd/compress-32                420           2607726 ns/op         744.12 MB/s
input => 1.85 MB
  none:         0.00%
  gzip:         93.86%
  snappy:       86.37%
  lz4:          89.55%
  zstd:         93.10%

@klauspost (Contributor) commented on Jan 15, 2020

As a side note, since you already have the dependency, swapping out compress/gzip for github.com/klauspost/compress/gzip gives these numbers:

BenchmarkCompression/gzip/compress-32                160           7495428 ns/op         258.89 MB/s
gzip:         93.03%

With klauspost/pgzip:

BenchmarkCompression/gzip/compress-32                573           2077253 ns/op         934.15 MB/s
gzip:         92.90%

But it only really becomes an advantage on long streams, longer than the test stream (900 MB/s is low).
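
For context, klauspost/compress/gzip mirrors the standard library gzip API, so the swap is essentially an import change; a minimal sketch (the compressPayload helper is illustrative, not part of kafka-go):

package example

import (
	"bytes"
	"io"

	gzip "github.com/klauspost/compress/gzip" // drop-in replacement for "compress/gzip"
)

// compressPayload gzips the payload with the package's default level and
// returns the compressed bytes.
func compressPayload(payload []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf) // same constructor and Writer API as the standard library
	if _, err := io.Copy(w, bytes.NewReader(payload)); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil { // Close flushes the remaining data and the gzip footer
		return nil, err
	}
	return buf.Bytes(), nil
}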

@achille-roussel (Contributor)

Nice! Thanks for the tip on compress/gzip!

The compression ratio is different in your benchmarks. Is that due to a different default compression level, or to different trade-offs in the implementation?

@klauspost (Contributor)

Yes, the intent is for "default" to be a more reasonable default in terms of the speed/size trade-off.

This package attempts to provide a smoother transition, where "1" takes a lot of shortcuts, "5" is the reasonable trade-off, "9" is "give me the best compression", and the values in between give something reasonable in between. The standard library has big differences across levels 1-4, while levels 5-9 show no significant gains, often spending far more time than the achieved compression can justify.

https://blog.klauspost.com/rebalancing-deflate-compression-levels/

But it also has a lot of other optimizations; for example, incompressible data is skipped more than 50x faster.

@pascaldekloe (Contributor, Author)

To be clear, compression level 3 is the only compression level supported by the Go library. Other values will silently default to 3.

The compressed size win for level 5 is highly unlikely to exceed 20%. Kafka works well with fully packed disks thanks to its low fragmentation by design. I don't think anyone will run out of space unexpectedly due to this loss.

Those debug lines are more or less harmless.

func printf(format string, a ...interface{}) {
        if debug {
                log.Printf(format, a...)
        }
}

The benchmark results in the description were measured on a mid-range iMac from 2017. I had a quick look, and most of the slowdown comes from the large number of data copies. The decoder does not make use of the advertised frame size. Also, the data parsing goes through abstract methods, which prevents all sorts of optimisations. This is all fixable in the long run, of course.

@klauspost (Contributor)

To be clear, compression level 3 is the only compression level supported by the Go library.

(Assuming you're talking about zstd.) A level 1 equivalent is also available: zstd.SpeedFastest. But I see now you have a mix-up in your code; I will add a comment.
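
For reference, with github.com/klauspost/compress/zstd the level is chosen when constructing the encoder; a minimal sketch (the newFastZstdWriter helper is illustrative, not part of this PR):

package example

import (
	"io"

	"github.com/klauspost/compress/zstd"
)

// newFastZstdWriter builds an encoder at the level 1 equivalent mentioned
// above; zstd.SpeedDefault corresponds roughly to level 3.
func newFastZstdWriter(w io.Writer) (*zstd.Encoder, error) {
	return zstd.NewWriter(w, zstd.WithEncoderLevel(zstd.SpeedFastest))
}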

(Review thread on zstd/zstd.go: outdated, resolved)
@pascaldekloe (Contributor, Author) commented on Jan 23, 2020

So what should we do?

Cons

  • lacks bounds checks (which could cause panics)
  • performance worse than snappy
  • limited compression levels

Pros

  • no more C bindings
  • active maintenance

@stevevls (Contributor)

@pascaldekloe Do you have an updated set of benchmarks now that you've changed the compression level and upgraded the library?

Also, zstd is expected to be slower than snappy, so I wouldn't list that as a con. The compression ratio should be very favorable, though, so it's all about tradeoffs. 😄

@pascaldekloe (Contributor, Author)

goos: darwin
goarch: amd64
pkg: github.com/segmentio/kafka-go
BenchmarkCompression/none/compress-4         	   14690	     81149 ns/op	23912.51 MB/s
BenchmarkCompression/none/decompress-4       	   30531	     38266 ns/op	50710.10 MB/s
BenchmarkCompression/gzip/compress-4         	      60	  19582193 ns/op	   6.08 MB/s
BenchmarkCompression/gzip/decompress-4       	     442	   2682437 ns/op	 723.40 MB/s
BenchmarkCompression/snappy/compress-4       	     897	   1302823 ns/op	 203.01 MB/s
BenchmarkCompression/snappy/decompress-4     	    1507	    779842 ns/op	2488.29 MB/s
BenchmarkCompression/lz4/compress-4          	     370	   3211224 ns/op	  63.17 MB/s
BenchmarkCompression/lz4/decompress-4        	     878	   1342304 ns/op	1445.63 MB/s
BenchmarkCompression/zstd/compress-4         	     314	   3658719 ns/op	  35.74 MB/s
BenchmarkCompression/zstd/decompress-4       	     493	   2379608 ns/op	 815.46 MB/s
input => 1.85 MB
  none:		0.00%
  gzip:		93.86%
  snappy:	86.37%
  lz4:		89.55%
  zstd:		93.26%

(Review threads on zstd/zstd.go: resolved)
@klauspost (Contributor)

Looking at the numbers, the benchmarks still look wrong. Try comparing to the "clean" one I posted above. Maybe just split the encode/decode benchmarks, since simpler may be better in this case.

LZ4 should perform on a level with Snappy, and only the fastest level is really worth it for lz4 (using the pierrec library). The rest are slow and a waste of CPU. I would propose setting the default to that.

@achille-roussel (Contributor)

LZ4 should perform on a level with Snappy, and only the fastest level is really worth it for lz4 (using the pierrec library). The rest are slow and a waste of CPU. I would propose setting the default to that.

Thanks for the tip! Let's make this another follow-up to this PR to keep the focus on zstd here 👍

@achille-roussel (Contributor) left a review comment


This change is looking good to me 👍 Thanks for the contribution!

achille-roussel merged commit 16d85b1 into segmentio:master on Jan 26, 2020