
Replace Zstandard wrapper with native Go implementation #396

Merged: 4 commits merged into segmentio:master on Jan 26, 2020
Conversation

pascaldekloe (Contributor)

Follow-up on stale #303 with full use of the streaming API from @klauspost.

Decompression is slower, while the default compression level went down from 5 to 3.

name                           old time/op    new time/op    delta
Compression/zstd/compress-4      5.64ms ± 2%    3.02ms ± 2%   -46.42%  (p=0.000 n=10+10)
Compression/zstd/decompress-4     911µs ± 2%    2386µs ± 2%  +162.10%  (p=0.000 n=10+10)

name                           old speed      new speed      delta
Compression/zstd/compress-4    21.0MB/s ± 2%  44.3MB/s ± 2%  +111.39%  (p=0.000 n=10+10)
Compression/zstd/decompress-4  2.13GB/s ± 2%  0.81GB/s ± 2%   -61.85%  (p=0.000 n=10+10)
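
For context, deltas in this form are what benchstat (golang.org/x/perf/cmd/benchstat) prints when comparing runs of the same benchmark before and after a change; a sketch of that workflow, with arbitrary file names:

go test -run '^$' -bench Compression/zstd -count 10 > old.txt
(switch to the native Go implementation)
go test -run '^$' -bench Compression/zstd -count 10 > new.txt
benchstat old.txt new.txt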

pascaldekloe requested review from Pryz and achille-roussel and removed the request for Pryz on Jan 15, 2020
@achille-roussel (Contributor)

Thanks a lot for this contribution @pascaldekloe !

What do you think about keeping the compression level at 5? I'm concerned that programs relying on this package will suddenly see a change in behavior that could lead to exhausting disk space on Kafka brokers. The default compression level in kafka-go would then differ from what the zstd dependency sets, but we can document why we're making this decision.

Could you share a CPU and memory profile of the decompression benchmark? I'm curious to see where the difference comes from and if there's anything we can do about it.

Let me know!
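
For reference, CPU and memory profiles of just the zstd decompression benchmark can be captured with the standard go test flags and inspected with pprof; a minimal sketch (output file names are arbitrary):

go test -run '^$' -bench 'Compression/zstd/decompress' -cpuprofile cpu.out -memprofile mem.out .
go tool pprof -top cpu.out
go tool pprof -top mem.out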

achille-roussel self-assigned this on Jan 15, 2020
@klauspost (Contributor)

The benchmark numbers look a bit weird to me. Could you share the benchmark if it is not included?

(Review thread on go.mod: outdated, resolved)
@achille-roussel (Contributor) commented on Jan 15, 2020

@klauspost you can run the benchmark with go test -v -run '^$' -bench Compress/zstd in the root directory.

I took a closer look at the zstd package; it looks like there are lots of unguarded println calls (some only run when the debug constant is true, some don't check it at all). Do these show up in practice? I'm concerned about printing a bunch of random output to stdout in every program that depends on kafka-go (e.g. https://github.com/klauspost/compress/blob/master/zstd/decoder.go#L216)

@klauspost (Contributor) commented on Jan 15, 2020

Where do I find this?

--- FAIL: BenchmarkCompression
    compression_test.go:291: open src\encoding\json\testdata\code.json.gz: The system cannot find the path specified.

Edit... Ahhh... From the Go source... I don't have $GOROOT :)

@klauspost (Contributor) commented on Jan 15, 2020

There is something wonky with your benchmarks.

If you only run the compression, the numbers are drastically different, e.g.:

func benchmarkCompression(b *testing.B, codec kafka.CompressionCodec, buf *bytes.Buffer, payload []byte) float64 {
	// In case only the decompression benchmarks are run, these flags are used to
	// detect whether the payload has to be compressed before the decompression
	// benchmarks.

	b.Run("compress", func(b *testing.B) {
		r := bytes.NewReader(payload)
		b.SetBytes(int64(len(payload)))
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			buf.Reset()
			r.Reset(payload)
			w := codec.NewWriter(buf)

			_, err := io.Copy(w, r)
			if err != nil {
				b.Fatal(err)
			}
			if err := w.Close(); err != nil {
				b.Fatal(err)
			}
		}
	})

	return 1 - (float64(buf.Len()) / float64(len(payload)))
}
λ go test -v -run '^$' -bench Compress
goos: windows
goarch: amd64
pkg: github.com/segmentio/kafka-go
BenchmarkCompression/none/compress-32              14197             80739 ns/op        24033.78 MB/s
BenchmarkCompression/gzip/compress-32                 51          21985308 ns/op          88.26 MB/s
BenchmarkCompression/snappy/compress-32             1110           1064202 ns/op        1823.41 MB/s
BenchmarkCompression/lz4/compress-32                 514           2313747 ns/op         838.67 MB/s
BenchmarkCompression/zstd/compress-32                420           2607726 ns/op         744.12 MB/s
input => 1.85 MB
  none:         0.00%
  gzip:         93.86%
  snappy:       86.37%
  lz4:          89.55%
  zstd:         93.10%

@klauspost (Contributor) commented on Jan 15, 2020

As a side note, since you already have the dependency, swapping out compress/gzip for github.com/klauspost/compress/gzip gives these numbers:

BenchmarkCompression/gzip/compress-32                160           7495428 ns/op         258.89 MB/s
gzip:         93.03%

With klauspost/pgzip:

BenchmarkCompression/gzip/compress-32                573           2077253 ns/op         934.15 MB/s
gzip:         92.90%

But it only really becomes an advantage on long streams, longer than the test stream (900 MB/s is low).
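
For context, klauspost/compress/gzip mirrors the standard library gzip API, so the swap is essentially an import change; a minimal sketch (the compressPayload helper is illustrative, not part of kafka-go):

package example

import (
	"bytes"
	"io"

	gzip "github.com/klauspost/compress/gzip" // drop-in replacement for "compress/gzip"
)

// compressPayload gzips the payload with the package's default level and
// returns the compressed bytes.
func compressPayload(payload []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf) // same constructor and Writer API as the standard library
	if _, err := io.Copy(w, bytes.NewReader(payload)); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil { // Close flushes the remaining data and the gzip footer
		return nil, err
	}
	return buf.Bytes(), nil
}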

@achille-roussel (Contributor)

Nice! Thanks for the tip on compress/gzip!

The compression ratio is different in your benchmarks. Is that due to a different default compression level, or to different trade-offs in the implementation?

@klauspost (Contributor)

Yes, the intent is for "default" to be a more reasonable default in terms of the speed/size trade-off.

This package attempts to provide a smoother transition, where "1" takes a lot of shortcuts, "5" is the reasonable trade-off, "9" is "give me the best compression", and the values in between give something reasonable in between. The standard library has big differences across levels 1-4, while levels 5-9 show no significant gains, often spending far more time than the achieved compression can justify.

https://blog.klauspost.com/rebalancing-deflate-compression-levels/

But it also has a lot of other optimizations; for example, incompressible data is skipped more than 50x faster.

@pascaldekloe (Contributor, Author)

To be clear, compression level 3 is the only compression level supported by the Go library. Other values will silently default to 3.

The compressed size win for level 5 is highly unlikely to exceed 20%. Kafka works well with fully packed disks thanks to its low fragmentation by design. I don't think anyone will run out of space unexpectedly due to this loss.

Those debug lines are more or less harmless.

func printf(format string, a ...interface{}) {
        if debug {
                log.Printf(format, a...)
        }
}

The benchmark results in the description were measured on a mid-range iMac from 2017. I had a quick look, and most of the slowdown comes from the large number of data copies. The decoder does not make use of the advertised frame size. Also, the data parsing goes through abstract methods, which prevents all sorts of optimisations. This is all fixable in the long run, of course.

@klauspost (Contributor)

To be clear, compression level 3 is the only compression level supported by the Go library.

(Assuming you're talking about zstd.) A level 1 equivalent is also available: zstd.SpeedFastest. But I see now you have a mix-up in your code; I will add a comment.
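
For reference, with github.com/klauspost/compress/zstd the level is chosen when constructing the encoder; a minimal sketch (the newFastZstdWriter helper is illustrative, not part of this PR):

package example

import (
	"io"

	"github.com/klauspost/compress/zstd"
)

// newFastZstdWriter builds an encoder at the level 1 equivalent mentioned
// above; zstd.SpeedDefault corresponds roughly to level 3.
func newFastZstdWriter(w io.Writer) (*zstd.Encoder, error) {
	return zstd.NewWriter(w, zstd.WithEncoderLevel(zstd.SpeedFastest))
}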

(Review thread on zstd/zstd.go: outdated, resolved)
@pascaldekloe (Contributor, Author) commented on Jan 23, 2020

So what should we do?

Cons

  • lacks bounds checks (which could cause panics)
  • performance worse than snappy
  • limited compression levels

Pros

  • no more C bindings
  • active maintenance

@stevevls (Contributor)

@pascaldekloe Do you have an updated set of benchmarks now that you've changed the compression level and upgraded the library?

Also, zstd is expected to be slower than snappy, so I wouldn't list that as a con. The compression ratio should be very favorable, though, so it's all about tradeoffs. 😄

@pascaldekloe (Contributor, Author)

goos: darwin
goarch: amd64
pkg: github.com/segmentio/kafka-go
BenchmarkCompression/none/compress-4         	   14690	     81149 ns/op	23912.51 MB/s
BenchmarkCompression/none/decompress-4       	   30531	     38266 ns/op	50710.10 MB/s
BenchmarkCompression/gzip/compress-4         	      60	  19582193 ns/op	   6.08 MB/s
BenchmarkCompression/gzip/decompress-4       	     442	   2682437 ns/op	 723.40 MB/s
BenchmarkCompression/snappy/compress-4       	     897	   1302823 ns/op	 203.01 MB/s
BenchmarkCompression/snappy/decompress-4     	    1507	    779842 ns/op	2488.29 MB/s
BenchmarkCompression/lz4/compress-4          	     370	   3211224 ns/op	  63.17 MB/s
BenchmarkCompression/lz4/decompress-4        	     878	   1342304 ns/op	1445.63 MB/s
BenchmarkCompression/zstd/compress-4         	     314	   3658719 ns/op	  35.74 MB/s
BenchmarkCompression/zstd/decompress-4       	     493	   2379608 ns/op	 815.46 MB/s
input => 1.85 MB
  none:		0.00%
  gzip:		93.86%
  snappy:	86.37%
  lz4:		89.55%
  zstd:		93.26%

(Review threads on zstd/zstd.go: resolved)
@klauspost (Contributor)

Looking at the numbers, the benchmarks still look wrong. Try comparing to the "clean" one I posted above. Maybe just split the encode/decode benchmarks, since simpler may be better in this case.

LZ4 should perform on a level with Snappy, and only the fastest level is really worth it for lz4 (using the pierrec library). The rest are slow and a waste of CPU. I would propose setting the default to that.

@achille-roussel (Contributor)

LZ4 should perform on a level with Snappy, and only the fastest level is really worth it for lz4 (using the pierrec library). The rest are slow and a waste of CPU. I would propose setting the default to that.

Thanks for the tip! Let's make this another follow-up to this PR to keep the focus on zstd here 👍

@achille-roussel (Contributor) left a review comment


This change is looking good to me 👍 Thanks for the contribution!

achille-roussel merged commit 16d85b1 into segmentio:master on Jan 26, 2020