huff0: Pass a single bitReader pointer to asm #634

Merged: 1 commit, Jul 8, 2022

@greatroar (Contributor) commented Jun 30, 2022

This makes the context object smaller and frees up three registers, which we can use to replace the limitPtr and bufferOrigin stack variables.
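
To illustrate why a single pointer shrinks the context, here is a minimal sketch. It is not the code from this PR: the type and field names (`bitReader`, `ctxFourPointers`, `ctxSinglePointer`, `contextSizes`) are hypothetical stand-ins, and the real context carries more fields. The idea is that one pointer to an array of four readers replaces four separate pointers, so the assembly only needs one register to reach all four streams.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bitReader is a hypothetical stand-in for a per-stream bit reader.
type bitReader struct {
	in       []byte
	off      uint
	value    uint64
	bitsRead uint8
}

// ctxFourPointers models a context that carries one pointer per stream
// (hypothetical layout, for illustration only).
type ctxFourPointers struct {
	br0, br1, br2, br3 *bitReader
	out                *byte
}

// ctxSinglePointer models a context that carries a single pointer to all
// four readers (hypothetical layout, for illustration only).
type ctxSinglePointer struct {
	br  *[4]bitReader
	out *byte
}

// contextSizes reports the size in bytes of both layouts.
func contextSizes() (four, one uintptr) {
	return unsafe.Sizeof(ctxFourPointers{}), unsafe.Sizeof(ctxSinglePointer{})
}

func main() {
	four, one := contextSizes()
	fmt.Printf("four pointers: %d bytes, single pointer: %d bytes\n", four, one)
}
```

With the single-pointer layout, the individual readers are addressed as fixed offsets from one base, which is what leaves registers free for other values.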

Benchmark results show a tiny win (Go 1.19beta, Core i7-3770K):

name                                           old speed      new speed      delta
Decompress1XTable/digits-8                      347MB/s ± 0%   347MB/s ± 0%    ~     (p=0.650 n=8+10)
Decompress1XTable/gettysburg-8                  268MB/s ± 0%   268MB/s ± 0%    ~     (p=0.400 n=9+9)
Decompress1XTable/twain-8                       327MB/s ± 0%   327MB/s ± 1%    ~     (p=0.339 n=7+9)
Decompress1XTable/low-ent.10k-8                 385MB/s ± 0%   385MB/s ± 1%    ~     (p=0.510 n=9+10)
Decompress1XTable/superlow-ent-10k-8            376MB/s ± 0%   376MB/s ± 0%    ~     (p=0.712 n=8+10)
Decompress1XTable/crash2-8                     17.3MB/s ± 1%  17.3MB/s ± 1%    ~     (p=0.926 n=10+10)
Decompress1XTable/endzerobits-8                52.9MB/s ± 1%  52.4MB/s ± 0%  -0.94%  (p=0.000 n=10+10)
Decompress1XTable/endnonzero-8                 11.4MB/s ± 0%  11.4MB/s ± 1%    ~     (p=0.343 n=10+10)
Decompress1XTable/case1-8                      22.0MB/s ± 0%  22.0MB/s ± 0%    ~     (p=0.618 n=9+9)
Decompress1XTable/case2-8                      18.1MB/s ± 0%  18.1MB/s ± 0%    ~     (p=0.348 n=9+9)
Decompress1XTable/case3-8                      19.1MB/s ± 0%  19.1MB/s ± 0%  +0.21%  (p=0.048 n=10+10)
Decompress1XTable/pngdata.001-8                 374MB/s ± 0%   374MB/s ± 0%    ~     (p=0.861 n=9+10)
Decompress1XTable/normcount2-8                 54.3MB/s ± 1%  54.5MB/s ± 1%    ~     (p=0.093 n=10+10)
Decompress1XNoTable/digits/100-8                279MB/s ± 0%   280MB/s ± 0%  +0.30%  (p=0.003 n=10+9)
Decompress1XNoTable/digits/10000-8              366MB/s ± 0%   365MB/s ± 0%    ~     (p=0.113 n=10+9)
Decompress1XNoTable/digits/262143-8             347MB/s ± 0%   347MB/s ± 1%    ~     (p=0.739 n=10+10)
Decompress1XNoTable/gettysburg/100-8            278MB/s ± 1%   277MB/s ± 1%    ~     (p=0.676 n=10+9)
Decompress1XNoTable/gettysburg/10000-8          363MB/s ± 1%   362MB/s ± 0%  -0.50%  (p=0.001 n=10+9)
Decompress1XNoTable/gettysburg/262143-8         350MB/s ± 0%   347MB/s ± 0%  -0.90%  (p=0.000 n=10+8)
Decompress1XNoTable/twain/100-8                 268MB/s ± 0%   267MB/s ± 0%    ~     (p=0.384 n=9+8)
Decompress1XNoTable/twain/10000-8               363MB/s ± 0%   362MB/s ± 0%  -0.32%  (p=0.000 n=9+9)
Decompress1XNoTable/twain/262143-8              328MB/s ± 0%   329MB/s ± 0%    ~     (p=0.063 n=9+10)
Decompress1XNoTable/low-ent.10k/100-8           180MB/s ± 0%   181MB/s ± 0%    ~     (p=0.225 n=10+10)
Decompress1XNoTable/low-ent.10k/10000-8         385MB/s ± 0%   385MB/s ± 0%    ~     (p=0.289 n=10+10)
Decompress1XNoTable/low-ent.10k/262143-8        389MB/s ± 1%   389MB/s ± 1%    ~     (p=0.971 n=10+10)
Decompress1XNoTable/superlow-ent-10k/262143-8   389MB/s ± 0%   390MB/s ± 0%  +0.27%  (p=0.017 n=9+10)
Decompress1XNoTable/crash2/100-8                278MB/s ± 0%   279MB/s ± 1%    ~     (p=0.163 n=9+10)
Decompress1XNoTable/crash2/10000-8              373MB/s ± 1%   373MB/s ± 0%    ~     (p=0.370 n=10+8)
Decompress1XNoTable/crash2/262143-8             375MB/s ± 0%   375MB/s ± 0%    ~     (p=0.604 n=9+10)
Decompress1XNoTable/endzerobits/100-8           180MB/s ± 0%   181MB/s ± 0%  +0.26%  (p=0.005 n=10+9)
Decompress1XNoTable/endzerobits/10000-8         384MB/s ± 0%   385MB/s ± 0%    ~     (p=0.914 n=8+10)
Decompress1XNoTable/endzerobits/262143-8        389MB/s ± 0%   390MB/s ± 0%    ~     (p=0.739 n=10+10)
Decompress1XNoTable/endnonzero/100-8            180MB/s ± 1%   180MB/s ± 1%    ~     (p=0.926 n=10+10)
Decompress1XNoTable/endnonzero/10000-8          384MB/s ± 0%   384MB/s ± 0%    ~     (p=0.965 n=10+8)
Decompress1XNoTable/endnonzero/262143-8         390MB/s ± 0%   390MB/s ± 0%    ~     (p=0.633 n=8+10)
Decompress1XNoTable/case1/100-8                 282MB/s ± 0%   283MB/s ± 0%  +0.34%  (p=0.005 n=10+10)
Decompress1XNoTable/case1/10000-8               372MB/s ± 0%   373MB/s ± 0%    ~     (p=0.113 n=9+9)
Decompress1XNoTable/case1/262143-8              374MB/s ± 0%   374MB/s ± 0%    ~     (p=0.448 n=10+10)
Decompress1XNoTable/case2/100-8                 274MB/s ± 1%   274MB/s ± 0%    ~     (p=0.927 n=10+10)
Decompress1XNoTable/case2/10000-8               376MB/s ± 0%   376MB/s ± 0%    ~     (p=0.408 n=10+8)
Decompress1XNoTable/case2/262143-8              376MB/s ± 1%   377MB/s ± 0%    ~     (p=1.000 n=10+10)
Decompress1XNoTable/case3/100-8                 266MB/s ± 0%   265MB/s ± 0%    ~     (p=0.113 n=9+10)
Decompress1XNoTable/case3/10000-8               372MB/s ± 0%   372MB/s ± 0%    ~     (p=0.075 n=10+9)
Decompress1XNoTable/case3/262143-8              374MB/s ± 0%   374MB/s ± 0%    ~     (p=0.172 n=10+10)
Decompress1XNoTable/pngdata.001/100-8           238MB/s ± 0%   238MB/s ± 0%    ~     (p=0.438 n=9+8)
Decompress1XNoTable/pngdata.001/10000-8         384MB/s ± 0%   384MB/s ± 0%    ~     (p=0.448 n=10+10)
Decompress1XNoTable/pngdata.001/262143-8        378MB/s ± 0%   378MB/s ± 0%    ~     (p=0.836 n=10+10)
Decompress1XNoTable/normcount2/100-8            281MB/s ± 0%   282MB/s ± 1%    ~     (p=0.122 n=8+10)
Decompress1XNoTable/normcount2/10000-8          369MB/s ± 1%   369MB/s ± 0%    ~     (p=0.912 n=10+10)
Decompress1XNoTable/normcount2/262143-8         370MB/s ± 0%   370MB/s ± 1%    ~     (p=0.342 n=10+10)
Decompress4XNoTable/digits/100-8                197MB/s ± 0%   197MB/s ± 1%    ~     (p=0.764 n=10+9)
Decompress4XNoTable/digits/10000-8              594MB/s ± 0%   602MB/s ± 1%  +1.35%  (p=0.000 n=10+10)
Decompress4XNoTable/digits/262143-8             570MB/s ± 1%   578MB/s ± 0%  +1.30%  (p=0.000 n=10+8)
Decompress4XNoTable/gettysburg/100-8            258MB/s ± 1%   260MB/s ± 0%  +0.59%  (p=0.001 n=10+10)
Decompress4XNoTable/gettysburg/10000-8          638MB/s ± 0%   641MB/s ± 0%  +0.44%  (p=0.000 n=9+9)
Decompress4XNoTable/gettysburg/262143-8         573MB/s ± 1%   574MB/s ± 0%    ~     (p=0.353 n=10+10)
Decompress4XNoTable/twain/100-8                 214MB/s ± 2%   214MB/s ± 2%    ~     (p=0.853 n=10+10)
Decompress4XNoTable/twain/10000-8               634MB/s ± 1%   638MB/s ± 0%  +0.62%  (p=0.000 n=10+10)
Decompress4XNoTable/twain/262143-8              513MB/s ± 1%   517MB/s ± 0%  +0.85%  (p=0.000 n=10+10)
Decompress4XNoTable/low-ent.10k/100-8           195MB/s ± 0%   194MB/s ± 0%    ~     (p=0.130 n=9+9)
Decompress4XNoTable/low-ent.10k/10000-8         635MB/s ± 0%   642MB/s ± 0%  +1.19%  (p=0.000 n=10+10)
Decompress4XNoTable/low-ent.10k/262143-8        675MB/s ± 0%   685MB/s ± 0%  +1.51%  (p=0.000 n=10+10)
Decompress4XNoTable/superlow-ent-10k/262143-8   673MB/s ± 1%   684MB/s ± 0%  +1.70%  (p=0.000 n=10+10)
Decompress4XNoTable/case1/100-8                 206MB/s ± 1%   206MB/s ± 0%    ~     (p=0.189 n=10+9)
Decompress4XNoTable/case1/10000-8               593MB/s ± 0%   601MB/s ± 0%  +1.47%  (p=0.000 n=10+10)
Decompress4XNoTable/case1/262143-8              603MB/s ± 0%   613MB/s ± 0%  +1.64%  (p=0.000 n=10+10)
Decompress4XNoTable/case2/100-8                 201MB/s ± 0%   202MB/s ± 1%    ~     (p=0.053 n=9+10)
Decompress4XNoTable/case2/10000-8               610MB/s ± 0%   618MB/s ± 0%  +1.30%  (p=0.000 n=9+10)
Decompress4XNoTable/case2/262143-8              622MB/s ± 1%   634MB/s ± 0%  +1.90%  (p=0.000 n=9+8)
Decompress4XNoTable/case3/100-8                 197MB/s ± 1%   198MB/s ± 0%  +0.53%  (p=0.001 n=9+10)
Decompress4XNoTable/case3/10000-8               606MB/s ± 0%   615MB/s ± 0%  +1.49%  (p=0.000 n=8+10)
Decompress4XNoTable/case3/262143-8              613MB/s ± 1%   622MB/s ± 0%  +1.48%  (p=0.000 n=10+10)
Decompress4XNoTable/pngdata.001/100-8           212MB/s ± 1%   211MB/s ± 0%    ~     (p=0.136 n=9+9)
Decompress4XNoTable/pngdata.001/10000-8         645MB/s ± 1%   649MB/s ± 1%  +0.65%  (p=0.000 n=9+10)
Decompress4XNoTable/pngdata.001/262143-8        640MB/s ± 1%   649MB/s ± 0%  +1.44%  (p=0.000 n=10+10)
Decompress4XNoTable/normcount2/100-8            260MB/s ± 1%   261MB/s ± 1%    ~     (p=0.211 n=10+9)
Decompress4XNoTable/normcount2/10000-8          584MB/s ± 1%   591MB/s ± 0%  +1.33%  (p=0.000 n=9+9)
Decompress4XNoTable/normcount2/262143-8         588MB/s ± 1%   596MB/s ± 1%  +1.39%  (p=0.000 n=10+9)
Decompress4XNoTableTableLog8/digits-8           583MB/s ± 1%   592MB/s ± 0%  +1.48%  (p=0.000 n=10+10)
Decompress4XTable/digits-8                      580MB/s ± 0%   588MB/s ± 0%  +1.33%  (p=0.000 n=8+10)
Decompress4XTable/gettysburg-8                  368MB/s ± 1%   370MB/s ± 0%  +0.59%  (p=0.017 n=10+9)
Decompress4XTable/twain-8                       510MB/s ± 0%   515MB/s ± 0%  +0.99%  (p=0.000 n=9+10)
Decompress4XTable/low-ent.10k-8                 657MB/s ± 0%   665MB/s ± 0%  +1.24%  (p=0.000 n=10+10)
Decompress4XTable/superlow-ent-10k-8            608MB/s ± 0%   617MB/s ± 1%  +1.48%  (p=0.000 n=8+10)
Decompress4XTable/case1-8                      21.1MB/s ± 1%  21.0MB/s ± 2%    ~     (p=0.223 n=10+10)
Decompress4XTable/case2-8                      17.6MB/s ± 0%  17.6MB/s ± 0%    ~     (p=0.199 n=9+10)
Decompress4XTable/case3-8                      18.7MB/s ± 0%  18.7MB/s ± 0%    ~     (p=0.557 n=10+8)
Decompress4XTable/pngdata.001-8                 633MB/s ± 1%   645MB/s ± 0%  +1.90%  (p=0.000 n=9+10)
Decompress4XTable/normcount2-8                 49.9MB/s ± 1%  49.5MB/s ± 1%  -0.64%  (p=0.002 n=10+10)
[Geo mean]                                      270MB/s        271MB/s       +0.36%

@klauspost (Owner) left a comment

Lines up with about what I'd expect. Good job. Just one detail.

Review comment on huff0/_generate/gen.go (outdated, resolved)

@greatroar (Contributor, Author) commented

Removed the /+1000/. I also found that the return value computation could be made one instruction shorter. I haven't rerun the benchmarks, since I don't expect that to have a significant effect.

@klauspost (Owner) commented

Thanks!

@klauspost merged commit 4b3cc06 into klauspost:master on Jul 8, 2022