zstd: Improve decoder memcopy #637
Improve memcopy for small matches. Up to 30% increased throughput, depending on input.

```
benchmark  old MB/s  new MB/s  speedup
Benchmark_seqdec_execute/n-12286-lits-13914-prev-9869-1990358-3296656-win-4194304.blk-32  1284.77  1525.03  1.19x
Benchmark_seqdec_execute/n-12485-lits-6960-prev-976039-2250252-2463561-win-4194304.blk-32  1107.87  1614.28  1.46x
Benchmark_seqdec_execute/n-14746-lits-14461-prev-209-8-1379909-win-4194304.blk-32  3947.25  4100.49  1.04x
Benchmark_seqdec_execute/n-1525-lits-1498-prev-2009476-797934-2994405-win-4194304.blk-32  10281.12  10316.14  1.00x
Benchmark_seqdec_execute/n-3478-lits-3628-prev-895243-2104056-2119329-win-4194304.blk-32  8115.99  8829.85  1.09x
Benchmark_seqdec_execute/n-8422-lits-5840-prev-168095-2298675-433830-win-4194304.blk-32  1578.08  2290.47  1.45x
Benchmark_seqdec_execute/n-1000-lits-1057-prev-21887-92-217-win-8388608.blk-32  17079.65  16716.41  0.98x
Benchmark_seqdec_execute/n-15134-lits-20798-prev-4882976-4884216-4474622-win-8388608.blk-32  2020.09  2166.56  1.07x
Benchmark_seqdec_execute/n-2-lits-0-prev-620601-689171-848-win-8388608.blk-32  35781.31  35745.53  1.00x
Benchmark_seqdec_execute/n-90-lits-67-prev-19498-23-19710-win-8388608.blk-32  33125.43  32785.93  0.99x
Benchmark_seqdec_execute/n-931-lits-1179-prev-36502-1526-1518-win-8388608.blk-32  19394.38  19643.49  1.01x
Benchmark_seqdec_execute/n-2898-lits-4062-prev-335-386-751-win-8388608.blk-32  10494.30  10653.09  1.02x
Benchmark_seqdec_execute/n-4056-lits-12419-prev-10792-66-309849-win-8388608.blk-32  7425.77  7506.51  1.01x
Benchmark_seqdec_execute/n-8028-lits-4568-prev-917-65-920-win-8388608.blk-32  2855.17  3396.09  1.19x

benchmark  old MB/s  new MB/s  speedup
BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-32  537.74  651.27  1.21x
BenchmarkDecoder_DecoderSmall/geo.protodata.zst-32  1500.59  1610.11  1.07x
BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-32  410.13  505.82  1.23x
BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-32  467.83  601.25  1.29x
BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-32  434.53  530.71  1.22x
BenchmarkDecoder_DecoderSmall/alice29.txt.zst-32  433.95  544.87  1.26x
BenchmarkDecoder_DecoderSmall/html_x_4.zst-32  2860.31  3189.40  1.12x
BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-32  5336.43  5437.24  1.02x
BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-32  12327.10  12350.86  1.00x
BenchmarkDecoder_DecoderSmall/urls.10K.zst-32  660.52  774.52  1.17x
BenchmarkDecoder_DecoderSmall/html.zst-32  1076.67  1284.53  1.19x
BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-32  569.30  576.15  1.01x
BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-32  812.16  813.72  1.00x
BenchmarkDecoder_DecodeAll/geo.protodata.zst-32  1943.14  1933.04  0.99x
BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-32  712.27  715.46  1.00x
BenchmarkDecoder_DecodeAll/lcet10.txt.zst-32  688.23  775.97  1.13x
BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-32  702.87  700.17  1.00x
BenchmarkDecoder_DecodeAll/alice29.txt.zst-32  717.44  720.89  1.00x
BenchmarkDecoder_DecodeAll/html_x_4.zst-32  1960.55  1968.90  1.00x
BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-32  5981.50  6169.12  1.03x
BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-32  13140.18  13145.86  1.00x
BenchmarkDecoder_DecodeAll/urls.10K.zst-32  983.71  988.16  1.00x
BenchmarkDecoder_DecodeAll/html.zst-32  1624.80  1624.92  1.00x
BenchmarkDecoder_DecodeAll/comp-data.bin.zst-32  569.84  570.96  1.00x
BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/fastest-32  504.31  622.83  1.24x
BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/default-32  564.68  717.57  1.27x
BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/better-32  615.18  766.33  1.25x
BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/best-32  786.17  857.17  1.09x
BenchmarkDecoder_DecodeAllFiles/.tracker.bin/fastest-32  12860.99  12870.57  1.00x
BenchmarkDecoder_DecodeAllFiles/.tracker.bin/default-32  619.06  617.54  1.00x
BenchmarkDecoder_DecodeAllFiles/.tracker.bin/better-32  630.33  625.20  0.99x
BenchmarkDecoder_DecodeAllFiles/.tracker.bin/best-32  609.12  612.50  1.01x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-32  658.22  659.45  1.00x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-32  723.60  729.95  1.01x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-32  735.73  737.52  1.00x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-32  745.43  749.55  1.01x
BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-32  12801.86  12967.61  1.01x
BenchmarkDecoder_DecodeAllFiles/e.txt/default-32  680.29  677.69  1.00x
BenchmarkDecoder_DecodeAllFiles/e.txt/better-32  739.23  733.45  0.99x
BenchmarkDecoder_DecodeAllFiles/e.txt/best-32  820.16  825.62  1.01x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-32  1186.63  1194.87  1.01x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-32  1384.74  1412.45  1.02x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-32  1104.17  1107.00  1.00x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-32  409.59  409.27  1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-32  392.32  391.89  1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-32  296.47  296.65  1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-32  296.52  296.68  1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-32  299.85  295.83  0.99x
BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-32  988.75  996.39  1.01x
BenchmarkDecoder_DecodeAllFiles/html.txt/default-32  987.11  989.51  1.00x
BenchmarkDecoder_DecodeAllFiles/html.txt/better-32  1027.64  1038.21  1.01x
BenchmarkDecoder_DecodeAllFiles/html.txt/best-32  973.41  989.86  1.02x
BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-32  12976.96  13045.11  1.01x
BenchmarkDecoder_DecodeAllFiles/pi.txt/default-32  678.88  674.53  0.99x
BenchmarkDecoder_DecodeAllFiles/pi.txt/better-32  746.38  747.36  1.00x
BenchmarkDecoder_DecodeAllFiles/pi.txt/best-32  823.52  827.84  1.01x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-32  2115.58  2121.84  1.00x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-32  1767.98  1779.35  1.01x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-32  2306.86  2328.47  1.01x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-32  1660.52  1684.65  1.01x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-32  13027.08  12999.49  1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-32  13054.18  13084.25  1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-32  13067.23  13099.47  1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-32  13079.77  13104.13  1.00x
BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/fastest-32  10354.84  11838.70  1.14x
BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/default-32  11557.12  13404.78  1.16x
BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/better-32  12644.67  14519.37  1.15x
BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/best-32  15934.00  17312.77  1.09x
BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/fastest-32  35354.57  34836.95  0.99x
BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/default-32  11392.27  11275.11  0.99x
BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/better-32  11793.77  11771.24  1.00x
BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/best-32  11203.91  11142.52  0.99x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-32  12089.54  11983.77  0.99x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-32  12604.67  12514.75  0.99x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-32  13265.79  13152.64  0.99x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-32  13078.85  12983.91  0.99x
BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-32  52477.17  52657.54  1.00x
BenchmarkDecoder_DecodeAllFilesP/e.txt/default-32  11947.06  11809.75  0.99x
BenchmarkDecoder_DecodeAllFilesP/e.txt/better-32  13184.17  13140.65  1.00x
BenchmarkDecoder_DecodeAllFilesP/e.txt/best-32  14630.26  14718.01  1.01x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-32  3013.25  3088.05  1.02x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-32  3125.61  3091.48  0.99x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-32  3181.68  3034.74  0.95x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-32  3351.22  3526.91  1.05x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-32  1188.15  1136.88  0.96x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-32  1215.39  1193.99  0.98x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-32  1219.20  1206.23  0.99x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-32  1216.72  1200.26  0.99x
BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-32  16901.32  17076.26  1.01x
BenchmarkDecoder_DecodeAllFilesP/html.txt/default-32  16819.66  16892.32  1.00x
BenchmarkDecoder_DecodeAllFilesP/html.txt/better-32  17805.12  17873.77  1.00x
BenchmarkDecoder_DecodeAllFilesP/html.txt/best-32  16916.87  17184.02  1.02x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-32  52314.15  51687.88  0.99x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-32  11878.94  11778.57  0.99x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-32  13303.16  13162.44  0.99x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-32  14622.76  14717.80  1.01x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-32  34134.48  37031.10  1.08x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-32  33589.32  35277.28  1.05x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-32  43754.89  44761.13  1.02x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-32  32422.22  34107.42  1.05x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-32  52706.00  52396.81  0.99x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-32  52527.76  52048.36  0.99x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-32  52177.25  52688.64  1.01x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-32  52443.28  52799.86  1.01x
BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-32  13992.47  13994.15  1.00x
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-32  34107.95  34221.23  1.00x
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-32  12012.34  11976.30  1.00x
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-32  12630.22  13384.70  1.06x
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-32  12327.02  12251.04  0.99x
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-32  11932.73  11896.92  1.00x
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-32  31233.38  36258.56  1.16x
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-32  97435.31  100317.73  1.03x
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-32  62247.22  62306.36  1.00x
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-32  18659.58  18592.14  1.00x
BenchmarkDecoder_DecodeAllParallel/html.zst-32  28464.78  28519.30  1.00x
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-32  3114.03  3297.01  1.06x
BenchmarkDecoderSilesia/multithreaded-writer-32  1099.69  1104.92  1.00x
BenchmarkDecoderSilesia/multithreaded-writer-himem-32  1093.10  1102.98  1.01x
BenchmarkDecoderSilesia/singlethreaded-writer-32  803.85  818.55  1.02x
BenchmarkDecoderSilesia/singlethreaded-writerto-32  812.83  828.19  1.02x
BenchmarkDecoderSilesia/singlethreaded-himem-32  813.14  828.32  1.02x
BenchmarkDecoderEnwik9/multithreaded-writer-32  877.55  996.49  1.14x
BenchmarkDecoderEnwik9/multithreaded-writer-himem-32  961.20  1036.76  1.08x
BenchmarkDecoderEnwik9/singlethreaded-writer-32  632.07  631.96  1.00x
BenchmarkDecoderEnwik9/singlethreaded-writerto-32  634.62  634.52  1.00x
BenchmarkDecoderEnwik9/singlethreaded-himem-32  763.68  758.40  0.99x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/multithreaded-writer-32  1626.86  1730.88  1.06x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/multithreaded-writer-himem-32  2299.80  2375.04  1.03x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-writer-32  1221.34  1221.43  1.00x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-writerto-32  1236.18  1237.97  1.00x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-himem-32  1749.21  1754.96  1.00x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/multithreaded-writer-32  839.51  933.63  1.11x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/multithreaded-writer-himem-32  1055.54  1100.37  1.04x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-writer-32  574.91  613.88  1.07x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-writerto-32  579.19  618.72  1.07x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-himem-32  780.67  867.96  1.11x
```
After #636 from @greatroar I tried copying the s2 memcopy. This yields a significant speedup in most cases.
That is a great improvement! There are some nice speed-ups. :) In the case of our commercial sample data, there's almost no change.
Running fuzz tests on these two changes for a couple of hours.
@WojciechMula Great to see it is a win across platforms, and no regressions. 4 hours of fuzz testing makes it seem fine.
Up to 25% faster decodes, depending on contents.
Use s2 memcopier and eliminate a zero check.