sm2: arm64 architecture performance issue #175

Closed
zhangyongding opened this issue Oct 31, 2023 · 18 comments
Labels
help wanted Extra attention is needed


@zhangyongding

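I recently benchmarked sm2 and found that on arm64 two places account for a relatively large share of the time. Is there any room for optimization?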

(pprof) list p256BaseMult
Total: 108.85s
ROUTINE ======================== github.com/emmansun/gmsm/internal/sm2ec.(*SM2P256Point).p256BaseMult in github.com/emmansun/[email protected]/internal/sm2ec/sm2p256_asm.go
     180ms      8.01s (flat, cum)  7.36% of Total
         .          .    799:func (p *SM2P256Point) p256BaseMult(scalar *p256OrdElement) {
         .          .    800:   var t0 p256AffinePoint
         .          .    801:
         .          .    802:   wvalue := (scalar[0] << 1) & 0x7f
         .          .    803:   sel, sign := boothW6(uint(wvalue))
         .       60ms    804:   p256SelectAffine(&t0, &p256Precomputed[0], sel)
         .          .    805:   p.x, p.y, p.z = t0.x, t0.y, p256One
      10ms       10ms    806:   p256NegCond(&p.y, sign)
         .          .    807:
         .          .    808:   index := uint(5)
         .          .    809:   zero := sel
         .          .    810:
         .          .    811:   for i := 1; i < 43; i++ {
         .          .    812:           if index >= 192 {
         .          .    813:                   wvalue = (scalar[3] >> (index & 63)) & 0x7f
      10ms       10ms    814:           } else if index >= 128 {
      10ms       10ms    815:                   wvalue = ((scalar[2] >> (index & 63)) + (scalar[3] << (64 - (index & 63)))) & 0x7f
      10ms       10ms    816:           } else if index >= 64 {
         .          .    817:                   wvalue = ((scalar[1] >> (index & 63)) + (scalar[2] << (64 - (index & 63)))) & 0x7f
         .          .    818:           } else {
      10ms       10ms    819:                   wvalue = ((scalar[0] >> (index & 63)) + (scalar[1] << (64 - (index & 63)))) & 0x7f
         .          .    820:           }
      10ms       10ms    821:           index += 6
         .          .    822:           sel, sign = boothW6(uint(wvalue))
      20ms      3.14s    823:           p256SelectAffine(&t0, &p256Precomputed[i], sel)
      70ms      4.72s    824:           p256PointAddAffineAsm(p, p, &t0, sign, sel, zero)
      20ms       20ms    825:           zero |= sel
         .          .    826:   }
         .          .    827:
         .          .    828:   // If the whole scalar was zero, set to the point at infinity.
      10ms       10ms    829:   p256MovCond(p, p, NewSM2P256Point(), zero)
         .          .    830:}
         .          .    831:
         .          .    832:func (p *SM2P256Point) p256ScalarMult(scalar *p256OrdElement) {
         .          .    833:   // precomp is a table of precomputed points that stores powers of p
         .          .    834:   // from p^1 to p^32.
(pprof) quit
(pprof) list ScalarBaseMult
Total: 108.85s
ROUTINE ======================== github.com/emmansun/gmsm/internal/sm2ec.(*SM2P256Point).ScalarBaseMult in github.com/emmansun/[email protected]/internal/sm2ec/sm2p256_asm.go
         0      8.04s (flat, cum)  7.39% of Total
         .          .    456:func (r *SM2P256Point) ScalarBaseMult(scalar []byte) (*SM2P256Point, error) {
         .          .    457:   if len(scalar) != 32 {
         .          .    458:           return nil, errors.New("invalid scalar length")
         .          .    459:   }
         .          .    460:   scalarReversed := new(p256OrdElement)
         .       10ms    461:   p256OrdBigToLittle(scalarReversed, toElementArray(scalar))
         .       20ms    462:   p256OrdReduce(scalarReversed)
         .      8.01s    463:   r.p256BaseMult(scalarReversed)
         .          .    464:   return r, nil
         .          .    465:}
         .          .    466:
         .          .    467:// ScalarMult sets r = scalar * q, where scalar is a 32-byte big endian value,
         .          .    468:// and returns r. If scalar is not 32 bytes long, ScalarBaseMult returns an
ROUTINE ======================== github.com/emmansun/gmsm/sm2/sm2ec.(*sm2Curve).ScalarBaseMult in github.com/emmansun/[email protected]/sm2/sm2ec/sm2ec.go
      20ms     10.47s (flat, cum)  9.62% of Total
      10ms       10ms    120:func (curve *sm2Curve) ScalarBaseMult(scalar []byte) (*big.Int, *big.Int) {
         .       30ms    121:   scalar = curve.normalizeScalar(scalar)
         .      8.08s    122:   p, err := curve.newPoint().ScalarBaseMult(scalar)
      10ms       10ms    123:   if err != nil {
         .          .    124:           panic("sm2/elliptic: sm2 rejected normalized scalar")
         .          .    125:   }
         .      2.34s    126:   return curve.pointToAffine(p)
         .          .    127:}
         .          .    128:
         .          .    129:// CombinedMult returns [s1]G + [s2]P where G is the generator. It's used
         .          .    130:// through an interface upgrade in crypto/ecdsa.
         .          .    131:func (curve *sm2Curve) CombinedMult(Px, Py *big.Int, s1, s2 []byte) (x, y *big.Int) {
(pprof) 
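The loop in the listing above walks the 256-bit scalar in 43 overlapping windows, each masked to 7 bits with `0x7f`, advancing 6 bits per iteration. A minimal sketch of just that index arithmetic follows; `window7` is a hypothetical helper written for illustration and is not part of gmsm, and the Booth recoding (`boothW6`) and the point arithmetic are left out.

```go
package main

import "fmt"

// window7 is a hypothetical helper (not part of gmsm) that mirrors the index
// arithmetic in the listing: it returns the 7 bits that start at bit position
// index of a 256-bit scalar stored as four little-endian 64-bit limbs.
// index must be below 256.
func window7(scalar [4]uint64, index uint) uint64 {
	limb := index / 64
	shift := index & 63
	w := scalar[limb] >> shift
	if limb < 3 {
		// Pull in the low bits of the next limb. When shift is 0 the shift
		// count is 64, which Go defines as yielding 0 for a uint64.
		w += scalar[limb+1] << (64 - shift)
	}
	return w & 0x7f
}

func main() {
	scalar := [4]uint64{0xffffffffffffffff, 0, 0, 1}

	// The first window is special: it is the low 6 bits shifted left by one.
	fmt.Println((scalar[0] << 1) & 0x7f)

	// The remaining 42 windows start at bit 5 and advance by 6 bits.
	index := uint(5)
	for i := 1; i < 43; i++ {
		fmt.Printf("window %2d at bit %3d: %d\n", i, index, window7(scalar, index))
		index += 6
	}
}
```

Each of those 42 iterations calls `p256SelectAffine` and `p256PointAddAffineAsm` once, which is consistent with the profile attributing nearly all of the 8.01s to those two lines.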

@emmansun
Owner

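In elliptic-curve algorithms, the most computationally expensive operation is scalar multiplication of curve points, and for security the implementation should run in constant time as far as possible. The current implementation on arm64 (and on amd64 as well) probably has little room left for optimization. You can also benchmark the NIST P-256 curve in the Go SDK; its performance should be about the same.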
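To illustrate why a constant-time table lookup is comparatively expensive, here is a minimal sketch in plain Go. It is an assumption about the general technique, not the assembly that gmsm actually uses: `selectEntry` and the toy table are hypothetical, and the only library call is `subtle.ConstantTimeEq` from the standard library. The point is that every entry is read on every lookup, so the memory access pattern does not depend on the secret index.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// selectEntry returns table[idx-1] for idx in 1..len(table) and the zero
// value for idx == 0. It touches every entry regardless of idx, so the access
// pattern does not reveal which entry was wanted.
// Hypothetical helper for illustration; not gmsm's p256SelectAffine.
func selectEntry(table [][4]uint64, idx int) [4]uint64 {
	var out [4]uint64
	for i := range table {
		// eq is 1 when this is the wanted entry, otherwise 0.
		eq := subtle.ConstantTimeEq(int32(i+1), int32(idx))
		// mask is all ones when eq == 1 and all zeros when eq == 0.
		mask := -uint64(eq)
		for j := range out {
			out[j] |= table[i][j] & mask
		}
	}
	return out
}

func main() {
	// A toy table of 32 entries, standing in for one row of precomputed points.
	table := make([][4]uint64, 32)
	for i := range table {
		table[i] = [4]uint64{uint64(i + 1), 0, 0, 0}
	}
	fmt.Println(selectEntry(table, 5)) // prints [5 0 0 0]
	fmt.Println(selectEntry(table, 0)) // prints [0 0 0 0]
}
```

A direct `table[idx-1]` would be much cheaper, but its cache behavior would depend on the secret scalar, which is the trade-off described above.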

@emmansun
Owner

emmansun commented Nov 1, 2023

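I don't have an arm64 development and test environment; all arm64 development and testing is done under QEMU emulation. If it's convenient, please run the benchmarks for all packages.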

@emmansun emmansun changed the title from "Performance issue" to "sm2: arm64 architecture performance issue" Nov 2, 2023
@emmansun
Owner

emmansun commented Nov 8, 2023

@zhangyongding You could compare against v0.22.0.

@zhangyongding
Author

OK, I'll take a look later.

@emmansun emmansun added the help wanted Extra attention is needed label Dec 11, 2023
@mango19970707

I ran it on a Mac and compared sm2 and sm4 with the others; the performance is much better 👍

goos: darwin
goarch: arm64
BenchmarkSM4BCEncrypt1K
BenchmarkSM4BCEncrypt1K-8        	  109899	      9677 ns/op	 105.81 MB/s
BenchmarkSM4BCDecrypt1K
BenchmarkSM4BCDecrypt1K-8        	  140384	      8289 ns/op	 123.54 MB/s
BenchmarkSM4HCTREncrypt1K
BenchmarkSM4HCTREncrypt1K-8      	   75216	     15850 ns/op	  64.60 MB/s
BenchmarkSM4ECBEncrypt1K
BenchmarkSM4ECBEncrypt1K-8       	  152919	      7787 ns/op	 131.50 MB/s
BenchmarkAES128ECBEncrypt1K
BenchmarkAES128ECBEncrypt1K-8    	 3112140	       384.6 ns/op	2662.30 MB/s
BenchmarkAESCBCEncrypt1K
BenchmarkAESCBCEncrypt1K-8       	 1390166	       872.6 ns/op	1173.54 MB/s
BenchmarkSM4CBCEncrypt1K
BenchmarkSM4CBCEncrypt1K-8       	  123598	      9383 ns/op	 109.14 MB/s
BenchmarkAESCBCDecrypt1K
BenchmarkAESCBCDecrypt1K-8       	 2084564	       576.0 ns/op	1777.84 MB/s
BenchmarkSM4CBCDecrypt1K
BenchmarkSM4CBCDecrypt1K-8       	  146947	      8084 ns/op	 126.68 MB/s
BenchmarkAESCFBEncrypt1K
BenchmarkAESCFBEncrypt1K-8       	 1000000	      1016 ns/op	1003.07 MB/s
BenchmarkSM4CFBEncrypt1K
BenchmarkSM4CFBEncrypt1K-8       	  122415	      9503 ns/op	 107.23 MB/s
BenchmarkAESCFBDecrypt1K
BenchmarkAESCFBDecrypt1K-8       	 1541154	       777.1 ns/op	1311.32 MB/s
BenchmarkSM4CFBDecrypt1K
BenchmarkSM4CFBDecrypt1K-8       	  145435	      8083 ns/op	 126.06 MB/s
BenchmarkAESCFBDecrypt8K
BenchmarkAESCFBDecrypt8K-8       	  193323	      6088 ns/op	1344.81 MB/s
BenchmarkSM4CFBDecrypt8K
BenchmarkSM4CFBDecrypt8K-8       	   18462	     65067 ns/op	 125.82 MB/s
BenchmarkAESOFB1K
BenchmarkAESOFB1K-8              	 1544918	       776.3 ns/op	1312.67 MB/s
BenchmarkSM4OFB1K
BenchmarkSM4OFB1K-8              	  129934	      8997 ns/op	 113.27 MB/s
BenchmarkAESCTR1K
BenchmarkAESCTR1K-8              	 2550990	       473.7 ns/op	2151.15 MB/s
BenchmarkSM4CTR1K
BenchmarkSM4CTR1K-8              	  146665	      7814 ns/op	 130.41 MB/s
BenchmarkAESCTR8K
BenchmarkAESCTR8K-8              	  326775	      3707 ns/op	2208.26 MB/s
BenchmarkSM4CTR8K
BenchmarkSM4CTR8K-8              	   18528	     64486 ns/op	 126.96 MB/s
BenchmarkAESGCMSeal1K
BenchmarkAESGCMSeal1K-8          	 7791330	       152.6 ns/op	6708.25 MB/s
BenchmarkSM4GCMSeal1K
BenchmarkSM4GCMSeal1K-8          	   98953	     12186 ns/op	  84.03 MB/s
BenchmarkAESGCMOpen1K
BenchmarkAESGCMOpen1K-8          	 8066070	       148.3 ns/op	6904.15 MB/s
BenchmarkSM4GCMOpen1K
BenchmarkSM4GCMOpen1K-8          	   96859	     12196 ns/op	  83.96 MB/s
BenchmarkAESGCMSign1K
BenchmarkAESGCMSign1K-8          	12899636	        94.17 ns/op	10874.52 MB/s
BenchmarkSM4GCMSign1K
BenchmarkSM4GCMSign1K-8          	  367958	      3223 ns/op	 317.76 MB/s
BenchmarkAESGCMSign8K
BenchmarkAESGCMSign8K-8          	 1447760	       828.3 ns/op	9890.02 MB/s
BenchmarkSM4GCMSign8K
BenchmarkSM4GCMSign8K-8          	   53313	     22542 ns/op	 363.42 MB/s
BenchmarkAESGCMSeal8K
BenchmarkAESGCMSeal8K-8          	 1000000	      1104 ns/op	7420.75 MB/s
BenchmarkSM4GCMSeal8K
BenchmarkSM4GCMSeal8K-8          	   12484	     95677 ns/op	  85.62 MB/s
BenchmarkAESGCMOpen8K
BenchmarkAESGCMOpen8K-8          	 1000000	      1008 ns/op	8124.05 MB/s
BenchmarkSM4GCMOpen8K
BenchmarkSM4GCMOpen8K-8          	   12544	     94564 ns/op	  86.63 MB/s
BenchmarkAESCCMSign1K
BenchmarkAESCCMSign1K-8          	 1000000	      1043 ns/op	 982.02 MB/s
BenchmarkSM4CCMSign1K
BenchmarkSM4CCMSign1K-8          	  119500	     10006 ns/op	 102.34 MB/s
BenchmarkAESCCMSeal1K
BenchmarkAESCCMSeal1K-8          	  803734	      1496 ns/op	 684.38 MB/s
BenchmarkSM4CCMSeal1K
BenchmarkSM4CCMSeal1K-8          	   66871	     17847 ns/op	  57.38 MB/s
BenchmarkAESCCMOpen1K
BenchmarkAESCCMOpen1K-8          	  798380	      1500 ns/op	 682.76 MB/s
BenchmarkSM4CCMOpen1K
BenchmarkSM4CCMOpen1K-8          	   66926	     17880 ns/op	  57.27 MB/s
BenchmarkAESCCMSign8K
BenchmarkAESCCMSign8K-8          	  166414	      7099 ns/op	1154.02 MB/s
BenchmarkSM4CCMSign8K
BenchmarkSM4CCMSign8K-8          	   15818	     75749 ns/op	 108.15 MB/s
BenchmarkAESCCMSeal8K
BenchmarkAESCCMSeal8K-8          	  111427	     10764 ns/op	 761.05 MB/s
BenchmarkSM4CCMSeal8K
BenchmarkSM4CCMSeal8K-8          	    8474	    139214 ns/op	  58.84 MB/s
BenchmarkAESCCMOpen8K
BenchmarkAESCCMOpen8K-8          	  111283	     10872 ns/op	 753.48 MB/s
BenchmarkSM4CCMOpen8K
BenchmarkSM4CCMOpen8K-8          	    8264	    142793 ns/op	  57.37 MB/s
BenchmarkAES128XTSEncrypt512
BenchmarkAES128XTSEncrypt512-8   	 2748026	       431.2 ns/op	1187.42 MB/s
BenchmarkAES128XTSEncrypt1K
BenchmarkAES128XTSEncrypt1K-8    	 1368452	       868.1 ns/op	1179.58 MB/s
BenchmarkAES128XTSEncrypt4K
BenchmarkAES128XTSEncrypt4K-8    	  347162	      3452 ns/op	1186.46 MB/s
BenchmarkAES256XTSEncrypt512
BenchmarkAES256XTSEncrypt512-8   	 2466746	       482.0 ns/op	1062.33 MB/s
BenchmarkAES256XTSEncrypt1K
BenchmarkAES256XTSEncrypt1K-8    	 1217998	       971.0 ns/op	1054.64 MB/s
BenchmarkAES256XTSEncrypt4K
BenchmarkAES256XTSEncrypt4K-8    	  303680	      3927 ns/op	1043.17 MB/s
BenchmarkSM4XTSEncrypt512
BenchmarkSM4XTSEncrypt512-8      	  275744	      4321 ns/op	 118.50 MB/s
BenchmarkSM4XTSEncrypt1K
BenchmarkSM4XTSEncrypt1K-8       	  138394	      8584 ns/op	 119.30 MB/s
BenchmarkSM4XTSEncrypt4K
BenchmarkSM4XTSEncrypt4K-8       	   35049	     34037 ns/op	 120.34 MB/s
BenchmarkSM4XTSEncrypt512_GB
BenchmarkSM4XTSEncrypt512_GB-8   	  277178	      4333 ns/op	 118.16 MB/s
BenchmarkSM4XTSEncrypt1K_GB
BenchmarkSM4XTSEncrypt1K_GB-8    	  134841	      8763 ns/op	 116.86 MB/s
BenchmarkSM4XTSEncrypt4K_GB
BenchmarkSM4XTSEncrypt4K_GB-8    	   34456	     34945 ns/op	 117.21 MB/s
BenchmarkAES128XTSDecrypt512
BenchmarkAES128XTSDecrypt512-8   	 2710066	       441.5 ns/op	1159.73 MB/s
BenchmarkAES128XTSDecrypt1K
BenchmarkAES128XTSDecrypt1K-8    	 1362556	       871.1 ns/op	1175.54 MB/s
BenchmarkAES128XTSDecrypt4K
BenchmarkAES128XTSDecrypt4K-8    	  345694	      3460 ns/op	1183.84 MB/s
BenchmarkAES256XTSDecrypt512
BenchmarkAES256XTSDecrypt512-8   	 2361979	       484.2 ns/op	1057.31 MB/s
BenchmarkAES256XTSDecrypt1K
BenchmarkAES256XTSDecrypt1K-8    	 1243346	       959.4 ns/op	1067.31 MB/s
BenchmarkAES256XTSDecrypt4K
BenchmarkAES256XTSDecrypt4K-8    	  310352	      3823 ns/op	1071.48 MB/s
BenchmarkSM4XTSDecrypt512
BenchmarkSM4XTSDecrypt512-8      	  270830	      4421 ns/op	 115.81 MB/s
BenchmarkSM4XTSDecrypt1K
BenchmarkSM4XTSDecrypt1K-8       	  135820	      8907 ns/op	 114.97 MB/s
BenchmarkSM4XTSDecrypt4K
BenchmarkSM4XTSDecrypt4K-8       	   34519	     34847 ns/op	 117.54 MB/s
BenchmarkSM4XTSDecrypt512_GB
BenchmarkSM4XTSDecrypt512_GB-8   	  270850	      4463 ns/op	 114.73 MB/s
BenchmarkSM4XTSDecrypt1K_GB
BenchmarkSM4XTSDecrypt1K_GB-8    	  131619	      8914 ns/op	 114.88 MB/s
BenchmarkSM4XTSDecrypt4K_GB
BenchmarkSM4XTSDecrypt4K_GB-8    	   33703	     35299 ns/op	 116.04 MB/s
PASS

@emmansun
Owner

Thanks @mango19970707, could you share the CPU information? For example, from the command line with sysctl -a | grep machdep.cpu.

@mango19970707

machdep.cpu.cores_per_package: 8
machdep.cpu.core_count: 8
machdep.cpu.logical_per_package: 8
machdep.cpu.thread_count: 8
machdep.cpu.brand_string: Apple M2

@mango19970707

Do you need any other tests? It's no trouble at all, so don't hesitate to ask. It's an honor to contribute to such a useful library.

@emmansun
Owner

> Do you need any other tests? It's no trouble at all, so don't hesitate to ask. It's an honor to contribute to such a useful library.

Thanks a lot! The gap between SM4 and AES looks much larger here than on x86. I wonder whether the two CPU features cpu.ARM64.HasAES and cpu.ARM64.HasPMULL are both true.

For example, there is little difference between decryption, which can be parallelized, and encryption, which cannot:

BenchmarkSM4CBCEncrypt1K
BenchmarkSM4CBCEncrypt1K-8       	  123598	      9383 ns/op	 109.14 MB/s
BenchmarkSM4CBCDecrypt1K
BenchmarkSM4CBCDecrypt1K-8       	  146947	      8084 ns/op	 126.68 MB/s
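The reason CBC decryption can be parallelized while CBC encryption cannot is that each plaintext block depends only on two ciphertext blocks, P_i = D(C_i) XOR C_(i-1), whereas each ciphertext block depends on the previous ciphertext block that was just produced. A minimal sketch, using AES from the standard library as a stand-in block cipher (SM4 is not in the standard library; `cbcDecryptReverse` and `roundTrip` are hypothetical names), decrypts the blocks in reverse order to show that no block needs the result of another:

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"fmt"
)

// cbcDecryptReverse decrypts CBC ciphertext starting from the last block.
// Any order works because block i needs only ct[i] and ct[i-1] (or the IV).
// len(ct) is assumed to be a multiple of the block size.
func cbcDecryptReverse(b cipher.Block, iv, ct []byte) []byte {
	bs := b.BlockSize()
	pt := make([]byte, len(ct))
	for i := len(ct) - bs; i >= 0; i -= bs {
		prev := iv
		if i > 0 {
			prev = ct[i-bs : i]
		}
		b.Decrypt(pt[i:i+bs], ct[i:i+bs])
		for j := 0; j < bs; j++ {
			pt[i+j] ^= prev[j]
		}
	}
	return pt
}

// roundTrip encrypts with the standard library's CBC mode and checks that the
// out-of-order decryption recovers the plaintext.
func roundTrip() bool {
	key := bytes.Repeat([]byte{0x01}, 16)
	iv := bytes.Repeat([]byte{0x02}, aes.BlockSize)
	pt := bytes.Repeat([]byte("0123456789abcdef"), 4) // four blocks

	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	ct := make([]byte, len(pt))
	cipher.NewCBCEncrypter(block, iv).CryptBlocks(ct, pt)

	return bytes.Equal(cbcDecryptReverse(block, iv, ct), pt)
}

func main() {
	fmt.Println(roundTrip()) // prints true
}
```

Because the blocks are independent, an implementation with SIMD or hardware instructions can process several of them at once, so a decrypt figure close to the encrypt figure suggests that the accelerated path is not being used.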

@mango19970707

sysctl -n machdep.cpu.brand_string
Apple M2

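The Apple M2 chip is based on the ARM architecture and supports the ARM64 instruction set. Given the features of the ARM64 architecture, the Apple M2 should support the AES and PMULL instructions.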

@mango19970707

kern.sched_rt_avoid_cpu0: 0
kern.cpu_checkin_interval: 4000
hw.ncpu: 8
hw.activecpu: 8
hw.perflevel0.physicalcpu: 4
hw.perflevel0.physicalcpu_max: 4
hw.perflevel0.logicalcpu: 4
hw.perflevel0.logicalcpu_max: 4
hw.perflevel0.cpusperl2: 4
hw.perflevel1.physicalcpu: 4
hw.perflevel1.physicalcpu_max: 4
hw.perflevel1.logicalcpu: 4
hw.perflevel1.logicalcpu_max: 4
hw.perflevel1.cpusperl2: 4
hw.physicalcpu: 8
hw.physicalcpu_max: 8
hw.logicalcpu: 8
hw.logicalcpu_max: 8
hw.cputype: 16777228
hw.cpusubtype: 2
hw.cpu64bit_capable: 1
hw.cpufamily: -634136515
hw.cpusubfamily: 2
machdep.cpu.cores_per_package: 8
machdep.cpu.core_count: 8
machdep.cpu.logical_per_package: 8
machdep.cpu.thread_count: 8
machdep.cpu.brand_string: Apple M2

@emmansun
Owner

emmansun commented Jan 23, 2024

I'll write a test program tomorrow to confirm whether this is an x/cpu problem. golang/go#43046

package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

func main() {
	fmt.Printf("HasAES=%v, HasPMULL=%v\n", cpu.ARM64.HasAES, cpu.ARM64.HasPMULL)
}

@mango19970707

That is indeed the problem.

HasAES=false, HasPMULL=false
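One possible way to work around flags that are reported as false on darwin/arm64 is to override them for that platform. The sketch below is an assumption for illustration and is not necessarily how gmsm resolved it: `useARM64Crypto` is a hypothetical function, its boolean parameters stand in for `cpu.ARM64.HasAES` and `cpu.ARM64.HasPMULL`, and it assumes that every arm64 Mac supports both instruction sets.

```go
package main

import (
	"fmt"
	"runtime"
)

// useARM64Crypto decides whether AES/PMULL-based code paths should be
// enabled. detectedAES and detectedPMULL stand in for the values reported by
// cpu.ARM64.HasAES and cpu.ARM64.HasPMULL.
// Hypothetical helper for illustration; not part of gmsm or x/sys/cpu.
func useARM64Crypto(goos, goarch string, detectedAES, detectedPMULL bool) bool {
	if goarch != "arm64" {
		// This sketch only covers arm64.
		return false
	}
	if goos == "darwin" {
		// Assumption: all Apple Silicon supports AES and PMULL, so the
		// detected flags are ignored on this platform.
		return true
	}
	return detectedAES && detectedPMULL
}

func main() {
	// The situation reported above: darwin/arm64 with both flags false.
	fmt.Println(useARM64Crypto("darwin", "arm64", false, false)) // prints true

	// The same check for the platform this program is running on.
	fmt.Println(useARM64Crypto(runtime.GOOS, runtime.GOARCH, false, false))
}
```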

@emmansun
Owner

@mango19970707, if it's convenient, please run the benchmarks with the latest code (arm64/darwin). Thanks!

@mango19970707

goos: darwin
goarch: arm64
pkg: github.com/emmansun/gmsm/cipher
BenchmarkSM4BCEncrypt1K
BenchmarkSM4BCEncrypt1K-8        	   45248	     23536 ns/op	  43.51 MB/s
BenchmarkSM4BCDecrypt1K
BenchmarkSM4BCDecrypt1K-8        	   52416	     22513 ns/op	  45.48 MB/s
BenchmarkSM4HCTREncrypt1K
BenchmarkSM4HCTREncrypt1K-8      	   82255	     14668 ns/op	  69.81 MB/s
BenchmarkSM4ECBEncrypt1K
BenchmarkSM4ECBEncrypt1K-8       	  439059	      2807 ns/op	 364.81 MB/s
BenchmarkAES128ECBEncrypt1K
BenchmarkAES128ECBEncrypt1K-8    	 3086666	       386.2 ns/op	2651.69 MB/s
BenchmarkAESCBCEncrypt1K
BenchmarkAESCBCEncrypt1K-8       	 1390239	       867.6 ns/op	1180.30 MB/s
BenchmarkSM4CBCEncrypt1K
BenchmarkSM4CBCEncrypt1K-8       	   52814	     22590 ns/op	  45.33 MB/s
BenchmarkAESCBCDecrypt1K
BenchmarkAESCBCDecrypt1K-8       	 2070918	       577.4 ns/op	1773.52 MB/s
BenchmarkSM4CBCDecrypt1K
BenchmarkSM4CBCDecrypt1K-8       	  389868	      3007 ns/op	 340.57 MB/s
BenchmarkAESCFBEncrypt1K
BenchmarkAESCFBEncrypt1K-8       	 1000000	      1001 ns/op	1018.43 MB/s
BenchmarkSM4CFBEncrypt1K
BenchmarkSM4CFBEncrypt1K-8       	   51726	     22945 ns/op	  44.41 MB/s
BenchmarkAESCFBDecrypt1K
BenchmarkAESCFBDecrypt1K-8       	 1537636	       779.8 ns/op	1306.83 MB/s
BenchmarkSM4CFBDecrypt1K
BenchmarkSM4CFBDecrypt1K-8       	   53816	     22399 ns/op	  45.49 MB/s
BenchmarkAESCFBDecrypt8K
BenchmarkAESCFBDecrypt8K-8       	  196856	      6113 ns/op	1339.20 MB/s
BenchmarkSM4CFBDecrypt8K
BenchmarkSM4CFBDecrypt8K-8       	    6692	    178963 ns/op	  45.75 MB/s
BenchmarkAESOFB1K
BenchmarkAESOFB1K-8              	 1531342	       774.6 ns/op	1315.58 MB/s
BenchmarkSM4OFB1K
BenchmarkSM4OFB1K-8              	   53062	     22620 ns/op	  45.05 MB/s
BenchmarkAESCTR1K
BenchmarkAESCTR1K-8              	 2544462	       469.1 ns/op	2172.29 MB/s
BenchmarkSM4CTR1K
BenchmarkSM4CTR1K-8              	  230947	      5109 ns/op	 199.46 MB/s
BenchmarkAESCTR8K
BenchmarkAESCTR8K-8              	  331701	      3611 ns/op	2267.25 MB/s
BenchmarkSM4CTR8K
BenchmarkSM4CTR8K-8              	   29235	     41004 ns/op	 199.66 MB/s
BenchmarkAESGCMSeal1K
BenchmarkAESGCMSeal1K-8          	 7863057	       152.2 ns/op	6726.03 MB/s
BenchmarkSM4GCMSeal1K
BenchmarkSM4GCMSeal1K-8          	  366088	      3215 ns/op	 318.55 MB/s
BenchmarkAESGCMOpen1K
BenchmarkAESGCMOpen1K-8          	 8072829	       148.9 ns/op	6878.13 MB/s
BenchmarkSM4GCMOpen1K
BenchmarkSM4GCMOpen1K-8          	  374250	      3161 ns/op	 323.95 MB/s
BenchmarkAESGCMSign1K
BenchmarkAESGCMSign1K-8          	13024006	        96.20 ns/op	10644.30 MB/s
BenchmarkSM4GCMSign1K
BenchmarkSM4GCMSign1K-8          	 2842420	       406.0 ns/op	2521.91 MB/s
BenchmarkAESGCMSign8K
BenchmarkAESGCMSign8K-8          	 1445786	       822.9 ns/op	9955.40 MB/s
BenchmarkSM4GCMSign8K
BenchmarkSM4GCMSign8K-8          	 1000000	      1137 ns/op	7201.97 MB/s
BenchmarkAESGCMSeal8K
BenchmarkAESGCMSeal8K-8          	 1000000	      1118 ns/op	7326.20 MB/s
BenchmarkSM4GCMSeal8K
BenchmarkSM4GCMSeal8K-8          	   50091	     23516 ns/op	 348.35 MB/s
BenchmarkAESGCMOpen8K
BenchmarkAESGCMOpen8K-8          	 1000000	      1009 ns/op	8120.31 MB/s
BenchmarkSM4GCMOpen8K
BenchmarkSM4GCMOpen8K-8          	   52142	     23156 ns/op	 353.77 MB/s
BenchmarkAESCCMSign1K
BenchmarkAESCCMSign1K-8          	 1000000	      1083 ns/op	 945.26 MB/s
BenchmarkSM4CCMSign1K
BenchmarkSM4CCMSign1K-8          	   47324	     24246 ns/op	  42.23 MB/s
BenchmarkAESCCMSeal1K
BenchmarkAESCCMSeal1K-8          	  782587	      1511 ns/op	 677.73 MB/s
BenchmarkSM4CCMSeal1K
BenchmarkSM4CCMSeal1K-8          	   40938	     29614 ns/op	  34.58 MB/s
BenchmarkAESCCMOpen1K
BenchmarkAESCCMOpen1K-8          	  789286	      1518 ns/op	 674.47 MB/s
BenchmarkSM4CCMOpen1K
BenchmarkSM4CCMOpen1K-8          	   39928	     29360 ns/op	  34.88 MB/s
BenchmarkAESCCMSign8K
BenchmarkAESCCMSign8K-8          	  168481	      7155 ns/op	1144.94 MB/s
BenchmarkSM4CCMSign8K
BenchmarkSM4CCMSign8K-8          	    6213	    184557 ns/op	  44.39 MB/s
BenchmarkAESCCMSeal8K
BenchmarkAESCCMSeal8K-8          	  109142	     10907 ns/op	 751.07 MB/s
BenchmarkSM4CCMSeal8K
BenchmarkSM4CCMSeal8K-8          	    5300	    226061 ns/op	  36.24 MB/s
BenchmarkAESCCMOpen8K
BenchmarkAESCCMOpen8K-8          	  111435	     10722 ns/op	 764.02 MB/s
BenchmarkSM4CCMOpen8K
BenchmarkSM4CCMOpen8K-8          	    5184	    225498 ns/op	  36.33 MB/s
BenchmarkAES128XTSEncrypt512
BenchmarkAES128XTSEncrypt512-8   	 2750562	       430.4 ns/op	1189.56 MB/s
BenchmarkAES128XTSEncrypt1K
BenchmarkAES128XTSEncrypt1K-8    	 1378828	       876.6 ns/op	1168.20 MB/s
BenchmarkAES128XTSEncrypt4K
BenchmarkAES128XTSEncrypt4K-8    	  339626	      3461 ns/op	1183.47 MB/s
BenchmarkAES256XTSEncrypt512
BenchmarkAES256XTSEncrypt512-8   	 2520739	       477.5 ns/op	1072.28 MB/s
BenchmarkAES256XTSEncrypt1K
BenchmarkAES256XTSEncrypt1K-8    	 1252051	       968.9 ns/op	1056.89 MB/s
BenchmarkAES256XTSEncrypt4K
BenchmarkAES256XTSEncrypt4K-8    	  310426	      3815 ns/op	1073.80 MB/s
BenchmarkSM4XTSEncrypt512
BenchmarkSM4XTSEncrypt512-8      	  801194	      1480 ns/op	 346.02 MB/s
BenchmarkSM4XTSEncrypt1K
BenchmarkSM4XTSEncrypt1K-8       	  405517	      2912 ns/op	 351.67 MB/s
BenchmarkSM4XTSEncrypt4K
BenchmarkSM4XTSEncrypt4K-8       	  102631	     11740 ns/op	 348.89 MB/s
BenchmarkSM4XTSEncrypt512_GB
BenchmarkSM4XTSEncrypt512_GB-8   	  772028	      1528 ns/op	 335.14 MB/s
BenchmarkSM4XTSEncrypt1K_GB
BenchmarkSM4XTSEncrypt1K_GB-8    	  395422	      3052 ns/op	 335.48 MB/s
BenchmarkSM4XTSEncrypt4K_GB
BenchmarkSM4XTSEncrypt4K_GB-8    	   98650	     12261 ns/op	 334.08 MB/s
BenchmarkAES128XTSDecrypt512
BenchmarkAES128XTSDecrypt512-8   	 2592085	       461.8 ns/op	1108.75 MB/s
BenchmarkAES128XTSDecrypt1K
BenchmarkAES128XTSDecrypt1K-8    	 1304486	       922.9 ns/op	1109.59 MB/s
BenchmarkAES128XTSDecrypt4K
BenchmarkAES128XTSDecrypt4K-8    	  331948	      3630 ns/op	1128.34 MB/s
BenchmarkAES256XTSDecrypt512
BenchmarkAES256XTSDecrypt512-8   	 2305957	       523.8 ns/op	 977.52 MB/s
BenchmarkAES256XTSDecrypt1K
BenchmarkAES256XTSDecrypt1K-8    	 1000000	      1037 ns/op	 987.87 MB/s
BenchmarkAES256XTSDecrypt4K
BenchmarkAES256XTSDecrypt4K-8    	  293144	      4106 ns/op	 997.62 MB/s
BenchmarkSM4XTSDecrypt512
BenchmarkSM4XTSDecrypt512-8      	  798985	      1474 ns/op	 347.33 MB/s
BenchmarkSM4XTSDecrypt1K
BenchmarkSM4XTSDecrypt1K-8       	  411249	      2912 ns/op	 351.68 MB/s
BenchmarkSM4XTSDecrypt4K
BenchmarkSM4XTSDecrypt4K-8       	  102864	     11852 ns/op	 345.60 MB/s
BenchmarkSM4XTSDecrypt512_GB
BenchmarkSM4XTSDecrypt512_GB-8   	  793485	      1526 ns/op	 335.61 MB/s
BenchmarkSM4XTSDecrypt1K_GB
BenchmarkSM4XTSDecrypt1K_GB-8    	  396297	      3026 ns/op	 338.36 MB/s
BenchmarkSM4XTSDecrypt4K_GB
BenchmarkSM4XTSDecrypt4K_GB-8    	   98881	     12132 ns/op	 337.61 MB/s
PASS
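The two benchmark runs posted in this thread can be compared with a little arithmetic. The sketch below takes a few SM4 throughput figures (MB/s) copied from the earlier run and from the run above and prints the ratio between them; the selection of rows and the `ratio` helper are choices made for illustration, and the program does not attempt to explain the causes.

```go
package main

import "fmt"

// ratio returns how many times faster the second run is than the first.
// A value below 1 means the second run is slower.
func ratio(beforeMBps, afterMBps float64) float64 {
	return afterMBps / beforeMBps
}

func main() {
	// Throughput in MB/s, copied from the two runs posted in this thread.
	rows := []struct {
		name          string
		before, after float64
	}{
		{"SM4ECBEncrypt1K", 131.50, 364.81},
		{"SM4CBCDecrypt1K", 126.68, 340.57},
		{"SM4CTR1K", 130.41, 199.46},
		{"SM4GCMSeal1K", 84.03, 318.55},
		{"SM4CBCEncrypt1K", 109.14, 45.33},
	}
	for _, r := range rows {
		fmt.Printf("%-16s %7.2f -> %7.2f MB/s (x%.2f)\n",
			r.name, r.before, r.after, ratio(r.before, r.after))
	}
}
```

In these figures the modes that can process blocks independently (ECB, CBC decryption, CTR, GCM) are faster in the second run, while CBC encryption is slower.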

@emmansun
Owner

Thanks! #172 I'm considering whether to trade security for performance.

@mango19970707

You could support both and let users choose according to their situation.

@emmansun
Owner

Thanks, all! Closing this for now.
