Initial int8 quantize inference implementation #487

Closed


@BUG1989 (Contributor) commented Jul 20, 2018

ncnn-int8 pull request README

Int8 inference supported layers

| Platform | Layers |
| --- | --- |
| x86 | conv1x1s1, conv1x1s2, conv3x3s1, conv3x3s2, dwconv3x3s1, dwconv3x3s2 |
| armv7a | conv1x1s1 |

Summary of changed files

| Function | Changed or added files |
| --- | --- |
| Example of int8 | examples/CMakeLists.txt, examples/squeezenet-int8.cpp, examples/squeezenet_v1.1.table |
| Parse the int8 calibration table | src/net.h, src/net.cpp, src/layer.h, src/layer.cpp, src/quantize.h |
| Int8 implementation for x86 | src/layer/convolution_quantize.h, src/layer/convolution.h, src/layer/convolution.cpp, src/layer/x86/convolution_x86.cpp, src/layer/convolutiondepthwise.h, src/layer/convolutiondepthwise.cpp, src/layer/x86/convolutiondepthwise_x86.cpp |
| Int8 & fused ReLU implementation for armv7a | src/layer/arm/convolution_quantize_arm.h, src/layer/arm/convolution_arm.cpp, src/layer/arm/convolution_1x1_int8.h |
| Adapt the framework to int8 | src/mat.h, src/layer/concat.cpp, src/layer/eltwise.cpp, src/layer/arm/eltwise_arm.cpp, src/layer/pooling.cpp, src/layer/arm/pooling_arm.cpp, src/layer/arm/innerproduct_arm.cpp |
| Tools | tools/calibration2mem.cpp |
| Build files | src/CMakeLists.txt, examples/CMakeLists.txt, tools/CMakeLists.txt |

How to generate the Calibration table file

We provide a tool for generating the Int8 calibration table file:

caffe-int8-convert-tools
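
The converter runs against the original Caffe model. A rough, hypothetical invocation is sketched below; the script name and flag names are recalled from the caffe-int8-convert-tools README and may not match the current version, so check that repository for the exact usage.

    # hypothetical invocation; verify the script name and flags in caffe-int8-convert-tools
    python caffe-int8-convert-tool.py \
        --proto=squeezenet_v1.1.prototxt \
        --model=squeezenet_v1.1.caffemodel \
        --mean 104 117 123 \
        --images=calibration_images/ \
        --output=squeezenet_v1.1.table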

How to use Int8 inference

By default, inference runs in Float32 mode. If you want to switch inference to Int8 mode, you only need to add two lines of code; for more details please see the examples/squeezenet-int8.cpp file.

    ......
    ncnn::Net squeezenet;
    squeezenet.set_conv_model(CONV_INT8);               // set the Int8 mode
    squeezenet.load_param("squeezenet_v1.1.param");
    squeezenet.load_scale("squeezenet_v1.1.table");     // parse the Int8 calibration table (the quantize scale values)
    squeezenet.load_model("squeezenet_v1.1.bin");
    ......

Note: the API calls above must be made in exactly this order!
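
For context, here is a minimal end-to-end sketch of an Int8 inference pass that combines the two new calls with ncnn's usual extractor API. The preprocessing values, the 227x227 input size and the "data"/"prob" blob names are taken from ncnn's float32 squeezenet example and may differ from the actual examples/squeezenet-int8.cpp in this PR; treat it as an illustration rather than the shipped code.

    #include "net.h"
    #include <opencv2/opencv.hpp>

    int main()
    {
        cv::Mat bgr = cv::imread("cat.jpg", 1);

        // resize to the 227x227 input SqueezeNet v1.1 expects
        ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR,
                                                     bgr.cols, bgr.rows, 227, 227);

        // mean subtraction, same values as ncnn's float32 squeezenet example
        const float mean_vals[3] = {104.f, 117.f, 123.f};
        in.substract_mean_normalize(mean_vals, 0);

        ncnn::Net squeezenet;
        squeezenet.set_conv_model(CONV_INT8);            // switch conv layers to Int8 mode
        squeezenet.load_param("squeezenet_v1.1.param");
        squeezenet.load_scale("squeezenet_v1.1.table");  // per-layer quantize scales
        squeezenet.load_model("squeezenet_v1.1.bin");

        ncnn::Extractor ex = squeezenet.create_extractor();
        ex.input("data", in);

        ncnn::Mat out;
        ex.extract("prob", out);   // out now holds the 1000 class scores

        return 0;
    }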

How to manually control the int8 conv layer on/off

This implementation is naive and crude: you need to modify the ncnn.param file by hand.

  1. Edit ncnn.param and add the parameter "7=1" to a Convolution layer, or "8=1" to a ConvolutionDepthWise layer; the default value is "0", and the int8 conv is enabled by default.

    Convolution      conv1            1 1 data conv1 0=8 1=3 2=1 3=2 4=1 5=1 6=216 7=1
    ConvolutionDepthWise conv2_depthwise  1 1 conv1 conv2_depthwise 0=8 1=3 2=1 3=2 4=1 5=1 6=72 7=8 8=1
    Convolution      conv2_pointwise  1 1 conv2_depthwise conv2_pointwise 0=16 1=1 2=1 3=1 4=0 5=1 6=128 7=1
    ConvolutionDepthWise conv3_depthwise  1 1 conv2_pointwise conv3_depthwise 0=16 1=3 2=1 3=1 4=1 5=1 6=144 7=16 8=1
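
For orientation, the numeric ids in these lines follow ncnn's standard convolution parameters (not something introduced by this PR): in the first line, 0=8 is num_output, 1=3 the kernel size, 3=2 the stride and 6=216 the weight data size (8x3x3x3), while the trailing 7=1 is the hand-added int8 switch; ConvolutionDepthWise already uses id 7 for the group count, which is why its int8 switch moves to id 8.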

Result Accuracy

| Type | Note |
| --- | --- |
| Calibration dataset | ILSVRC2012_img_test (1k images) |
| Test dataset | ILSVRC2012_img_val (5k images) |
| Framework | ncnn-int8 |
| Supported layers | conv3x3, conv1x1, convdw3x3 |

| NETWORK | FP32 Top1 | FP32 Top5 | INT8 Top1 | INT8 Top5 |
| --- | --- | --- | --- | --- |
| SqueezeNet v1.1 | 57.86% | 79.86% | 57.36% | 79.84% |
| MobileNet v1 | 67.78% | 87.62% | 64.92% | 85.22% |
| MobileNet v2 | 70.20% | 89.20% | 68.94% | 87.90% |
| GoogleNet v1 | 67.70% | 88.32% | 67.64% | 88.26% |
| ResNet-18 | 65.50% | 86.46% | 65.48% | 86.44% |
| ResNet-50 | 71.68% | 89.94% | 71.38% | 89.52% |

| NETWORK | FP32 Top1 | FP32 Top5 | Diff Top1 | Diff Top5 |
| --- | --- | --- | --- | --- |
| SqueezeNet v1.1 | 57.86% | 79.86% | 0.50% | 0.02% |
| MobileNet v1 | 67.78% | 87.62% | 2.86% | 2.40% |
| MobileNet v2 | 70.20% | 89.20% | 1.26% | 1.30% |
| GoogleNet v1 | 67.70% | 88.32% | 0.06% | 0.06% |
| ResNet-18 | 65.50% | 86.46% | 0.02% | 0.02% |
| ResNet-50 | 71.68% | 89.94% | 0.30% | 0.32% |

Result Performance

| Type | Note |
| --- | --- |
| Hardware | Hisi3519 ([email protected]) |
| Test app | ncnn-int8 benchncnn |
| Unit | ms |

| Models | Float32 (winograd on) | Int8 | Ratio |
| --- | --- | --- | --- |
| SqueezeNet_v1.1 | 315 | 242 | x1.30 |
| MobileNet_v1 | 523 | 369 | x1.41 |
| MobileNet_v2 | 401 | 311 | x1.28 |
| MobileNet_v1-SSD | 1034 | 701 | x1.47 |

TODO

  • Support conv3x3s2 int8 on arm
  • Support conv1x1s2 int8 on arm
  • Support convdw3x3 int8 on arm
  • Support fused ReLU on the x86 platform
  • Support Int8 and fused ReLU on the aarch64 platform
  • Optimize the pipeline
  • Add accuracy results for detection tasks

Thanks

The original author of the int8 code: fu1899

The original author of the algorithm code: JansonZhu


@nihui (Member) commented Aug 1, 2018

framework int8 a169cec

x86 int8 4be27a0

armv7 int8 e34aa77

nihui closed this Aug 1, 2018

@BUG1989 (Author) commented Feb 21, 2019

A new int8 implementation is a work in progress, with better accuracy and speed:
new int8 PR
