
using conv1x1s1_sgemm_neon condition #598

Closed
maxfy1992 opened this issue Sep 30, 2018 · 3 comments

Comments

@maxfy1992
Contributor

// Must the following condition be met before this path can be used, or is it always usable? What other considerations is it based on?
if (num_input >= 64 && num_output >= 64)
    use_sgemm1x1 = true;

@xindongzhang

@yangfengmax As I understand it, this is about the relative efficiency of two convolution implementations; inputs of different dimensions hit different bottlenecks. The first, naive approach computes the convolution with nested loops over output channels -> input channels -> height -> width; if the output and input channel counts are both large, the inner loops become too numerous. The second is the approach used in Caffe: each patch of the input data is extracted and stacked into a matrix (which has a performance cost), and the kernel is likewise flattened and stacked into a matrix (also a cost), turning the nested loops into a matrix product. The cost of the first approach is copy-and-make-border + the convolution's floating-point arithmetic; the cost of the second is copy-and-make-border + creating and reshaping to a 2D matrix + the matrix product. You can implement both convolutions on a PC yourself, generate some test data, and see for yourself. That is how I found out.

@xindongzhang

@yangfengmax Also, if you are interested, have a look at this paper: Fast Algorithms for Convolutional Neural Networks.

@maxfy1992
Contributor Author

> @yangfengmax Also, if you are interested, have a look at this paper: Fast Algorithms for Convolutional Neural Networks.

Thanks a lot!
