
using conv1x1s1_sgemm_neon condition #598

Closed
maxfy1992 opened this issue Sep 30, 2018 · 3 comments

Comments

@maxfy1992
Contributor

// Must the following condition be met before this path can be used, or is it always usable? What other considerations is it based on?
if (num_input >= 64 && num_output >= 64)
    use_sgemm1x1 = true;

@xindongzhang

@yangfengmax As I understand it, this is about the relative efficiency of two convolution implementations; inputs of different dimensions hit different bottlenecks. The first, naive approach computes the convolution with nested loops over output channels -> input channels -> height -> width; if the output and input channel counts are both large, the inner loops become too numerous. The second is the approach used in Caffe: each patch of the input data is extracted and stacked into a matrix (which has a performance cost), and the kernel is likewise flattened and stacked into a matrix (also a cost), turning the nested loops into a matrix product. The cost of the first approach is copy-and-make-border + the convolution's floating-point arithmetic; the cost of the second is copy-and-make-border + creating and reshaping to a 2D matrix + the matrix product. You can implement both convolutions on a PC yourself, generate some test data, and see for yourself. That is how I found out.

@xindongzhang

@yangfengmax Also, if you are interested, have a look at this paper: Fast Algorithms for Convolutional Neural Networks.

@maxfy1992
Contributor Author

> @yangfengmax Also, if you are interested, have a look at this paper: Fast Algorithms for Convolutional Neural Networks.

Thanks a lot!
