// Must the following condition be satisfied in order to use this path, or can it be used either way? What other considerations is the threshold based on?
if (num_input >= 64 && num_output >= 64)
    use_sgemm1x1 = true;
@yangfengmax As I understand it, this is a question of the relative efficiency of two convolution implementations; inputs of different shapes hit different bottlenecks. The first, naive approach loops over output channels -> input channels -> height -> width; when the output and input channel counts are both large, the inner loops become very long. The second is the approach used in Caffe: each input patch is extracted and stacked into a matrix (which costs time and memory), the kernels are likewise flattened into a matrix (also a cost), and the multi-level loop is converted into a single matrix product. The overall cost of the first approach is copy-and-make-border plus the convolution's floating-point work; the cost of the second is copy-and-make-border plus creating and reshaping the 2D matrices plus the matrix product. You can implement both on a PC, generate a few test cases, and benchmark them yourself; that is how I arrived at these numbers.
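For reference, here is a minimal sketch of the two strategies described above (not ncnn's actual code; all function names, memory layouts, and the no-padding/stride-1 assumption are made up for illustration): a direct nested-loop convolution, and an im2col transform followed by one matrix multiplication.

```cpp
#include <vector>

// Direct convolution: loop over output channels, input channels, and spatial
// positions. No setup cost, but the inner loops get very long when both
// channel counts are large.
void conv_direct(const float* in, int inch, int h, int w,
                 const float* kernel, int outch, int k,
                 float* out, int outh, int outw)
{
    for (int oc = 0; oc < outch; oc++)
        for (int oy = 0; oy < outh; oy++)
            for (int ox = 0; ox < outw; ox++)
            {
                float sum = 0.f;
                for (int ic = 0; ic < inch; ic++)
                    for (int ky = 0; ky < k; ky++)
                        for (int kx = 0; kx < k; kx++)
                            sum += in[(ic * h + oy + ky) * w + ox + kx]
                                 * kernel[((oc * inch + ic) * k + ky) * k + kx];
                out[(oc * outh + oy) * outw + ox] = sum;
            }
}

// im2col + GEMM: copy every input patch into a column of a temporary matrix
// (extra memory and copy cost), then compute the whole convolution as one
// (outch x K) * (K x N) matrix product with K = inch*k*k, N = outh*outw.
void conv_im2col_gemm(const float* in, int inch, int h, int w,
                      const float* kernel, int outch, int k,
                      float* out, int outh, int outw)
{
    const int K = inch * k * k;
    const int N = outh * outw;
    std::vector<float> col(K * N);

    // im2col: unfold each receptive field into one column of `col`
    for (int ic = 0; ic < inch; ic++)
        for (int ky = 0; ky < k; ky++)
            for (int kx = 0; kx < k; kx++)
            {
                int row = (ic * k + ky) * k + kx;
                for (int oy = 0; oy < outh; oy++)
                    for (int ox = 0; ox < outw; ox++)
                        col[row * N + oy * outw + ox] =
                            in[(ic * h + oy + ky) * w + ox + kx];
            }

    // GEMM: out = kernel (outch x K) * col (K x N); in practice this call
    // would go to an optimized sgemm routine rather than a plain loop.
    for (int oc = 0; oc < outch; oc++)
        for (int n = 0; n < N; n++)
        {
            float sum = 0.f;
            for (int kk = 0; kk < K; kk++)
                sum += kernel[oc * K + kk] * col[kk * N + n];
            out[oc * N + n] = sum;
        }
}
```

For a 1x1 kernel the im2col step degenerates to little more than a reshape, so presumably the `num_input >= 64 && num_output >= 64` check only switches to the GEMM path once both channel counts are large enough for the matrix product to dominate the copy/reshape overhead.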
@yangfengmax Also, if you are interested, have a look at this paper: Fast Algorithms for Convolutional Neural Networks.
Thanks a lot!