[MXNET-497] fix bugs in MKLDNN operators to handle the kAddTo request #11129
Conversation
I don't think all operators support kAddTo.
Force-pushed from 71b0db4 to 50befd6 (Compare)
src/common/exec_utils.h
Outdated
CHECK(temp.IsDefaultData());
#else
NDArray temp = bufs != nullptr ? bufs->at(i) : NDArray(nd.shape(), nd.ctx(),
    true, nd.dtype());
Sparse arrays don't have kAddTo? @eric-haibin-lin
Yes. The executor won't generate kAddTo for sparse outputs. Sparse operators don't support that.
CHECK(mem != nullptr);
if (mem == nullptr) {
  auto tmp_memory = TmpMemMgr::Get()->Alloc(target_pd);
  CopyMKLDNNMem(*res_memory, tmp_memory);
I think you can just call GetMKLDNNDataReorder.
@pengzhao-intel tried to fix this bug, but his fix doesn't have a test:
#11095
The purpose of this block is to coerce the input memory (res) to the same size as the output ndarray's memory. The issue here is that we do not have the ndarray of the input, only the memory in res.second, so we don't have the GetMKLDNNDataReorder member function.
I see. When there is a format mismatch between arr and res, you use the format of arr. In this case, you may remove lines 214 and 216 and have the sum written to mem directly. WriteInplace works in MKLDNN as long as the inputs and outputs have the same shape and format.
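For context, a minimal sketch of the reorder-on-mismatch idea under discussion, reusing only the helpers visible in this diff (TmpMemMgr, CopyMKLDNNMem); this is an illustration, not the PR's actual code:

```c++
// Sketch: ensure an input memory has the output's layout before the
// in-place sum. `target_pd` is the output's primitive descriptor and
// `res_memory` the input memory, as in the snippet quoted above.
const mkldnn::memory *EnsureTargetFormat(
    const mkldnn::memory::primitive_desc &target_pd,
    const mkldnn::memory *res_memory) {
  if (res_memory->get_primitive_desc() == target_pd)
    return res_memory;  // same shape and format: WriteInplace is safe
  // Layouts differ: reorder into scratch memory with the target layout.
  mkldnn::memory *tmp_memory = TmpMemMgr::Get()->Alloc(target_pd);
  CopyMKLDNNMem(*res_memory, tmp_memory);
  return tmp_memory;
}
```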
  const_cast<NDArray &>(output).InvalidateMKLDNNData();
else if (req[i] == kAddTo)
  output = outputs[i].Reorder2Default();
} else if (req[0] == kAddTo && output.IsMKLDNNData()) {
I wonder whether it's necessary that, in the fallback case, the output is still an MKLDNN array.
I know the unit test can trigger the problem.
If the output is not MKLDNN, then we do not need to enter this block, because we will be able to call the FCompute function on the output directly.
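In other words, the branching looks roughly like this (illustrative sketch only, using the names from the snippets above):

```c++
// Sketch of the fallback control flow described above.
if (!output.IsMKLDNNData()) {
  // Default layout: FCompute can write to the output blob directly.
} else if (req[i] == kAddTo) {
  // MKL-DNN layout with kAddTo: reorder to the default layout first so
  // the existing values survive and can be accumulated into.
  output = outputs[i].Reorder2Default();
}
```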
else if (req[i] == kAddTo)
  output = outputs[i].Reorder2Default();
} else if (req[0] == kAddTo && output.IsMKLDNNData()) {
  NDArray temp = outputs[0].Reorder2Default();
It should be outputs[i].
fn(attrs, ctx, in_blobs, req, out_blobs);
if (req[0] == kAddTo && outputs[0].IsMKLDNNData())
Why is the index always 0?
src/operator/nn/mkldnn/mkldnn_sum.cc
Outdated
if (in_mem == nullptr) {
  auto tmp_memory = TmpMemMgr::Get()->Alloc(target_pd);
  auto input_memory = inputs[i].GetMKLDNNData();
  CopyMKLDNNMem(*input_memory, tmp_memory);
I think you can just call GetMKLDNNDataReorder.
tests/cpp/operator/mkldnn.cc
Outdated
  EXPECT_EQ(d1[i], std::fmax(d2[i], 0));
}
for (size_t i = 0; i < tmp1.shape().Size(); i++)
  ASSERT_EQ(d2[i], std::fmax(d1[i], 0));
Shouldn't we use the Google Test checks?
ASSERT_EQ is from gtest? I am using ASSERT_EQ over EXPECT_EQ here because I want the test to fail as soon as one comparison fails (otherwise we get an error message for every incorrect element in the vector).
ASSERT_EQ is defined by mxnet. We should use EXPECT_EQ in the unit test because we want to see all failures.
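For reference, a self-contained Google Test sketch of the difference (a hypothetical test, not taken from this PR):

```c++
#include <gtest/gtest.h>

TEST(MKLDNNOperator, CompareElementwise) {
  const float expected[] = {1.f, 2.f, 3.f};
  const float actual[]   = {1.f, 2.f, 3.f};
  for (size_t i = 0; i < 3; i++) {
    // EXPECT_EQ records a failure and keeps iterating, so every bad
    // element in the vector would be reported.
    EXPECT_EQ(expected[i], actual[i]);
    // ASSERT_EQ would abort this test function at the first mismatch,
    // reporting only the first failing element.
  }
}
```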
tests/cpp/operator/mkldnn.cc
Outdated
mshadow::default_real_t *d1 = in1.data().dptr<mshadow::default_real_t>();
mshadow::default_real_t *d2 = in2.data().dptr<mshadow::default_real_t>();
mshadow::default_real_t *o = out.data().dptr<mshadow::default_real_t>();
for (size_t i = 0; i < in1.shape().Size(); i++)
  EXPECT_EQ(d1[i] + d2[i], o[i]);
  ASSERT_EQ(d1[i] + d2[i], o[i]);
The same here?
@azai91 Really, thanks for your efforts in improving the quality of the MKL-DNN backend. @azai91 @zheng-da @eric-haibin-lin @marcoabreu Two suggestions: 1) avoid changing the MKL-DNN implementations when adding test cases. One example is #11026; that PR was marked as adding test cases, so we didn't pay much attention, because more cases are always a good thing. But the implementation was changed and led to a performance regression for resnet-50 inference (dropped from 238 to 179), see below. 2) running the
Okay, will add MKLDNN benchmark tests in a separate PR.
@pengzhao-intel Is anyone working on a PR to fix the regression, or is someone already reverting?
@azai91 You can try to figure out the root cause and fix it with a new PR. I don't think we need to revert the previous PR.
@pengzhao-intel I think we should fix bugs when adding tests. We can change the title of this PR to something like "fix bugs in MKLDNN operators to handle the kAddTo request".
src/common/exec_utils.h
Outdated
#if MXNET_USE_MKLDNN == 1
NDArray temp = bufs != nullptr ? bufs->at(i) : nd.IsMKLDNNData() ?
    nd.Reorder2Default() : NDArray(nd.shape(), nd.ctx(), true, nd.dtype());
When MXNET_USE_MKLDNN == 1, isn't is_default the same as nd.IsMKLDNNData()? Why do you still need to check nd.IsMKLDNNData() here?
is_default could be false because it is a sparse ndarray, in which case we cannot call Reorder2Default and will just make a copy like we previously did. This does mean that kAddTo won't work, as we are not preserving the data in tmp.
The logic here looks very complex. I guess you might need to check IsMKLDNNData first, before checking bufs.
Also, what happens if nd is a default dense array and req is kAddTo? Should we use nd directly? Should you check kAddTo more explicitly?
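One possible reading of that suggestion, as a sketch only (not the PR's final code; nd, bufs, and i are the names from the snippet above):

```c++
// Sketch: pick the temporary NDArray for the fallback path, checking
// the MKL-DNN layout before consulting preallocated buffers.
NDArray temp;
if (nd.IsMKLDNNData()) {
  // Reorder so the fallback FCompute can read the existing data;
  // preserving it matters when req is kAddTo.
  temp = nd.Reorder2Default();
} else if (bufs != nullptr) {
  temp = bufs->at(i);  // reuse a preallocated buffer
} else {
  temp = NDArray(nd.shape(), nd.ctx(), true, nd.dtype());
}
```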
Addressed the regression issue in this diff: #11262
@zheng-da Agreed. It's nice to keep the title/description consistent with the real changes.
Working on breaking this diff into smaller diffs; will postpone trying to merge this for a couple of days.
@azai91 I have closed my PR since you have already fixed it and added the tests. Thanks a lot.
Force-pushed from 8b909d5 to c055940 (Compare)
@zheng-da @pengzhao-intel Updated the PR. Please take a look when you have time.
if (mem == nullptr) {
  auto tmp_memory = TmpMemMgr::Get()->Alloc(target_pd);
  MKLDNNCopy(*res_memory, tmp_memory);
  res_memory = tmp_memory;
As I understand it, MKLDNNCopy already reorders res_memory into tmp_memory, so why do we need to assign tmp_memory to res_memory again?
In line 224 we use res_memory and add it to mem.
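A condensed sketch of why the reassignment matters (names as in the quoted snippet; the final sum call is elided):

```c++
// If the output's existing MKL-DNN memory (`mem`) could not be reused,
// reorder the result into scratch memory with the output's layout.
if (mem == nullptr) {
  mkldnn::memory *tmp_memory = TmpMemMgr::Get()->Alloc(target_pd);
  MKLDNNCopy(*res_memory, tmp_memory);
  // Without this reassignment, the accumulation below would still read
  // the old, wrongly formatted buffer.
  res_memory = tmp_memory;
}
// ... res_memory is then summed with mem (the line 224 referenced above).
```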
@pengzhao-intel Does my answer above address the issue?
CHECK(temp.IsDefaultData());
#else
NDArray temp = bufs != nullptr ? bufs->at(i) : NDArray(nd.shape(), nd.ctx(),
    true, nd.dtype());
Fix the indentation.
@@ -412,6 +422,8 @@ OpAttrs GetReluBackwardsOp() {
  attrs.dispatches.resize(2);
  attrs.dispatches[0] = DispatchMode::kFCompute;
  attrs.dispatches[1] = DispatchMode::kFComputeEx;
  attrs.requests.insert(OpReqType::kWriteTo);
  attrs.requests.insert(OpReqType::kWriteInplace);
Why isn't there a kAddTo test here?
Done.
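Presumably the fix mirrors the lines quoted above, i.e. something like this sketch:

```c++
attrs.requests.insert(OpReqType::kAddTo);
```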
…apache#11129)

* fix lint
* requests added to opattr
* comment out addto
* can invalidate kAddTo request mkldarrays
* revert adding kAddTo to invalidate
* use copy of output instead of creating new array
* convert output to default if fallback
* do not make copy when init
* copyex fallback copies to old array with kAddTo
* change input mem desc to output mem desc if not equal
* reorder memory in commitoutput
* allocate temp memory
* fix var names
* create helper reorder function to handle diff format/shapes
* fix typos
* fix typos
* remove unused code
* fix param
* fix header files
* force input memory to output
* reorder2default keeps pointer to mkldnn memory
* pass reference
* remove extra lines
* do not get raw mem from ptr
* remove isView check
* fallback writes back to output
* remove redundant line
* remove commented out code
* use fallback in copy (refactor)
* remove unused header
* fix lint
* reorder2default only if mkldnn flag
* only reorder if mkldnn
* does not assume 1 output
* sum compares input and output shape
* compare address and pd in sum
* refactor mkldnnsum
* fix const param
* fix header
* improve control flow when setting output blob
* fix merge
* remove kaddto comment
* add reqests to operators
* fix spacing
* do sum in place
* fix conditionals
* remove redundant reqs
* use wait to read all
* fix lint
* create multiple outputs
* create multiple copies for kaddto
* retrigger
* retriggrer
* retrigger
* retrigger
* another retrigger
* retrigger
* retrigger
* another another retrigger
* fix merge
* retrigger
* add kAddto to relu op
* retrigger