[MISC] Distill-DSM Model #630

Draft
wants to merge 2 commits into
base: misc

Rakshith2597

Submitting the training module for Distill DSM, a computationally efficient method for segmentation of medical imaging volumes.

Paper: MIDL 2021
Dataset used for this code repo: Medical Segmentation Decathlon

This is part of the project MIRIAD: Many Incarnations of Screening of Radiology for High Throughput Disease Screening via Multiple Instance Reinforcement Learning with Adversarial Deep Neural Networks, sponsored by INTEL TECHNOLOGY INDIA PVT. LTD.

Principal Investigators:
Dr Debdoot Sheet (PI), Dr Nirmalya Ghosh (Co-PI)
Department of Electrical Engineering
Indian Institute of Technology Kharagpur

Dr Ramanathan Sethuraman (Co-PI)
Intel Technology India Pvt. Ltd.

@github-actions bot added the DEPENDENCY label ("Any changes in any dependencies (new dep or its version) should be produced via Change Request on PM") on Oct 10, 2021
@Rakshith2597
Author

Conversion of the model from ONNX to OpenVINO IR fails with the following error.

E subprocess.CalledProcessError: Command 'mo --framework onnx --input_model model_weights/distill_dsm.onnx --input_shape "[2, 1, 128, 160, 160]" --log_level DEBUG' returned non-zero exit status 127.

What could be the possible reasons for this?

@morkovka1337
Contributor

Are there any additional details available? Could you please provide the full log from the model conversion command?
You can try to convert the model in the terminal as a separate command (not as part of subprocess.call).
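
One thing worth noting: an exit status of 127 is what a shell reports when the command itself cannot be found, so it may be worth checking whether `mo` is on PATH in the environment that runs the test. Below is a minimal sketch, not part of the PR, that looks up the executable first and captures the full output instead of raising `CalledProcessError`. The helper name `run_tool` is hypothetical; the `mo` arguments are the ones from the error message above.

```python
import shutil
import subprocess


def run_tool(args):
    """Run a command-line tool and return (exit_code, combined_output)."""
    exe = shutil.which(args[0])
    if exe is None:
        # Mirror the shell convention: 127 means "command not found".
        return 127, "{}: not found on PATH".format(args[0])
    proc = subprocess.run(
        [exe] + list(args[1:]),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so nothing is lost
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Arguments copied from the failing command; passed as a list, so no shell quoting.
    code, log = run_tool([
        "mo", "--framework", "onnx",
        "--input_model", "model_weights/distill_dsm.onnx",
        "--input_shape", "[2, 1, 128, 160, 160]",
        "--log_level", "DEBUG",
    ])
    print("exit status:", code)
    print(log)
```

If this prints 127 with the "not found" message, the virtual environment that contains `mo` is probably not active in the process that launches the conversion.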

@Rakshith2597
Author

Rakshith2597 commented Oct 12, 2021

Here is the full log.

Model Optimizer arguments:
Common parameters:
- Path to the Input Model: /home/rakshith/bmi7/training_extensions/misc/pytorch_toolkit/distilldsm/model_weights/distill_dsm.onnx
- Path for generated IR: /home/rakshith/bmi7/training_extensions/misc/pytorch_toolkit/distilldsm/.
- IR output name: distill_dsm
- Log level: ERROR
- Batch: Not specified, inherited from the model
- Input layers: Not specified, inherited from the model
- Output layers: Not specified, inherited from the model
- Input shapes: [2,1,128,160,160]
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP32
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: None
- Reverse input channels: False
ONNX specific parameters:
- Inference Engine found in: /home/rakshith/bmi7/training_extensions/misc/pytorch_toolkit/distilldsm/venv/lib/python3.6/site-packages/openvino
Inference Engine version: 2021.4.1-3926-14e67d86634-releases/2021/4
Model Optimizer version: 2021.4.1-3926-14e67d86634-releases/2021/4
[ ERROR ] Cannot infer shapes or values for node "Slice_49".
[ ERROR ] Output shape: [256 0 80 80] of node "Slice_49" contains non-positive values
[ ERROR ]
[ ERROR ] It can happen due to bug in custom shape infer function <function Slice.infer at 0x7f4814f3ff28>.
[ ERROR ] Or because the node inputs have incorrect values/shapes.
[ ERROR ] Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ ERROR ] Run Model Optimizer with --log_level=DEBUG for more information.
[ ERROR ] Exception occurred during running replacer "REPLACEMENT_ID" (<class 'extensions.middle.PartialInfer.PartialInfer'>): Stopped shape/value propagation at "Slice_49" node.
For more information please refer to Model Optimizer FAQ, question #38. (https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html?question=38#question-38)

The input shape is the same one I used to train the .pth model and to convert it to ONNX.

@morkovka1337
Contributor

morkovka1337 commented Oct 12, 2021

You can try two options:

  1. Check that the ONNX model works correctly on a small dataset. Sometimes a model converts without errors but cannot run inference properly.
  2. Sorry, the second option does not apply in this case, never mind.

@morkovka1337
Contributor

Also, you can try moving the return statement earlier in the model's forward function to find out which specific operation produces this error.

@Rakshith2597
Author

You are correct about the first option: the model was converted without errors but cannot run inference.

@Ilya-Krylov changed the base branch from develop to master on November 2, 2021 08:23
@druzhkov-paul added the MISC label ("Any changes in any subproject that does not implement OTX API") on Nov 24, 2021
@Ilya-Krylov changed the title from "Distill-DSM Model" to "[MISC] Distill-DSM Model" on Dec 16, 2021
@dkurt

dkurt commented Jan 28, 2022

Hi, @Rakshith2597! Can you please check whether I am reproducing the model conversion correctly?

net = U_Net(1, 2, conv_type='conv_2d', tsm=True, learn=True)
net.eval()
dummy_inp = torch.randn([1, 1, 128, 160, 160])
torch.onnx.export(net, dummy_inp, "model.onnx", opset_version=11)

with U_Net from https://github.com/Rakshith2597/training_extensions/blob/bmi7/misc/pytorch_toolkit/distilldsm/src/models/UNetDistillDSM.py.

I've found a place where torch.split returns a split with a zero-sized dimension:

shift_tensor, main_tensor = tensor.split([split_size*2, C - 2 * split_size], dim=1)

tensor.shape is [128, 32, 160, 160] but self.split_size is 16, so we get tensor.split([32, 0], dim=1) and main_tensor is empty (it has zero channels). Is that intentional?
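
The zero-width split can be reproduced in isolation. The snippet below is a small sketch, not code from the PR: it uses a reduced stand-in tensor (4 x 32 x 8 x 8 instead of 128 x 32 x 160 x 160) with the same channel count and split_size, and shows that the second piece ends up with a channel dimension of 0. This looks consistent with the "Output shape: [256 0 80 80]" message Model Optimizer reported for Slice_49, although the exact node mapping is an assumption.

```python
import torch

# Reduced stand-in for the [128, 32, 160, 160] activation; only C matters here.
tensor = torch.zeros(4, 32, 8, 8)
split_size = 16
T, C, H, W = tensor.shape

# Same expression as in learnTSM: with C == 2 * split_size the second size is 0.
shift_tensor, main_tensor = tensor.split([split_size * 2, C - 2 * split_size], dim=1)

print(tuple(shift_tensor.shape))  # (4, 32, 8, 8)
print(tuple(main_tensor.shape))   # (4, 0, 8, 8)
print(main_tensor.numel())        # 0
```

PyTorch accepts the empty piece without complaint, which is why training and ONNX export succeed while shape inference in the converter rejects it.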


If that's expected, please apply this patch to make the model OpenVINO compatible:

@@ -107,7 +107,11 @@ class learnTSM(nn.Module):
         shape = T, C, H, W = tensor.shape
         split_size = self.split_size
 
-        shift_tensor, main_tensor = tensor.split([split_size*2, C - 2 * split_size], dim=1)
+        if split_size * 2 == tensor.shape[1]:
+            shift_tensor, main_tensor = tensor, None
+        else:
+            shift_tensor, main_tensor = tensor.split([split_size*2, C - 2 * split_size], dim=1)
+
         # pre_tensor, post_tensor = shift_tensor.split([split_size, split_size], dim=1)
         pre_tensor = shift_tensor
         post_tensor = shift_tensor
@@ -115,7 +119,8 @@ class learnTSM(nn.Module):
         main_conv_tensor = self.main_conv(shift_tensor).view(T//tsm_length, tsm_length, split_size, H, W)
         pre_tensor = self.pre_conv(pre_tensor).view(T//tsm_length, tsm_length, split_size//2, H, W)
         post_tensor = self.post_conv(post_tensor).view(T//tsm_length, tsm_length, split_size//2, H, W)
-        main_tensor = main_tensor.view(T//tsm_length, tsm_length, C - 2*split_size, H, W)
+        if main_tensor is not None:
+            main_tensor = main_tensor.view(T//tsm_length, tsm_length, C - 2*split_size, H, W)
 
         if self.version == 'zero':
             pre_tensor  = F.pad(pre_tensor,  (0, 0, 0, 0, 0, 0, 1, 0))[:,  :-1, ...]  # NOQA
@@ -126,7 +131,10 @@ class learnTSM(nn.Module):
             post_conv_tensor = torch.cat((post_conv_tensor[:,  1:  , ...],  # NOQA
                                      post_conv_tensor[:,   :1 , ...]), dim=1)  # NOQA
         # print(pre_tensor.shape, post_tensor.shape, main_conv_tensor.shape, main_tensor.shape, shape)
-        return torch.cat((pre_tensor, post_tensor, main_conv_tensor, main_tensor), dim=2).view(shape)
+        if main_tensor is not None:
+            return torch.cat((pre_tensor, post_tensor, main_conv_tensor, main_tensor), dim=2).view(shape)
+        else:
+            return torch.cat((pre_tensor, post_tensor, main_conv_tensor), dim=2).view(shape)

Tested accuracy (with OpenVINO 2021.4):

import numpy as np
import torch
from openvino.inference_engine import IECore

# U_Net is the model class from UNetDistillDSM.py linked above
net = U_Net(1, 2, conv_type='conv_2d', tsm=True, learn=True)
net.eval()
dummy_inp = torch.randn([1, 1, 128, 160, 160])
torch.onnx.export(net, dummy_inp, "model.onnx", opset_version=11,
                  input_names=["input"], output_names=["output"])

# Reference output from PyTorch
inp = torch.randn([1, 1, 128, 160, 160])
ref = net(inp)

# Run the exported ONNX model with OpenVINO on CPU
ie = IECore()
exec_net = ie.load_network("model.onnx", "CPU")
out = exec_net.infer({"input": inp.numpy()})["output"]
print(ref.shape)
print(out.shape)
print(np.max(np.abs(ref.detach().numpy() - out)))

max diff: 6.7055225e-08
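
For a repeatable check, the comparison above could be wrapped in a small helper. This is only a sketch: the function names and the `atol=1e-5` tolerance are assumptions chosen for illustration, not values from the PR. It operates on plain NumPy arrays, so the PyTorch reference would be passed as `ref.detach().numpy()`.

```python
import numpy as np


def max_abs_diff(ref, out):
    """Largest element-wise absolute difference between two arrays."""
    ref = np.asarray(ref, dtype=np.float64)
    out = np.asarray(out, dtype=np.float64)
    if ref.shape != out.shape:
        raise ValueError("shape mismatch: {} vs {}".format(ref.shape, out.shape))
    return float(np.max(np.abs(ref - out)))


def outputs_match(ref, out, atol=1e-5):
    """True if every element differs by at most atol (atol is an assumed tolerance)."""
    return max_abs_diff(ref, out) <= atol


if __name__ == "__main__":
    # Synthetic arrays standing in for the PyTorch and OpenVINO outputs.
    ref = np.zeros((2, 3))
    out = ref.copy()
    out[0, 1] = 1e-7
    print(max_abs_diff(ref, out))
    print(outputs_match(ref, out))
```

The explicit shape check matters because NumPy broadcasting could otherwise hide a mismatch between the two outputs.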

@nervana-ff

Can one of the admins verify this patch?

@ryanloney assigned goodsong81 and unassigned morkovka1337 on May 9, 2022
@ryanloney

@goodsong81 can your team take a look at this?

@goodsong81
Contributor

Please resolve the merge conflicts, then mark this PR as 'ready for review'.
