Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add text to vision embedding #6282

Merged
merged 28 commits into from
Apr 13, 2023
Merged

Add text to vision embedding #6282

merged 28 commits into from
Apr 13, 2023

Conversation

tangy5
Copy link
Contributor

@tangy5 tangy5 commented Apr 4, 2023

As part of the text to vision encoder for medical image analysis.

Support CLIP pre-trained embedding and random text embedding.

Linked to the issue: #6177

@tangy5 tangy5 self-assigned this Apr 4, 2023
tangy5 added 2 commits April 4, 2023 21:58
Signed-off-by: tangy5 <[email protected]>
Signed-off-by: tangy5 <[email protected]>
Signed-off-by: tangy5 <[email protected]>
Signed-off-by: tangy5 <[email protected]>
@tangy5 tangy5 marked this pull request as ready for review April 5, 2023 05:32
@tangy5
Copy link
Contributor Author

tangy5 commented Apr 5, 2023

Hi @wyli , it would be great you can take a look at this PR when you get time. The PR is ready for review, but I struggled to get the flake8 check pass. It seems not this new script's format issue. Can you help to give a look?

Thank you so much, let me know if there are any comments.

@tangy5 tangy5 requested a review from wyli April 5, 2023 06:22
@wyli
Copy link
Contributor

wyli commented Apr 5, 2023

/black
thanks @tangy5, would be great to first have a prototype of https://github.com/Project-MONAI/MONAI/milestone/98 with all the basic functionality (it's fine to not have detailed docstrings/unit tests), then we identify possible reusable modules and create smaller PRs. e.g it's interesting to see the module's API is def forward(self) which doesn't take any input additional input, with a prototype it'll be easier to see how this could be used as part of a larger network.

@tangy5
Copy link
Contributor Author

tangy5 commented Apr 5, 2023

thanks @tangy5, would be great to first have a prototype of https://github.com/Project-MONAI/MONAI/milestone/98 with all the basic functionality (it's fine to not have detailed docstrings/unit tests), then we identify possible reusable modules and create smaller PRs. e.g it's interesting to see the module's API is def forward(self) which doesn't take any input additional input, with a prototype it'll be easier to see how this could be used as part of a larger network.

Great, how can I create this "prototype"? A unit test workflow? Basically, the two PRs should be together, this PR's (#6283) network uses this PR's module.

I plan to make these two classes reusable for most network backbones, e.g. unet, swinunetr, that if users want to use the "text embedding" for their network, it could be safely concatenated to the vision feature predicted by CNN/Transformers backbones.

I feel a complete unit test or an integration test would be better to show the prototype here?

Thank you.

@tangy5
Copy link
Contributor Author

tangy5 commented Apr 6, 2023

Hi @wyli , thank you for the suggestion. For the prototype thing you mentioned, a complete workflow and application is here: https://github.com/ljwztc/CLIP-Driven-Universal-Model

I'd like to make another pipeline for a partially supervised learning workflow to showcase how to use textembedding as a plug and play module later.

I'm thinking this as two parts:

1: The text_embedding class that can load pre-trained or to add any text embedding to any network modules.
2. A dynamic segmentor class that can be used (reusable) as a separate module attached to segmentation network backbones (e.g., SwinUNETR, UNET, segResNet, etc)

These two modules are reusable designs.
Let me know if we need a discussion. Thank you.

tests/test_text_encoding.py Outdated Show resolved Hide resolved
Signed-off-by: tangy5 <[email protected]>
tangy5 added 2 commits April 12, 2023 12:23
Signed-off-by: tangy5 <[email protected]>
@tangy5
Copy link
Contributor Author

tangy5 commented Apr 12, 2023

@wyli , thank you so much for the suggestions, it's very helpful.

Some changes according to your suggestions have been made:

  1. add pretrained option
  2. add url_map for released text embedding
  3. add 2D embedding support
  4. add test cases 3D, 2D.

could you have a look if the skip_if_downloading_fails is used the correct way..

Thanks.

@wyli
Copy link
Contributor

wyli commented Apr 13, 2023

thanks @tangy5, looks good to me, there are issues with the CPU only tests, could you please revise? (@yiheng-wang-nv knows more if you need help)

@tangy5
Copy link
Contributor Author

tangy5 commented Apr 13, 2023

thanks @tangy5, looks good to me, there are issues with the CPU only tests, could you please revise? (@yiheng-wang-nv knows more if you need help)

Thanks. The test fails should because the loaded pre-trained weights are always mapped to GPU, modified and added map_location.
Other changes are modified according to your comments, thanks.
I set the pretrained to True by default and to load the release clip embedding. In the future, if there are more released weights, users can load it, and modify text_dim accordingly.

Signed-off-by: monai-bot <[email protected]>
Signed-off-by: Wenqi Li <[email protected]>
Signed-off-by: Wenqi Li <[email protected]>
@wyli
Copy link
Contributor

wyli commented Apr 13, 2023

/build

@wyli wyli enabled auto-merge (squash) April 13, 2023 19:20
@wyli wyli merged commit 57c618c into Project-MONAI:dev Apr 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants