Add text to vision embedding #6282
Conversation
Signed-off-by: tangy5 <[email protected]>
for more information, see https://pre-commit.ci
Signed-off-by: tangy5 <[email protected]>
for more information, see https://pre-commit.ci
Signed-off-by: tangy5 <[email protected]>
for more information, see https://pre-commit.ci
Hi @wyli, it would be great if you could take a look at this PR when you get time. The PR is ready for review, but I struggled to get the flake8 check to pass, and it does not seem to be a formatting issue in the new script itself. Could you help take a look? Thank you so much, and let me know if there are any comments.
/black
Great, how can I create this "prototype"? A unit test workflow? Basically, the two PRs belong together: the network in #6283 uses the module from this PR. I plan to make these two classes reusable for most network backbones, e.g. UNet, SwinUNETR, so that if users want to use the "text embedding" with their network, it can be safely concatenated to the vision features predicted by CNN/Transformer backbones. I feel a complete unit test or an integration test would be better to show the prototype here. Thank you.
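A minimal sketch of the concatenation idea described above, assuming a 3D feature map from a backbone such as UNet or SwinUNETR; the function name and tensor shapes are illustrative, not the PR's final API:

```python
import torch

def concat_text_to_vision(vision_feat: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """Broadcast a text embedding spatially and concatenate it to vision features.

    vision_feat: (B, C_vis, H, W, D) feature map from a CNN/Transformer backbone.
    text_emb:    (C_txt,) embedding vector (e.g. CLIP-derived or random).
    """
    b, _, h, w, d = vision_feat.shape
    # Reshape to (1, C_txt, 1, 1, 1) and expand over batch and spatial dims.
    text_map = text_emb.view(1, -1, 1, 1, 1).expand(b, -1, h, w, d)
    # Fuse along the channel dimension: (B, C_vis + C_txt, H, W, D).
    return torch.cat([vision_feat, text_map], dim=1)
```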
Signed-off-by: tangy5 <[email protected]>
Hi @wyli, thank you for the suggestion. For the prototype you mentioned, a complete workflow and application is here: https://github.com/ljwztc/CLIP-Driven-Universal-Model. Later, I'd like to build another pipeline for a partially supervised learning workflow to showcase how to use the text embedding as a plug-and-play module. I'm thinking of this as two parts: 1. the text_embedding class, which can load pre-trained embeddings or add any text embedding to any network module. These two modules are reusable designs.
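A sketch of what such a text embedding class might look like, assuming the pre-trained CLIP text embeddings are stored on disk as a (num_classes, embed_dim) tensor; the class name, constructor arguments, and file format are all assumptions, not the module merged in this PR:

```python
import torch
from torch import nn

class TextEmbedding(nn.Module):
    """Per-class text embeddings: CLIP pre-trained (frozen) or random (learnable)."""

    def __init__(self, num_classes: int, embed_dim: int = 512, pretrained_path: str = ""):
        super().__init__()
        if pretrained_path:
            # Assumed format: a saved (num_classes, embed_dim) tensor of CLIP embeddings.
            weights = torch.load(pretrained_path, map_location="cpu")
            self.embedding = nn.Parameter(weights, requires_grad=False)
        else:
            # Random, learnable embedding table.
            self.embedding = nn.Parameter(torch.randn(num_classes, embed_dim))

    def forward(self) -> torch.Tensor:
        return self.embedding  # (num_classes, embed_dim)
```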
Signed-off-by: tangy5 <[email protected]>
for more information, see https://pre-commit.ci
Signed-off-by: tangy5 <[email protected]>
@wyli, thank you so much for the suggestions, they are very helpful. Some changes have been made according to your suggestions; could you take another look? Thanks.
Signed-off-by: tangy5 <[email protected]>
Thanks @tangy5, looks good to me. There are issues with the CPU-only tests, could you please revise? (@yiheng-wang-nv knows more if you need help)
Signed-off-by: tangy5 <[email protected]>
for more information, see https://pre-commit.ci
Thanks. The test failures should be because the loaded pre-trained weights were always mapped to GPU; I modified the loading and added map_location.
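For reference, the usual fix for this kind of CPU-only test failure is passing map_location to torch.load so that tensors saved from CUDA are remapped (the file name below is a placeholder):

```python
import torch

# Without map_location, a checkpoint saved from CUDA tensors fails to load
# on a CPU-only machine; mapping to "cpu" makes the tests device-agnostic.
weights = torch.load("text_embedding_weights.pth", map_location=torch.device("cpu"))
```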
Signed-off-by: monai-bot <[email protected]>
Signed-off-by: Wenqi Li <[email protected]>
/build
This is part of the text-to-vision encoder for medical image analysis.
It supports CLIP pre-trained embeddings and random text embeddings.
Linked issue: #6177
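Putting the two sketches above together, a hypothetical end-to-end usage might look like this (all names come from the illustrative sketches in this thread, not the merged API):

```python
import torch

# Random, learnable text embeddings for 32 classes; pass pretrained_path
# instead to load frozen CLIP embeddings.
text_emb = TextEmbedding(num_classes=32, embed_dim=512)

vision_feat = torch.randn(2, 48, 16, 16, 16)  # stand-in for backbone features
fused = concat_text_to_vision(vision_feat, text_emb()[0])  # use class 0's vector
print(fused.shape)  # torch.Size([2, 560, 16, 16, 16])
```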