RADIOv2.5 as vision encoder #65

Open
gheinrich opened this issue Jul 26, 2024 · 0 comments

gheinrich commented Jul 26, 2024

Hello,

Congratulations on a great project! I enjoyed reading your paper, where you clearly articulate the motivation behind each design choice. Your results are amazing!

Have you considered using RADIO as a vision encoder? We recently released version 2.5 of this vision foundation model, and our LLaVA 1.5 results look great, surpassing other vision encoders we've tried by a good margin. We believe that RADIO would be an excellent addition to your blend of vision encoders. RADIOv2.5-L is a ViT-L/16 and is very flexible, supporting input resolutions up to 2048x2048.

You can pull RADIO using either TorchHub or HuggingFace. We believe it's easy to integrate, but if you need any help, @mranzinger and I are here to assist!
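As a rough illustration of the TorchHub route, here is a minimal sketch. The repository name `NVlabs/RADIO`, the entry point `radio_model`, and the version string `radio_v2.5-l` are assumptions and should be checked against the RADIO README. `patch_grid` and `load_radio` are hypothetical helper names, and the requirement that each side be a multiple of the patch size is likewise an assumption. The only facts taken from the description above are the patch size of 16 and the 2048x2048 limit.

```python
PATCH_SIZE = 16   # RADIOv2.5-L is a ViT-L/16
MAX_SIDE = 2048   # largest supported input side, per the description above


def patch_grid(height, width, patch_size=PATCH_SIZE, max_side=MAX_SIDE):
    """Return (rows, cols) of the patch grid for an input of the given size.

    Assumes each side must be a positive multiple of the patch size.
    """
    for side in (height, width):
        if side <= 0 or side > max_side:
            raise ValueError(f"side {side} is outside 1..{max_side}")
        if side % patch_size != 0:
            raise ValueError(f"side {side} is not a multiple of {patch_size}")
    return height // patch_size, width // patch_size


def load_radio(version="radio_v2.5-l"):
    """Hypothetical helper: fetch the model through TorchHub.

    The repo, entry point, and version string are assumptions;
    this call downloads weights, so it needs network access.
    """
    import torch  # imported lazily so patch_grid works without PyTorch

    model = torch.hub.load(
        "NVlabs/RADIO",       # assumed hub repository
        "radio_model",        # assumed hub entry point
        version=version,      # forwarded to the entry point as a keyword
        progress=True,
        skip_validation=True,
    )
    return model.eval()
```

For a 2048x2048 input, `patch_grid(2048, 2048)` gives a 128 by 128 grid, i.e. 16384 patch tokens, which is worth keeping in mind when budgeting tokens for the projector and the LLM.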

Thanks in advance!
