RADIOv2.5 as vision encoder #65

gheinrich · 2024-07-26T13:45:34Z

Hello,

Congratulations on a great project! I enjoyed reading your paper, where you clearly articulate the motivation behind each design choice. Your results are amazing!

Have you considered using RADIO as a vision encoder? We recently released version 2.5 of this vision foundation model, and our LLaVA 1.5 results look great, surpassing other vision encoders we've tried by a good margin. We believe that RADIO would be an excellent addition to your blend of vision encoders. RADIOv2.5-L is a ViT-L/16 and is very flexible, supporting input resolutions up to 2048x2048.
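
For a rough sense of what that resolution range means for the token budget of a downstream LLM, here is a minimal sketch. It assumes plain non-overlapping 16x16 patching (as the "/16" in ViT-L/16 implies) and counts only spatial patch tokens; any summary or extra tokens the model emits are not included, and the divisibility check is an assumption of this sketch rather than a documented RADIO requirement. The function name `patch_token_count` is hypothetical.

```python
def patch_token_count(height: int, width: int, patch_size: int = 16) -> int:
    """Number of spatial patch tokens for an image split into square patches.

    Assumes non-overlapping patches and input dimensions that are exact
    multiples of the patch size (an assumption of this sketch).
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    if height % patch_size or width % patch_size:
        raise ValueError(
            f"{height}x{width} is not a multiple of patch size {patch_size}"
        )
    return (height // patch_size) * (width // patch_size)


# Token counts at a few square resolutions up to the stated 2048x2048 maximum.
for side in (512, 1024, 2048):
    print(side, patch_token_count(side, side))
```

The count grows quadratically with the side length, so any token compression or pooling in the connector matters most at the high end of the range.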

You can pull RADIO using either TorchHub or HuggingFace. We believe it's easy to integrate, but if you need any help, @mranzinger and I are here to assist!
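
A minimal sketch of the TorchHub route follows. The repository name, entry point, version string, and the two-tensor return value are assumptions recalled from the RADIO README and should be checked against the current repository; `load_radio` and `encode_dummy` are hypothetical helper names. The only PyTorch API used is `torch.hub.load`, which forwards extra keyword arguments to the hub entry point.

```python
# Assumed identifiers (not stated in this issue); verify against the RADIO README.
RADIO_REPO = "NVlabs/RADIO"
RADIO_ENTRYPOINT = "radio_model"
RADIO_VERSION = "radio_v2.5-l"


def load_radio(version: str = RADIO_VERSION):
    """Download RADIO through TorchHub and return it in eval mode.

    Requires network access on first call; torch is imported lazily so this
    module can be imported without it.
    """
    import torch

    model = torch.hub.load(
        RADIO_REPO,
        RADIO_ENTRYPOINT,
        version=version,       # forwarded to the hub entry point (assumed kwarg)
        progress=True,         # forwarded to the hub entry point (assumed kwarg)
        skip_validation=True,  # torch.hub.load option: skip the GitHub repo check
    )
    return model.eval()


def encode_dummy(model, side: int = 512):
    """Run a random image through the model.

    Assumes the model accepts a float tensor with values in [0, 1] and returns
    a (summary, spatial_features) pair.
    """
    import torch

    x = torch.rand(1, 3, side, side)
    with torch.no_grad():
        summary, spatial_features = model(x)
    return summary, spatial_features
```

In a LLaVA-style setup, the spatial features would typically be the tensor passed to the projector, but that wiring depends on the host codebase.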

Thanks in advance!
