docs: clip benchmark on zeroshot classification and retrieval tasks #832
Conversation
Force-pushed from 7e379ad to fc2e1c7.
Codecov Report
@@            Coverage Diff             @@
##             main     #832      +/-   ##
==========================================
- Coverage   84.38%   81.58%   -2.80%
==========================================
  Files          21       21
  Lines        1575     1575
==========================================
- Hits         1329     1285      -44
- Misses        246      290      +44
Force-pushed from 07c3ff2 to 5b2f782.
| ViT-g-14::laion2b_s12b_b42k | 0.696 | **0.811** | **0.851** | 0.839 | **0.682** | 0.776 | 0.943 | **0.962** | **0.603** | 0.648 | 0.718 | 0.560 | 0.580 | **0.332** | 0.175 | 0.036 | 0.031 | 0.060 | 0.115 | 0.190 | 0.138 |
+----------------------------------+------------+-----------+------------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+----------------+-------------+-------------+----------------+-------------------+----------------------+-------------------+---------------------+-----------+----------------+

From the table, we observe that the ViT models still outperform the RN models in most tasks, except for the Patch Camelyon dataset where ``RN50::openai`` has the best top-1 accuracy of 0.636, and the KITTI/distance dataset where ``RN50::yfcc15m`` has the best result of 0.336.
There are some cases where RN models beat ViT, such as Patch Camelyon and KITTI/distance. Is there any reason they perform this way?
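As context for how a single top-1 number in this table is produced, here is a minimal zero-shot classification sketch using open_clip. The model name, the two Patch-Camelyon-style prompts, and the image path are illustrative placeholders; the benchmark behind the table may use different prompt templates and evaluation code.

```python
import torch
import open_clip
from PIL import Image

# Load a CLIP variant by model name and pre-training tag,
# e.g. RN50/openai or ViT-g-14/laion2b_s12b_b42k from the table.
model, _, preprocess = open_clip.create_model_and_transforms("RN50", pretrained="openai")
tokenizer = open_clip.get_tokenizer("RN50")
model.eval()

# Hypothetical prompts for a binary task such as Patch Camelyon;
# the actual benchmark likely uses dataset-specific prompt templates.
class_prompts = [
    "a histopathology image of healthy tissue",
    "a histopathology image of tumor tissue",
]
text = tokenizer(class_prompts)
image = preprocess(Image.open("patch.png")).unsqueeze(0)  # placeholder image path

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize and score each class prompt by cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

pred = probs.argmax(dim=-1)  # top-1 prediction for this image
```

Top-1 accuracy for a dataset is then simply the fraction of images whose predicted class matches the label, which is the kind of number being compared between the RN and ViT rows above.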
From the table, we observe that the ViT models still outperform the RN models in most tasks, except for the Patch Camelyon dataset where ``RN50::openai`` has the best top-1 accuracy of 0.636, and the KITTI/distance dataset where ``RN50::yfcc15m`` has the best result of 0.336.
Similar to retrieval results, the ``ViT-H-14::laion2b_s32b_b79k`` model and ``ViT-g-14::laion2b_s12b_b42k`` model still have the best or close to the best results on 12/21 zero-shot classification tasks.
All models tend to perform well on ImageNetV2, VOC2007, VTAB natural and VTAB specialized (except for Retinopathy) datasets, whereas they perform poorly on VTAB structured datasets.
We do not observe any significant difference between the ViT models of the same base model.
what does this mean?
For ViT models on retrieval tasks, variants of the same base model give better results when pre-trained on larger datasets (e.g., ViT-B-32::openai vs ViT-B-32::laion400m_e31 vs ViT-B-32::laion2b-s34b-b79k). This pattern does not hold for the classification tasks.
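To make the retrieval comparison concrete, below is a minimal sketch of how image-to-text recall@k could be computed from CLIP embeddings. The function name, the assumption that caption i belongs to image i, and the random example tensors are illustrative only, not the exact evaluation script used for the benchmark.

```python
import torch

def recall_at_k(image_emb: torch.Tensor, text_emb: torch.Tensor, k: int = 5) -> float:
    """Image-to-text recall@k, assuming text_emb[i] is the caption of image_emb[i].

    Both tensors are expected to be L2-normalized with shape (N, D).
    """
    sims = image_emb @ text_emb.T                   # (N, N) cosine similarities
    topk = sims.topk(k, dim=-1).indices             # k best-ranked captions per image
    targets = torch.arange(image_emb.size(0)).unsqueeze(-1)
    hits = (topk == targets).any(dim=-1)            # did the true caption land in the top k?
    return hits.float().mean().item()

# Example call with random embeddings, just to show the shapes involved.
img = torch.nn.functional.normalize(torch.randn(8, 512), dim=-1)
txt = torch.nn.functional.normalize(torch.randn(8, 512), dim=-1)
print(recall_at_k(img, txt, k=5))
```

Swapping in embeddings from two pre-training tags of the same base model (e.g., the ViT-B-32 variants mentioned above) is the kind of comparison where the retrieval gap shows up, while the classification accuracies stay close.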
* chore: update benchmark intro
* chore: minor revision
* chore: minor revision
* chore: minor revision
* chore: minor revision
* chore: minor revision
* chore: minor revision
📝 Docs are deployed on https://ft-clip-benchmark--jina-docs.netlify.app 🎉