
docs: clip benchmark on zeroshot classification and retrieval tasks #832

Merged
34 commits merged into main on Oct 10, 2022

Conversation

ZiniuYu (Member) commented Sep 27, 2022

No description provided.

@ZiniuYu ZiniuYu marked this pull request as draft September 27, 2022 08:00
@ZiniuYu ZiniuYu changed the title docs: CLIP benchmark on zeroshot classification and retrieval tasks docs: clip benchmark on zeroshot classification and retrieval tasks Sep 27, 2022
@github-actions github-actions bot added size/m and removed size/s labels Sep 27, 2022
codecov bot commented Sep 27, 2022

Codecov Report

Merging #832 (9839451) into main (2ba8a4f) will decrease coverage by 2.79%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main     #832      +/-   ##
==========================================
- Coverage   84.38%   81.58%   -2.80%     
==========================================
  Files          21       21              
  Lines        1575     1575              
==========================================
- Hits         1329     1285      -44     
- Misses        246      290      +44     
Flag Coverage Δ
cas 81.58% <ø> (-2.80%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
server/clip_server/model/clip_onnx.py 72.72% <ø> (-5.46%) ⬇️
server/clip_server/model/pretrained_models.py 98.41% <ø> (ø)
server/clip_server/model/model.py 69.85% <0.00%> (-9.12%) ⬇️
server/clip_server/executors/clip_onnx.py 81.94% <0.00%> (-2.78%) ⬇️


@github-actions github-actions bot added size/l and removed size/m labels Sep 30, 2022
@github-actions github-actions bot added size/m and removed size/l labels Sep 30, 2022
@ZiniuYu ZiniuYu marked this pull request as ready for review October 9, 2022 06:50
@ZiniuYu ZiniuYu requested a review from a team October 9, 2022 06:55
docs/user-guides/benchmark.rst (outdated review thread)
| ViT-g-14::laion2b_s12b_b42k | 0.696 | **0.811** | **0.851** | 0.839 | **0.682** | 0.776 | 0.943 | **0.962** | **0.603** | 0.648 | 0.718 | 0.560 | 0.580 | **0.332** | 0.175 | 0.036 | 0.031 | 0.060 | 0.115 | 0.190 | 0.138 |
+----------------------------------+------------+-----------+------------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+----------------+-------------+-------------+----------------+-------------------+----------------------+-------------------+---------------------+-----------+----------------+

From the table, we observe that the ViT models still outperform the RN models in most tasks, except for the Patch Camelyon dataset where ``RN50::openai`` has the best top-1 accuracy of 0.636, and the KITTI/distance dataset where ``RN50::yfcc15m`` has the best result of 0.336.
Contributor commented:
There are some cases where RN models beat ViT, such as Patch Camelyon and KITTI/distance. Is there any reason why they perform this way?

From the table, we observe that the ViT models still outperform the RN models in most tasks, except for the Patch Camelyon dataset where ``RN50::openai`` has the best top-1 accuracy of 0.636, and the KITTI/distance dataset where ``RN50::yfcc15m`` has the best result of 0.336.
Similar to retrieval results, the ``ViT-H-14::laion2b_s32b_b79k`` model and ``ViT-g-14::laion2b_s12b_b42k`` model still have the best or close to the best results on 12/21 zero-shot classification tasks.
All models tend to perform well on ImageNetV2, VOC2007, VTAB natural and VTAB specialized (except for Retinopathy) datasets, whereas they perform poorly on VTAB structured datasets.
We do not observe any significant difference between the ViT models of the same base model.
Contributor commented:

What does this mean?

ZiniuYu (Member, Author) replied:

For ViT models in retrieval tasks, results for the same base model are better when pre-trained on larger datasets (e.g., ViT-B-32::openai vs. ViT-B-32::laion400m_e31 vs. ViT-B-32::laion2b-s34b-b79k).
This is not the case in classification tasks.
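For reference, here is a minimal sketch of the kind of zero-shot comparison discussed above, using the open_clip package directly rather than the exact harness behind these benchmark docs. The class prompts and the image path are placeholders, and the pre-training tags (openai, laion400m_e31, laion2b_s34b_b79k) are the ones mentioned in the reply; running it will download the corresponding checkpoints.

```python
import torch
import open_clip
from PIL import Image

# Placeholder prompt set and image; the real benchmark uses per-dataset prompt templates.
class_prompts = ["a photo of a cat", "a photo of a dog"]
image = Image.open("example.jpg")  # hypothetical input image

# Same ViT-B-32 base model, three different pre-training datasets.
for pretrained in ("openai", "laion400m_e31", "laion2b_s34b_b79k"):
    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained=pretrained
    )
    tokenizer = open_clip.get_tokenizer("ViT-B-32")
    model.eval()

    image_input = preprocess(image).unsqueeze(0)
    text_input = tokenizer(class_prompts)

    with torch.no_grad():
        image_features = model.encode_image(image_input)
        text_features = model.encode_text(text_input)
        # Cosine similarity between the image and each class prompt.
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    print(pretrained, probs.squeeze().tolist())
```

The numbers in the benchmark table come from averaging this kind of prediction over whole test sets with dataset-specific prompt templates, so the sketch only illustrates how the same base model is evaluated under different pre-training tags.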

ZiniuYu and others added 3 commits October 9, 2022 22:57
* chore: update benchmark intro

* chore: minor revision

* chore: minor revision

* chore: minor revision

* chore: minor revision

* chore: minor revision

* chore: minor revision
@github-actions

📝 Docs are deployed on https://ft-clip-benchmark--jina-docs.netlify.app 🎉

@numb3r3 numb3r3 merged commit 7ee58c8 into main Oct 10, 2022
@numb3r3 numb3r3 deleted the clip-benchmark branch October 10, 2022 06:27