docs: docs for retrieval #808

jemmyshin · 2022-08-23T07:09:34Z

Add documentation for clip-retrieval.

codecov · 2022-08-23T07:14:43Z

Codecov Report

Merging #808 (324cd29) into main (47144c2) will decrease coverage by 30.76%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             main     #808       +/-   ##
===========================================
- Coverage   83.90%   53.13%   -30.77%     
===========================================
  Files          21       21               
  Lines        1466     1466               
===========================================
- Hits         1230      779      -451     
- Misses        236      687      +451

| Flag | Coverage Δ | |
|---|---|---|
| cas | `53.13% <ø> (-30.77%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| server/clip_server/executors/clip_tensorrt.py | `0.00% <0.00%> (-92.73%)` | ⬇️ |
| server/clip_server/model/clip_trt.py | `0.00% <0.00%> (-85.72%)` | ⬇️ |
| server/clip_server/model/mclip_model.py | `0.00% <0.00%> (-84.22%)` | ⬇️ |
| server/clip_server/model/trt_utils.py | `0.00% <0.00%> (-83.52%)` | ⬇️ |
| client/clip_client/client.py | `44.79% <0.00%> (-42.54%)` | ⬇️ |
| server/clip_server/executors/helper.py | `64.70% <0.00%> (-32.36%)` | ⬇️ |
| server/clip_server/model/model.py | `53.86% <0.00%> (-25.06%)` | ⬇️ |
| server/clip_server/model/clip.py | `68.75% <0.00%> (-18.75%)` | ⬇️ |
| server/clip_server/model/pretrained_models.py | `84.12% <0.00%> (-14.29%)` | ⬇️ |
| server/clip_server/model/clip_model.py | `75.00% <0.00%> (-12.50%)` | ⬇️ |
| ... and 5 more | | |

fogx · 2022-08-24T07:29:43Z

docs/user-guides/retrieval.md

+
+In order to implement retrieval, we add an [`AnnLite`](https://github.com/jina-ai/annlite) indexer executor after the encoder executor in CLIP-as-service.
+
+


a table of contents would be nice

a table about what?

something like this :)

I think we already have one here on the right side of the page:

ZiniuYu · 2022-09-09T08:17:51Z

docs/user-guides/retriever.md

+# Search API
+
+
+## Basics of CLIP Search


I'd recommend removing this title.

ZiniuYu · 2022-09-09T08:18:37Z

docs/user-guides/retriever.md

+
+
+
+## How to lower memory footprint?


Suggested change

-## How to lower memory footprint?
+## Lower the memory footprint

ZiniuYu · 2022-09-09T08:19:21Z

docs/user-guides/retriever.md

+
+The indexer can use a lot of memory because the HNSW index (which is used by `AnnLite`) is stored in memory. An efficient way to reduce the memory footprint is dimensionality reduction. Retrieval in CLIP-as-service uses [principal component analysis (PCA)](https://en.wikipedia.org/wiki/Principal_component_analysis#:~:text=Principal%20component%20analysis%20(PCA)%20is,components%20and%20ignoring%20the%20rest.) to achieve this.
+
+### Whether PCA is needed in my case?


I'd suggest removing this title.
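
For readers of this thread, the memory argument in the quoted paragraph can be sketched in a few lines. This is only an illustration under stated assumptions, not the PR's or `AnnLite`'s implementation: it uses plain NumPy, random `float32` vectors as a stand-in for CLIP embeddings, and hypothetical helper names (`raw_vector_bytes`, `fit_pca`, `transform`). It counts the raw vector storage only; the HNSW graph adds overhead on top of that.

```python
import numpy as np


def raw_vector_bytes(n_vectors, dim, dtype=np.float32):
    """Bytes needed to store n_vectors embeddings of size dim (vectors only)."""
    return n_vectors * dim * np.dtype(dtype).itemsize


def fit_pca(x, n_components):
    """Fit PCA via SVD; return (mean, components, explained variance ratio)."""
    mean = x.mean(axis=0)
    centered = x - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    kept = variance[:n_components].sum() / variance.sum()
    return mean, vt[:n_components], float(kept)


def transform(x, mean, components):
    """Project embeddings onto the retained principal directions."""
    return (x - mean) @ components.T


# Hypothetical data: 1000 random 512-dimensional vectors (assumption, not real CLIP output).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 512)).astype(np.float32)

mean, components, kept = fit_pca(embeddings, n_components=128)
reduced = transform(embeddings, mean, components)

bytes_before = raw_vector_bytes(1_000_000, 512)  # one million vectors at 512 dims
bytes_after = raw_vector_bytes(1_000_000, 128)   # the same vectors at 128 dims
```

Here `kept` is the fraction of variance that survives the projection, which is one rough way to quantify the information loss the docs mention, and `bytes_before / bytes_after` gives the reduction in raw vector storage (a factor of four when going from 512 to 128 dimensions).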

ZiniuYu · 2022-09-09T08:19:46Z

docs/user-guides/retriever.md

+However, PCA inevitably loses information because it removes dimensions, and the more dimensions you remove, the more information is lost. The best practice is to estimate the memory usage first (if possible, see below) and then choose a reasonable dimension after PCA.
+```
+
+## How to deal with a very large dataset?


Suggested change

-## How to deal with a very large dataset?
+## Dealing with large dataset

ZiniuYu · 2022-09-09T08:20:14Z

docs/user-guides/retriever.md

+```
+
+
+## How to deploy it on the cloud?


Suggested change

-## How to deploy it on the cloud?
+## Deploy to JCloud

ZiniuYu · 2022-09-09T08:20:53Z

docs/user-guides/retriever.md

+
+
+## How to deploy it on the cloud?
+Deployment can be easily achieved by using [`jcloud`](https://github.com/jina-ai/jcloud) or [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Taking `jcloud` as an example:


Use JCloud, not `jcloud`.

ZiniuYu · 2022-09-09T08:21:47Z

docs/user-guides/retriever.md

+
+Then you can perform exactly the same operations as on a single machine (`/encode`, `/index` and `/search`).
+
+### Why different [polling strategies](https://docs.jina.ai/how-to/scale-out/?highlight=polling#different-polling-strategies) are needed for different endpoints?


Different polling strategies for different endpoints
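
As background for this heading, a toy simulation may help show why endpoints can need different polling. It assumes the usual sharded-index arrangement, in which an index request goes to one shard (`ANY` polling) while a search request goes to every shard and the results are merged (`ALL` polling). The thread itself does not spell out this mapping, and the `Shard`, `index`, and `search` names below are hypothetical and are not Jina's API.

```python
import random


class Shard:
    """A toy shard that stores doc_id -> value (a float standing in for a vector)."""

    def __init__(self):
        self.docs = {}


def index(shards, doc_id, value, rng):
    """ANY polling: exactly one shard receives the document."""
    rng.choice(shards).docs[doc_id] = value


def search(shards, query, top_k):
    """ALL polling: every shard is queried and the candidates are merged."""
    candidates = []
    for shard in shards:
        for doc_id, value in shard.docs.items():
            candidates.append((abs(value - query), doc_id))
    # Smaller distance means a better match.
    return [doc_id for _, doc_id in sorted(candidates)[:top_k]]


rng = random.Random(0)
shards = [Shard() for _ in range(3)]
for i in range(10):
    index(shards, f"d{i}", float(i), rng)

total_docs = sum(len(shard.docs) for shard in shards)
hits = search(shards, query=4.2, top_k=3)
```

Because each document lives on exactly one shard, a search that reached only one shard could miss the best matches, while an index request broadcast to all shards would store duplicates.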

ZiniuYu · 2022-09-09T08:26:21Z

I'm not sure. Should we put search in the Client docs, since it is part of the Client?
We should at least mention the search function in the Client intro and link to the search docs.

github-actions · 2022-09-09T09:50:39Z

📝 Docs are deployed on https://ft-docs-retrieval--jina-docs.netlify.app 🎉

github-actions bot added size/m area/docs labels Aug 23, 2022

ZiniuYu reviewed Aug 23, 2022
