docs: docs for retrieval #808
Codecov Report
@@             Coverage Diff              @@
##             main     #808       +/-   ##
===========================================
- Coverage   83.90%   53.13%   -30.77%
===========================================
  Files          21       21
  Lines        1466     1466
===========================================
- Hits         1230      779     -451
- Misses        236      687     +451
docs/user-guides/retrieval.md
Outdated
In order to implement retrieval, we add an [`AnnLite`](https://github.com/jina-ai/annlite) indexer executor after the encoder executor in CLIP-as-service.
a table of contents would be nice
a table about what?
fcc7c7b to f2c1076
docs/user-guides/retriever.md
Outdated
# Search API

## Basics of CLIP Search
I'd recommend removing this title.
docs/user-guides/retriever.md
Outdated
## How to lower memory footprint?
Suggested change:
- ## How to lower memory footprint?
+ ## Lower the memory footprint
docs/user-guides/retriever.md
Outdated
Sometimes the indexer will use a lot of memory because the HNSW indexer (which is used by `AnnLite`) is stored in memory. The efficient way to reduce memory footprint is dimension reduction. Retrieval in CLIP-as-service use [`Principal component analysis(PCA)`](https://en.wikipedia.org/wiki/Principal_component_analysis#:~:text=Principal%20component%20analysis%20(PCA)%20is,components%20and%20ignoring%20the%20rest.) to achieve this.

### Whether PCA is needed in my case?
I'd suggest removing this title.
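To make the trade-off in this hunk concrete, here is a minimal NumPy sketch of PCA-based dimension reduction together with a rough memory estimate. It is not AnnLite's implementation: the helper names (`fit_pca`, `apply_pca`, `raw_vector_bytes`), the 512-dimensional random vectors standing in for CLIP embeddings, and the target of 128 dimensions are all assumptions for illustration. The estimate counts only the raw float32 vectors; an HNSW index also stores graph links, so real usage will be higher.

```python
import numpy as np


def fit_pca(vectors, n_components):
    """Fit PCA via SVD; return the mean, the projection rows, and the kept variance ratio."""
    mean = vectors.mean(axis=0)
    centered = vectors - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values.astype(np.float64) ** 2
    kept = float(variances[:n_components].sum() / variances.sum())
    return mean, vt[:n_components], kept


def apply_pca(vectors, mean, components):
    """Project vectors onto the retained principal directions."""
    return (vectors - mean) @ components.T


def raw_vector_bytes(n_vectors, dim, dtype=np.float32):
    """Bytes needed for the raw vectors only (index overhead is not included)."""
    return n_vectors * dim * np.dtype(dtype).itemsize


# Hypothetical data: 1000 random 512-d vectors standing in for embeddings.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 512)).astype(np.float32)

mean, components, kept_variance = fit_pca(embeddings, n_components=128)
reduced = apply_pca(embeddings, mean, components)

print("bytes before:", raw_vector_bytes(1000, 512))
print("bytes after: ", raw_vector_bytes(1000, 128))
print("variance kept:", round(kept_variance, 3))
```

The kept-variance ratio is one way to quantify the information loss the text mentions: fewer retained dimensions means less memory but a lower ratio.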
docs/user-guides/retriever.md
Outdated
However, PCA will definitely lead to information losses since we remove some dimensions. And the more dimensions you remove, the more information losses will be. So the best practice will be estimate the memory usage first (if possible, see below) and choose the reasonable dimension after PCA.
```

## How to deal with a very large dataset?
Suggested change:
- ## How to deal with a very large dataset?
+ ## Dealing with large dataset
docs/user-guides/retriever.md
Outdated
```

## How to deploy it on the cloud?
Suggested change:
- ## How to deploy it on the cloud?
+ ## Deploy to JCloud
docs/user-guides/retriever.md
Outdated
## How to deploy it on the cloud?
Deployment can be easily achieved by using [`jcloud`](https://github.com/jina-ai/jcloud) or [`Amazon Kubernetes(EKS) Cluster`](https://aws.amazon.com/eks/). Taking `jcloud` as an example:
Use JCloud not `jcloud`
docs/user-guides/retriever.md
Outdated
Then you can perform exactly the same operations as we do on a single machine.(`/encode`, `/index` and `/search`)

### Why different [polling strategies](https://docs.jina.ai/how-to/scale-out/?highlight=polling#different-polling-strategies) are needed for different endpoints?
Different polling strategies for different endpoints
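As background for this heading, the toy simulation below sketches why a sharded indexer usually wants different polling per endpoint: an `/index` request should reach only one shard (so each document is stored once), while a `/search` request should reach every shard and merge the partial results. This is a plain-Python stand-in, not Jina's or AnnLite's code; the `Shard` class, `index_any`, `search_all`, and the modulo-based shard choice are all hypothetical names and simplifications.

```python
import numpy as np


class Shard:
    """Hypothetical in-memory shard doing brute-force nearest-neighbour search."""

    def __init__(self):
        self.ids = []
        self.vectors = []

    def index(self, doc_id, vector):
        self.ids.append(doc_id)
        self.vectors.append(vector)

    def search(self, query, k):
        if not self.ids:
            return []
        distances = np.linalg.norm(np.stack(self.vectors) - query, axis=1)
        nearest = np.argsort(distances)[:k]
        return [(float(distances[i]), self.ids[i]) for i in nearest]


def index_any(shards, doc_id, vector):
    """'ANY'-style polling: exactly one shard stores the document.

    The modulo rule is an assumption that stands in for the real shard choice.
    """
    shards[doc_id % len(shards)].index(doc_id, vector)


def search_all(shards, query, k):
    """'ALL'-style polling: every shard answers, then the partial hits are merged."""
    hits = [hit for shard in shards for hit in shard.search(query, k)]
    return sorted(hits)[:k]


rng = np.random.default_rng(0)
vectors = rng.standard_normal((30, 8))
shards = [Shard() for _ in range(3)]

for doc_id, vector in enumerate(vectors):
    index_any(shards, doc_id, vector)

# Querying with a stored vector should return that document first.
top_hits = search_all(shards, vectors[7], k=3)
print(top_hits)
```

If search polled only one shard here, two thirds of the documents would be invisible to any given query; if indexing polled all shards, every document would be stored three times.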
I'm not sure. Should we put the search in Client doc since it is part of the Client?
📝 Docs are deployed on https://ft-docs-retrieval--jina-docs.netlify.app 🎉
add documentation for clip-retrieval