[Bug]: Cohere Embedding #3190

Open
hananshandler opened this issue Nov 25, 2024 · 2 comments
Labels
bug Something isn't working

What happened?

I'm trying to use Cohere's embeddings in chromadb, but I get the error shown under "Relevant log output". Here is my code:

from chromadb.utils import embedding_functions
cohere_ef = embedding_functions.CohereEmbeddingFunction(api_key=cohere_api_key)
cohere_ef(['test'])

Any suggestions? Thanks!

Versions

Chroma v0.5.20, Python v3.10.14

Relevant log output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[9], line 3
      1 from chromadb.utils import embedding_functions
      2 cohere_ef = embedding_functions.CohereEmbeddingFunction(api_key=cohere_api_key)
----> 3 cohere_ef(['test'])

File ~/git/infolinc-chunking-evaluation/.venv/lib/python3.10/site-packages/chromadb/api/types.py:462, in EmbeddingFunction.__init_subclass__.<locals>.__call__(self, input)
    460 result = call(self, input)
    461 assert result is not None
--> 462 return validate_embeddings(cast(Embeddings, normalize_embeddings(result)))

File ~/git/infolinc-chunking-evaluation/.venv/lib/python3.10/site-packages/chromadb/api/types.py:82, in normalize_embeddings(target)
     79     if target.ndim == 2:
     80         return list(target)
---> 82 raise ValueError(
     83     f"Expected embeddings to be a list of floats or ints, a list of lists, a numpy array, or a list of numpy arrays, got {target}"
     84 )

ValueError: Expected embeddings to be a list of floats or ints, a list of lists, a numpy array, or a list of numpy arrays, got [('response_type', 'embeddings_floats'), ('id', '1db95cbd-05b4-4081-a75d-d552124477e9'), ('embeddings', [[2.3085938, 0.04901123, -0.16

hananshandler added the bug label Nov 25, 2024

hananshandler commented Nov 25, 2024

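Just solved this: you must add [2][1] at the end of the Cohere embedding call to retrieve only the embeddings, which is what Chroma expects. As the error output shows, iterating the response yields (field, value) pairs, and the third pair is ('embeddings', [...]), so [2] selects that pair and [1] selects the list of vectors.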

cohere_embedding_function.py in chromadb/utils/embedding_functions should be changed to the following:

import logging

from chromadb.api.types import Documents, EmbeddingFunction, Embeddings

logger = logging.getLogger(__name__)

class CohereEmbeddingFunction(EmbeddingFunction[Documents]):
    def __init__(self, api_key: str, model_name: str = "large"):
        try:
            import cohere
        except ImportError:
            raise ValueError(
                "The cohere python package is not installed. Please install it with `pip install cohere`"
            )

        self._client = cohere.Client(api_key)
        self._model_name = model_name

    def __call__(self, input: Documents) -> Embeddings:
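        # Embed all documents in a single Cohere API call. Iterating the
        # response yields (field, value) pairs; [2] is the ("embeddings", ...)
        # pair and [1] is its value, the list of embedding vectors.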
        return [
            embeddings
            for embeddings in self._client.embed(
                texts=input, model=self._model_name, input_type="search_document"
            )
        ][2][1]
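
For anyone reading later, here is a minimal sketch of why the [2][1] indexing works and how a lookup by field name might be less brittle. It does not call Cohere: `FakeEmbedResponse` is a hypothetical stand-in that only mimics what the error output shows (iterating the response yields (field, value) pairs), and `extract_by_position` and `extract_by_name` are illustrative helper names, not part of chromadb or the cohere package.

```python
from typing import Any, Iterator, List, Tuple


class FakeEmbedResponse:
    """Hypothetical stand-in for the embed response seen in the error output.

    It reproduces only one behaviour: iterating the object yields
    (field_name, value) pairs in a fixed order.
    """

    def __init__(self, embeddings: List[List[float]]) -> None:
        self.response_type = "embeddings_floats"
        self.id = "example-id"
        self.embeddings = embeddings

    def __iter__(self) -> Iterator[Tuple[str, Any]]:
        yield "response_type", self.response_type
        yield "id", self.id
        yield "embeddings", self.embeddings


def extract_by_position(response: Any) -> List[List[float]]:
    # Same idea as the [2][1] fix: take the third pair, then its value.
    return list(response)[2][1]


def extract_by_name(response: Any) -> List[List[float]]:
    # Build a dict from the pairs and look the field up by name,
    # so the result does not depend on field order.
    return dict(response)["embeddings"]


response = FakeEmbedResponse([[2.3085938, 0.04901123]])
assert extract_by_position(response) == extract_by_name(response)
```

Both helpers return the same list of vectors here. The positional version breaks if the field order ever changes, which is why the name-based lookup may be the safer choice if the real response behaves like this stand-in.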


tazarov commented Nov 26, 2024

Hey @hananshandler, thanks for reporting this. A while ago Cohere bumped their client to v5. We have a relatively long-standing PR, #2262 (plus a few others following it), that attempts to fix the problem with the new client.

Let me see if we can push these PRs in.
