[Bug]: Warning raised when query to Persistent Client #1733

Closed
vkehfdl1 opened this issue Feb 18, 2024 · 7 comments

@vkehfdl1

What happened?

I just call collection.query on a collection from my PersistentClient, and a lot of logger warnings are raised.
Every ingested ID shows up in a warning.
The warning comes from chromadb/segment/impl/vector/local_persistent_hnsw.py:

logger.warning(f"Add of existing embedding ID: {id}")

I can't figure out why every ID is reported as already existing, or why this warning is logged at all, since my code never calls add.

Everything works correctly, but the logging is very verbose and confusing: it looks as if it tries to add every ID in the collection again, even though I didn't put any corpus into the collection. I only load it and query it.

Versions

chroma-hnswlib==0.7.3
chromadb==0.4.22

It happens on both Linux (Ubuntu) and macOS, with Python 3.10.

Relevant log output

No response

vkehfdl1 added the bug (Something isn't working) label Feb 18, 2024
atroyn self-assigned this Feb 19, 2024
@atroyn
Contributor

atroyn commented Feb 19, 2024

Hi @vkehfdl1 - could you provide more context? It seems like you are attempting to .add entries with the same id again, could you please share the code where you're ingesting data?

@vkehfdl1
Author

My code looks like this.

# Declared async so that asyncio.gather can await it in process_batch.
async def vectordb_pure(queries: list[str], top_k: int, collection: chromadb.Collection,
                        embedding_model: BaseEmbedding):
    # Embed every query string, then fetch the top_k nearest ids for each one.
    embedded_queries = list(map(embedding_model.get_query_embedding, queries))
    id_result = []
    for embedded_query in embedded_queries:
        result = collection.query(query_embeddings=embedded_query, n_results=top_k)
        id_result.extend(result['ids'])
    return id_result

def main():
    db = chromadb.PersistentClient(path=db_path)
    collection = db.get_collection(name=collection_name)
    embedding_model = OpenAIEmbedding() # LlamaIndex Embedding
    top_k = 5
    tasks = [vectordb_pure(input_queries, top_k, collection, embedding_model) for input_queries in queries]
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(process_batch(tasks, batch_size=batch))

When I run code like this, I get a lot of warnings saying that I am adding an existing ID, as I mentioned.
I never try to add an ID to ChromaDB; I only query it.

@atroyn
Contributor

atroyn commented Feb 22, 2024

What does process_batch do? It looks like it might add embeddings. Since you are using a persistent client, the collection will have been loaded when you call get_collection. Is it possible this collection already contains records with ids you loaded before?

tazarov added a commit to amikos-tech/chroma-core that referenced this issue Feb 23, 2024
@vkehfdl1
Author

vkehfdl1 commented Mar 2, 2024

Here is process_batch. It just runs the given tasks in a for loop, so it does not add any embeddings.

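# Must be async because the body awaits asyncio.gather.
async def process_batch(tasks, batch_size: int = 64) -> List[Any]:
    results = []
    for i in range(0, len(tasks), batch_size):
        # Await one slice of at most batch_size tasks at a time.
        batch = tasks[i:i + batch_size]
        batch_results = await asyncio.gather(*batch)
        results.extend(batch_results)

    return results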
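
For readers who want to check the batching on its own, here is a minimal self-contained sketch of the same pattern. The names `fake_query` and `run_demo` are hypothetical and not part of the reporter's code; `fake_query` stands in for a read-only `collection.query` call, so no ChromaDB installation is needed. It only illustrates that the batching loop awaits whatever it is given and adds nothing by itself.

```python
import asyncio
from typing import Any, Awaitable, List


async def fake_query(query_id: int) -> List[str]:
    # Hypothetical stand-in for a read-only collection.query call.
    await asyncio.sleep(0)
    return [f"id-{query_id}"]


async def process_batch(tasks: List[Awaitable[Any]], batch_size: int = 64) -> List[Any]:
    results: List[Any] = []
    for i in range(0, len(tasks), batch_size):
        # Await one slice of at most batch_size awaitables at a time.
        batch = tasks[i:i + batch_size]
        results.extend(await asyncio.gather(*batch))
    return results


def run_demo(n_queries: int = 5, batch_size: int = 2) -> List[Any]:
    # Build the coroutine objects up front, as the reporter's main() does.
    tasks = [fake_query(i) for i in range(n_queries)]
    return asyncio.run(process_batch(tasks, batch_size=batch_size))
```

Because `asyncio.gather` returns results in the order of its arguments, the combined list keeps the original query order across batches.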

Of course the collection contains IDs, but none of my code adds embeddings.
When I delete the warning line, everything works fine. (The functionality is correct; it just raises the warning.)

Maybe @tazarov's #1763 fixes this issue.
I hope it gets merged quickly.
Thx :)
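
As an alternative to deleting the warning line from the installed package, a logging filter can hide just this message. This is a sketch under one assumption that the thread does not confirm: that the module creates its logger with `logging.getLogger(__name__)`, so the logger name equals the module path quoted above. `DropExistingIdWarning` and `silence_existing_id_warning` are hypothetical helper names.

```python
import logging

# Assumption: the logger name matches the module path from this issue.
HNSW_LOGGER_NAME = "chromadb.segment.impl.vector.local_persistent_hnsw"


class DropExistingIdWarning(logging.Filter):
    """Drop only records whose message contains the 'existing embedding ID' text."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False discards the record; all other records pass through.
        return "Add of existing embedding ID" not in record.getMessage()


def silence_existing_id_warning() -> logging.Filter:
    """Attach the filter to the assumed logger and return it so it can be removed."""
    warning_filter = DropExistingIdWarning()
    logging.getLogger(HNSW_LOGGER_NAME).addFilter(warning_filter)
    return warning_filter
```

Calling `silence_existing_id_warning()` once before creating the client should be enough if the assumption about the logger name holds. Other warnings from the same module still get through, and `removeFilter` on the same logger undoes the change.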

tazarov added a commit to amikos-tech/chroma-core that referenced this issue Mar 2, 2024
@joaomdmoura

Same problem upon searching

@tazarov
Contributor

tazarov commented Apr 24, 2024

@vkehfdl1, @joaomdmoura, we've found the root cause for this and are working on a fix.

@itaismith
Contributor

#2062 implemented as part of #2512
