[Bug]: Warning raised when query to Persistent Client #1733
Comments
Hi @vkehfdl1 - could you provide more context? It seems like you are attempting to |
My code looks like this.

    async def vectordb_pure(queries: List[str], top_k: int, collection: chromadb.Collection,
                            embedding_model: BaseEmbedding):
        # Embed every query, then look up the top_k nearest ids for each embedding.
        embedded_queries = list(map(embedding_model.get_query_embedding, queries))
        id_result = []
        for embedded_query in embedded_queries:
            result = collection.query(query_embeddings=embedded_query, n_results=top_k)
            id_result.extend(result['ids'])
        return id_result

    def main():
        db = chromadb.PersistentClient(path=db_path)
        collection = db.get_collection(name=collection_name)
        embedding_model = OpenAIEmbedding()  # LlamaIndex Embedding
        top_k = 5
        tasks = [vectordb_pure(input_queries, top_k, collection, embedding_model) for input_queries in queries]
        loop = asyncio.get_event_loop()
        results = loop.run_until_complete(process_batch(tasks, batch_size=batch))

If I execute this kind of code, it raises a lot of warnings saying that I am adding an existing ID, as I mentioned.
What does |
Here is

    async def process_batch(tasks, batch_size: int = 64) -> List[Any]:
        # Await the tasks in chunks of batch_size and collect results in order.
        results = []
        for i in range(0, len(tasks), batch_size):
            batch = tasks[i:i + batch_size]
            batch_results = await asyncio.gather(*batch)
            results.extend(batch_results)
        return results

Of course collection contain
Maybe @tazarov fix this issue at #1763.
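For anyone trying to reproduce the batching pattern without a Chroma collection, here is a minimal, self-contained sketch. `fake_query` and `run_demo` are hypothetical stand-ins that are not part of the code above; `fake_query` only mimics an async wrapper around `collection.query`, so this shows the batching behavior and not the warning itself.

```python
import asyncio
from typing import Any, Awaitable, List


async def fake_query(query_id: int) -> List[str]:
    # Hypothetical stand-in for an async wrapper around collection.query.
    await asyncio.sleep(0)
    return [f"id-{query_id}"]


async def process_batch(tasks: List[Awaitable[Any]], batch_size: int = 64) -> List[Any]:
    # Await the tasks in chunks of batch_size, preserving input order.
    results: List[Any] = []
    for i in range(0, len(tasks), batch_size):
        batch = tasks[i:i + batch_size]
        results.extend(await asyncio.gather(*batch))
    return results


def run_demo(n: int = 5, batch_size: int = 2) -> List[List[str]]:
    # Build n coroutine objects, then run them in batches on a fresh event loop.
    tasks = [fake_query(i) for i in range(n)]
    return asyncio.run(process_batch(tasks, batch_size=batch_size))
```

Calling `run_demo(5, 2)` runs three batches (2, 2, and 1 tasks) and returns one result list per query, in the order the tasks were created.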
Same problem upon searching |
@vkehfdl1, @joaomdmoura, we've found the root cause for this and are working on a fix.
What happened?
I just call collection.query on my PersistentClient collection, and a lot of logger warnings are raised. All of the ingested IDs show up in the warnings.
It happens in chromadb/segment/impl/vector/local_persistent_hnsw.py
I can't figure out why all the IDs are reported as already existing, or why a warning is logged at all, since this is not add code.
It works perfectly, but the logging is very verbose and confusing: it looks as if it tries to add every ID in the collection all over again, even though I didn't add any corpus to the collection. I just load it and query it.
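Until a fix lands, one possible stopgap is to raise the log level for that module only. This is a sketch under an assumption: that the module creates its logger with `logging.getLogger(__name__)`, so the logger name matches the module path above. `quiet_hnsw_warnings` is a hypothetical helper name, and this only hides the messages; it does not address whatever causes them.

```python
import logging

# Assumption: the logger name mirrors the module path reported in this issue.
HNSW_LOGGER_NAME = "chromadb.segment.impl.vector.local_persistent_hnsw"


def quiet_hnsw_warnings(level: int = logging.ERROR) -> logging.Logger:
    # Raise the threshold for this one logger so WARNING records are dropped,
    # while other chromadb loggers keep their current levels.
    logger = logging.getLogger(HNSW_LOGGER_NAME)
    logger.setLevel(level)
    return logger
```

Calling `quiet_hnsw_warnings()` once before creating the client should be enough if the assumption about the logger name holds.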
Versions
chroma-hnswlib==0.7.3
chromadb==0.4.22
It happens on both Linux (Ubuntu) and macOS, with Python 3.10.
Relevant log output
No response