Commit
feat: support bm25 milvus function (#33)
This PR introduces some major refactors:

- Introduce the abstract class `BaseMilvusBuiltInFunction`, a light wrapper around a [Milvus Function](https://milvus.io/docs/manage-collections.md#Function).
- Introduce `BM25BuiltInFunction`, which extends `BaseMilvusBuiltInFunction` and holds the Milvus `FunctionType.BM25` settings and the Milvus analyzer configuration. `BM25BuiltInFunction` can be used to implement [full-text search](https://milvus.io/docs/full-text-search.md) in Milvus.
- In the future, Milvus will support more built-in functions that take text in (instead of vectors in). The server converts the text to embeddings automatically, so the user no longer has to do that on the client side (here is a `FunctionType.TEXTEMBEDDING` [example](https://github.com/milvus-io/pymilvus/blob/master/examples/text_embedding.py)). More subclasses of `BaseMilvusBuiltInFunction` can then be added to support those text-in functions.
- A how-to guide is on the way. Until then, the unit test `test_builtin_bm25_function()` contains some use case examples.

In short, any custom LangChain embedding functions or Milvus built-in functions can be passed when initializing the `Milvus` class to build multiple index fields in Milvus.
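The wrapper design described above can be sketched in plain Python. This is only an illustration of the pattern, not the code added by this PR: the names `SketchBuiltInFunction`, `SketchBM25Function`, `to_spec`, and `collect_vector_fields` are hypothetical, the default field names `"text"` and `"sparse"` are assumptions, and the real classes wrap a pymilvus function object rather than a dict.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Union


def _as_list(names: Union[str, List[str]]) -> List[str]:
    """Normalize a single field name or a list of names to a list."""
    return [names] if isinstance(names, str) else list(names)


class SketchBuiltInFunction(ABC):
    """Hypothetical base class: describes one server-side function."""

    def __init__(
        self,
        input_field_names: Union[str, List[str]],
        output_field_names: Union[str, List[str]],
    ) -> None:
        self.input_field_names = _as_list(input_field_names)
        self.output_field_names = _as_list(output_field_names)

    @property
    @abstractmethod
    def function_type(self) -> str:
        """Name of the server-side function type, e.g. "BM25"."""

    def to_spec(self) -> Dict[str, Any]:
        """Return a plain-dict description of the function (illustrative only)."""
        return {
            "function_type": self.function_type,
            "input_field_names": self.input_field_names,
            "output_field_names": self.output_field_names,
        }


class SketchBM25Function(SketchBuiltInFunction):
    """Hypothetical subclass: text in, sparse BM25 vector out."""

    def __init__(
        self,
        input_field_names: Union[str, List[str]] = "text",     # assumed default
        output_field_names: Union[str, List[str]] = "sparse",  # assumed default
        analyzer_params: Optional[Dict[str, Any]] = None,
    ) -> None:
        super().__init__(input_field_names, output_field_names)
        # None means "leave the analyzer choice to the server".
        self.analyzer_params = analyzer_params

    @property
    def function_type(self) -> str:
        return "BM25"

    def to_spec(self) -> Dict[str, Any]:
        spec = super().to_spec()
        spec["analyzer_params"] = self.analyzer_params
        return spec


def collect_vector_fields(
    dense_fields: List[str], functions: List[SketchBuiltInFunction]
) -> List[str]:
    """List every vector field: client-side embeddings first, then function outputs."""
    fields = list(dense_fields)
    for fn in functions:
        fields.extend(fn.output_field_names)
    return fields


if __name__ == "__main__":
    bm25 = SketchBM25Function(input_field_names="text", output_field_names="sparse")
    print(bm25.to_spec())
    print(collect_vector_fields(["dense", "dense2"], [bm25]))
```

Keeping the function type abstract means a future text-in function (for example a text-embedding one) would only need a new subclass, which is the extension path the PR description outlines.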
Some use case examples:

```python
from langchain_milvus import Milvus, BM25BuiltInFunction
from langchain_openai import OpenAIEmbeddings

# `docs` (a list of LangChain documents) and `URI` (the Milvus connection URI)
# are assumed to be defined elsewhere.
embedding = OpenAIEmbeddings()

vectorstore = Milvus.from_documents(
    documents=docs,
    embedding=embedding,
    builtin_function=BM25BuiltInFunction(
        output_field_names="sparse"
    ),
    # The "dense" field is used for similarity search on the OpenAI dense embedding;
    # the "sparse" field is used for BM25 full-text search.
    vector_field=["dense", "sparse"],
    connection_args={
        "uri": URI,
    },
    drop_old=True,
)
```

Or, with multiple embedding fields and the BM25 function:

```python
from langchain_milvus import Milvus, BM25BuiltInFunction
from langchain_openai import OpenAIEmbeddings
from langchain_voyageai import VoyageAIEmbeddings

# `docs` and `URI` are assumed to be defined elsewhere, as in the first example.
embedding = OpenAIEmbeddings()
embedding2 = VoyageAIEmbeddings(model="voyage-3")

vectorstore = Milvus.from_documents(
    documents=docs,
    embedding=[embedding, embedding2],
    builtin_function=BM25BuiltInFunction(
        input_field_names="text",
        output_field_names="sparse"
    ),
    text_field="text",
    vector_field=["dense", "dense2", "sparse"],
    connection_args={
        "uri": URI,
    },
    drop_old=True,
)
```

---------

Signed-off-by: ChengZi <[email protected]>