
feat: support bm25 milvus function #33

Merged · 4 commits · Jan 10, 2025
Conversation

zc277584121
Collaborator

@zc277584121 zc277584121 commented Jan 3, 2025

This PR introduces some major refactors:

  • Introduce the abstract class BaseMilvusBuiltInFunction, a light wrapper around the Milvus Function.
  • Introduce Bm25BuiltInFunction, extended from BaseMilvusBuiltInFunction, which includes the Milvus FunctionType.BM25 settings and the Milvus analyzer configs. We can use Bm25BuiltInFunction to implement full-text search in Milvus.
  • In the future, Milvus will support more built-in Functions with text-in (instead of vector-in) abilities, so there is no need to convert text to embeddings on the user's end; the server does this automatically (here is a FunctionType.TEXTEMBEDDING example). So in the future we can implement more subclasses of BaseMilvusBuiltInFunction to support the text-in functions in Milvus.
  • The how-to-use introduction is on the way, and there are some use case examples in the unit test test_builtin_bm25_function(). Put simply, we can pass any customized LangChain embedding functions or Milvus built-in functions into the Milvus class initializer to build multiple index fields in Milvus.
    Some use case examples:
from langchain_milvus import Milvus, BM25BuiltInFunction
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings()

vectorstore = Milvus.from_documents(
    documents=docs,
    embedding=embedding,
    builtin_function=BM25BuiltInFunction(
        output_field_names="sparse"
    ),
    #"dense" field is used for similarity search for OpenAI dense embedding, "sparse" field is used for BM25 full-text search
    vector_field=["dense", "sparse"],
    connection_args={
        "uri": URI,
    },
    drop_old=True,
)

Or with multiple embedding fields and the BM25 function:

from langchain_milvus import Milvus, BM25BuiltInFunction
from langchain_openai import OpenAIEmbeddings
from langchain_voyageai import VoyageAIEmbeddings

embedding = OpenAIEmbeddings()
embedding2 = VoyageAIEmbeddings(model="voyage-3")

vectorstore = Milvus.from_documents(
    documents=docs,
    embedding=[embedding, embedding2],
    builtin_function=BM25BuiltInFunction(
        input_field_names="text",
        output_field_names="sparse"
    ),
    text_field="text",
    vector_field=["dense", "dense2", "sparse"],
    connection_args={
        "uri": URI,
    },
    drop_old=True,
)
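For readers unfamiliar with BM25: the sparse field above is scored server-side by Milvus, but the idea can be illustrated with a small self-contained sketch of the Okapi BM25 formula. Everything below (the bm25_scores function, the toy documents, and the parameters k1=1.5, b=0.75) is illustrative only, not Milvus's actual implementation or defaults:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc against query_terms."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency of each query term across the corpus
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency within this doc
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue  # term appears nowhere; contributes nothing
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(score)
    return scores

docs = [
    ["milvus", "vector", "search"],
    ["full", "text", "search", "with", "bm25"],
    ["dense", "embedding", "models"],
]
print(bm25_scores(["bm25", "search"], docs))
```

Documents matching more query terms score higher, with rarer terms weighted up via IDF; the builtin_function in the examples above delegates this kind of sparse lexical scoring to the Milvus server.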

@ohadeytan

@zc277584121 running the test_builtin_bm25_function I get:

    def check_status(status: Status):
        if status.code != 0 or status.error_code != 0:
>           raise MilvusException(status.code, status.reason, status.error_code)
E           pymilvus.exceptions.MilvusException: <MilvusException: (code=65535, message=invalid index type: AUTOINDEX, local mode only support SPARSE_INVERTED_INDEX SPARSE_WAND: )>

Is this expected, or is something wrong with my settings?

@zc277584121
Collaborator Author

@ohadeytan The full-text search feature is not yet supported in Milvus Lite. The error says local mode only support SPARSE_INVERTED_INDEX SPARSE_WAND because full-text search uses the BM25 index type. To run this unit test successfully, we currently have to use the Milvus Docker standalone service. Thanks for your feedback; I think I have to leave a notice in the unit test and further documents.

@zc277584121 zc277584121 requested a review from efriis January 9, 2025 09:36
@zc277584121
Collaborator Author

Here is the document, which is awaiting final review: https://github.com/zc277584121/bootcamp/blob/langchain_doc/bootcamp/tutorials/integration/langchain/full_text_search_with_langchain.ipynb
I'll merge this PR, and I think the new package version will be released next week.
FYI, @ohadeytan

@zc277584121 zc277584121 merged commit 1c13e43 into langchain-ai:main Jan 10, 2025
8 checks passed
@janaki-sasidhar

What if I want to use separate search prompts for keyword and semantic search? This hybrid retriever wrapper isn't flexible enough for that, I think.

@zc277584121
Collaborator Author

@janaki-sasidhar Do you mean this kind of case:

search_prompt1->[vector of prompt1]-> semantic search-> result docs from 1,
search_prompt2-> keyword search -> result docs from 2,
[result docs from1 + result docs from2] -> rerank -> final result docs

If so, can you explain what its scenario is? Any information will be appreciated.
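In the meantime, one workaround for separate prompts is to run the two searches independently and fuse the ranked results on the client side, e.g. with reciprocal rank fusion. A minimal sketch — rrf_fuse and the doc ids below are illustrative, not part of langchain_milvus:

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal rank fusion: merge several ranked lists of doc ids.

    Each doc earns 1 / (k + rank + 1) per list it appears in;
    docs ranked high in multiple lists float to the top.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["d3", "d1", "d2"]  # e.g. results for search_prompt1 (dense search)
keyword = ["d2", "d3", "d5"]   # e.g. results for search_prompt2 (BM25 search)
print(rrf_fuse([semantic, keyword]))
```

Docs appearing near the top of both lists (here d3 and d2) outrank docs found by only one retriever, which matches the rerank step in the flow sketched above.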
