
feat: Add Azure embedders support #6676

Merged
merged 13 commits into from
Jan 5, 2024

vblagoje
Member

@vblagoje vblagoje commented Jan 2, 2024

Why:

Add new embedding components that use Azure's cognitive services (Azure OpenAI) to embed texts and documents. These components give users more embedding options and access to Azure-hosted models.

What:

The main changes introduced in the azure_embedders branch include:

  1. Addition of Azure Embedders: The AzureOpenAITextEmbedder and AzureOpenAIDocumentEmbedder classes have been added. They mirror their OpenAI counterparts but call the Azure OpenAI API to embed texts and documents.
  2. Modifications in __init__.py: The embedders package's __init__.py now exports the new Azure embedder classes.
  3. Unit Tests: New unit tests have been added for both the AzureOpenAITextEmbedder and AzureOpenAIDocumentEmbedder to ensure they work as expected and handle various inputs correctly.
  4. Release Notes: A release note has been added to document the inclusion of new Azure embedders.
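The AzureOpenAIDocumentEmbedder is described as similar to its OpenAI counterpart. As a rough, offline illustration of what a document embedder of this kind does before calling the API, the sketch below builds one string per document (selected metadata, separator, prefix/suffix) and embeds them in batches. The parameter names meta_fields_to_embed, embedding_separator, prefix, suffix, and batch_size are modeled on options of Haystack's OpenAI document embedder, but the Document stand-in, the helper functions, and the embed_batch callable are hypothetical. This is not the PR's implementation and it makes no Azure calls.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for a Haystack Document, for illustration only."""
    content: Optional[str]
    meta: Dict[str, Any] = field(default_factory=dict)
    embedding: Optional[List[float]] = None


def prepare_texts(
    documents: List[Document],
    meta_fields_to_embed: Optional[List[str]] = None,
    embedding_separator: str = "\n",
    prefix: str = "",
    suffix: str = "",
) -> List[str]:
    """Build the string that would be sent to the embedding API for each document."""
    texts = []
    for doc in documents:
        # Prepend the selected metadata values, skipping fields that are missing or None.
        meta_values = [
            str(doc.meta[key])
            for key in (meta_fields_to_embed or [])
            if doc.meta.get(key) is not None
        ]
        text = embedding_separator.join(meta_values + [doc.content or ""])
        texts.append(prefix + text + suffix)
    return texts


def embed_documents(
    documents: List[Document],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 32,
    **prepare_kwargs: Any,
) -> List[Document]:
    """Embed documents batch by batch and attach each vector to its document."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    texts = prepare_texts(documents, **prepare_kwargs)
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        # One (hypothetical) API request per batch instead of one per document.
        embeddings.extend(embed_batch(texts[start:start + batch_size]))
    if len(embeddings) != len(documents):
        raise ValueError("embed_batch must return one vector per input text")
    for doc, vector in zip(documents, embeddings):
        doc.embedding = vector
    return documents
```

Passing a fake embed_batch such as `lambda batch: [[float(len(t))] for t in batch]` exercises the batching logic offline; a real Azure-backed embedder would replace it with requests to the configured deployment.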

How can it be used:

The new Azure embedders can be used in the Haystack framework for any task that involves text or document embedding: they convert texts or documents into vectors using Azure's cognitive services. This is particularly useful for semantic search, text clustering, and other applications that need to capture the semantic content of texts.
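For semantic search, the vectors produced by a text embedder (for the query) and a document embedder (for the corpus) are typically compared with a similarity measure such as cosine similarity. A minimal stdlib-only sketch, assuming toy 2-D vectors in place of real Azure embeddings, which have far more dimensions. In a Haystack pipeline a document store and retriever would normally do this ranking; cosine_similarity and rank_by_similarity are hypothetical helpers, not Haystack APIs.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as unrelated to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_embedding: Sequence[float],
    doc_embeddings: List[Tuple[str, Sequence[float]]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_embedding, emb)) for doc_id, emb in doc_embeddings]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

With a query vector of [1.0, 0.0], a document at [1.0, 0.0] scores 1.0, one at [1.0, 1.0] scores about 0.707, and one at [0.0, 1.0] scores 0.0, so they rank in that order.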

How did you test it:

  1. Unit Tests: New tests verify the functionality of the Azure embedders. They check whether the embedders initialize correctly with default and custom parameters and whether they return the expected embeddings for given texts and documents.
  2. Integration Testing: The embedders were tested in an integrated environment to ensure they interact correctly with the Azure API and the rest of the Haystack components. This included testing with actual text inputs and verifying the quality and format of the embeddings.
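The unit tests described above check initialization with default and custom parameters and the returned embeddings. One common way to test an API-backed component without live credentials is to inject a mocked client. The sketch below shows that pattern with the standard library's unittest.mock; HypotheticalTextEmbedder, its deployment parameter, and the client.embed(...) call are invented for illustration and are not the PR's AzureOpenAITextEmbedder or the openai client API.

```python
import unittest
from typing import Any, Dict, List
from unittest.mock import MagicMock


class HypotheticalTextEmbedder:
    """Toy API-backed embedder, NOT the PR's AzureOpenAITextEmbedder."""

    def __init__(self, client: Any, deployment: str = "default-deployment",
                 prefix: str = "", suffix: str = ""):
        self.client = client  # injected so tests can swap in a mock
        self.deployment = deployment
        self.prefix = prefix
        self.suffix = suffix

    def run(self, text: str) -> Dict[str, List[float]]:
        if not isinstance(text, str):
            raise TypeError("HypotheticalTextEmbedder expects a single string")
        # client.embed(...) is a made-up method on the injected client.
        vector = self.client.embed(model=self.deployment, input=self.prefix + text + self.suffix)
        return {"embedding": vector}


class TestHypotheticalTextEmbedder(unittest.TestCase):
    def test_init_default(self):
        embedder = HypotheticalTextEmbedder(client=MagicMock())
        self.assertEqual(embedder.deployment, "default-deployment")
        self.assertEqual((embedder.prefix, embedder.suffix), ("", ""))

    def test_init_custom(self):
        embedder = HypotheticalTextEmbedder(
            client=MagicMock(), deployment="my-deployment", prefix="query: ", suffix="!"
        )
        self.assertEqual(embedder.deployment, "my-deployment")
        self.assertEqual((embedder.prefix, embedder.suffix), ("query: ", "!"))

    def test_run_returns_mocked_embedding(self):
        client = MagicMock()
        client.embed.return_value = [0.1, 0.2, 0.3]
        embedder = HypotheticalTextEmbedder(client=client, prefix="query: ")
        self.assertEqual(embedder.run("hello"), {"embedding": [0.1, 0.2, 0.3]})
        # The mock records the call, so the request payload is checked without network access.
        client.embed.assert_called_once_with(model="default-deployment", input="query: hello")

    def test_run_rejects_non_string(self):
        with self.assertRaises(TypeError):
            HypotheticalTextEmbedder(client=MagicMock()).run(["not", "a", "string"])
```

Saved as a test module, this runs with `python -m unittest` or pytest. Checks against the live API, like the integration testing described above, would still need real Azure deployments and credentials.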

Notes for the reviewer:

  • Review the implementation of the Azure embedders to ensure they align with the project's standards and efficiently utilize Azure's cognitive services.
  • Consider the unit tests' coverage and whether any additional scenarios need to be tested.
  • As the embedders rely on external services (Azure), consider the latency, cost, and availability implications when using these components in production environments.

@vblagoje vblagoje requested review from a team as code owners January 2, 2024 13:34
@vblagoje vblagoje requested review from dfokina and anakin87 and removed request for a team January 2, 2024 13:34
@github-actions github-actions bot added topic:tests 2.x Related to Haystack v2.0 type:documentation Improvements on the docs labels Jan 2, 2024
@vblagoje
Member Author

vblagoje commented Jan 2, 2024

Reviewers @anakin87 @dfokina: we're still waiting on our DevOps team for the actual Azure model deployments, which will take another day or two. Please don't start the review yet.

@vblagoje
Member Author

vblagoje commented Jan 4, 2024

@anakin87 thanks to @steppi91, we have tested the live Azure embedding models in the latest commit, c744be1. This should be ready to go.

@vblagoje
Member Author

vblagoje commented Jan 4, 2024

All yours for tomorrow @anakin87 🚀

Member

@anakin87 anakin87 left a comment


Good PR!
I found a few small opportunities for improvement.

Member

@anakin87 anakin87 left a comment


Please make the tests pass with the suggested changes,
then this PR is good to go!!!

haystack/components/embedders/azure_document_embedder.py
haystack/components/embedders/azure_text_embedder.py
@vblagoje vblagoje merged commit 552f0e3 into main Jan 5, 2024
21 checks passed
@vblagoje vblagoje deleted the azure_embedders branch January 5, 2024 14:49
2 participants