Operating System
macOS
Version Information
(modeleval) (base) MengMacBook-M3MaxPro:model_evaluation wanmeng$ pip install azureml-metrics[generative-ai]
Requirement already satisfied: azureml-metrics[generative-ai] in /Users/wanmeng/miniconda3/envs/modeleval/lib/python3.11/site-packages (0.0.57)
Steps to reproduce
# Azure OpenAI configuration for the gpt-4o deployment (key redacted)
OPENAI_API_BASE = "https://gpt4o-eliz-westus3.openai.azure.com/"
OPENAI_API_TYPE = "azure"
OPENAI_API_KEY = "<redacted>"
OPENAI_API_VERSION = "2024-02-01"  # assumed value; the script below uses this variable but the original report never defines it
deployment_id = "gpt-4o"
from azureml.metrics import compute_metrics, constants
from pprint import pprint
import os
y_test = [["4", "2 + 2 = 4"], ["Agra", "Agra, India"]]

y_pred = [
    [
        {"role": "user", "content": "What is the value of 2 + 2?"},
        {
            "role": "assistant",
            "content": "2 + 2 = 4",
            "context": {
                "citations": [
                    {
                        "id": "math_document1.md",
                        "content": "Information about additions: 1 + 2 = 3, 2 + 2 = 4",
                    }
                ]
            },
        },
    ],
    [
        {"role": "user", "content": "Where is Taj Mahal located?"},
        {
            "role": "assistant",
            "content": "Taj Mahal is located in Agra, India",
            "context": {
                "citations": [
                    {
                        "id": "taj_mahal_document1.md",
                        "content": "Taj Mahal is located in Agra, India and is one of the seven wonders of the world.",
                    }
                ]
            },
        },
    ],
]
openai_params = {
    "api_version": OPENAI_API_VERSION,
    "api_base": OPENAI_API_BASE,
    "api_type": OPENAI_API_TYPE,
    "api_key": OPENAI_API_KEY,
    "deployment_id": deployment_id,
}
metrics_config = {
    "openai_params": openai_params,
    "score_version": "v1",
    "use_chat_completion_api": True,
    # RAG-based metrics to compute
    "metrics": ["gpt_relevance", "gpt_groundedness", "gpt_retrieval_score"],
}

# The same metrics can also be computed by setting task_type to
# constants.Tasks.RAG_EVALUATION (see the sketch after this script).
result = compute_metrics(
    task_type=constants.Tasks.CHAT_COMPLETION,
    y_test=y_test,
    y_pred=y_pred,
    **metrics_config,
)
pprint(result)
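As a possible workaround until this is fixed, the comment above notes that the same metrics are also available through the RAG_EVALUATION task type. A minimal sketch, assuming RAG_EVALUATION accepts the same y_test/y_pred payload and openai_params as the CHAT_COMPLETION call above (I have not verified the expected input shape for that task type):

# Workaround sketch: request the same RAG metrics via RAG_EVALUATION.
# Assumption: the conversation-style y_pred above is accepted unchanged.
rag_result = compute_metrics(
    task_type=constants.Tasks.RAG_EVALUATION,
    y_test=y_test,
    y_pred=y_pred,
    **metrics_config,
)
pprint(rag_result["metrics"])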
Expected behavior
'metrics': {'mean_gpt_groundedness': 5.0,
            'mean_gpt_relevance': 5.0,
            'mean_gpt_retrieval_score': 5.0}}
Actual behavior
'metrics': {'mean_gpt_groundedness': 5.0,
            'mean_gpt_relevance': 5.0,
            'mean_gpt_retrieval_score': nan}}
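To narrow down which conversation produces the nan, it may help to inspect the per-instance scores rather than only the means. A minimal sketch, assuming the result dict also carries per-row values under an "artifacts" key (the exact key layout is an assumption on my part, not confirmed against the azureml-metrics docs):

import math

# Assumption: per-instance scores live under result["artifacts"];
# adjust the key if azureml-metrics reports them elsewhere.
for name, values in result.get("artifacts", {}).items():
    if not isinstance(values, list):
        continue
    bad = [i for i, v in enumerate(values)
           if isinstance(v, float) and math.isnan(v)]
    if bad:
        print(f"{name}: nan at instance index(es) {bad}")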
Additional information
A customer will be using these metrics next week, so please help fix this problem. Thanks a lot!