Complex tool use broken on gemini-2.0-flash-exp (with repo template) google.genai #54
Comments
I decided to go back to basics and do a super simple "canonical" example of GoogleSearch just using the built-in functionality - it doesn't work at all.

from googleapiclient.discovery import build
import ssl
import subprocess
from google import genai
from google.genai.types import (
    GenerateContentConfig,
    Tool,
    Part,
    FunctionCallingConfig,
    Content,
    GoogleSearch,
    DynamicRetrievalConfig,
    GoogleSearchRetrieval
)
import os
import asyncio
import traceback

# Replace with your API key
GEMINI_API_KEY = os.getenv("GOOGLE_GENAI_API_KEY")
GEMINI_MODEL = "gemini-2.0-flash-exp"

async def test_gemini_with_search():
    try:
        # Initialize the client
        client = genai.Client(
            vertexai=False,
            api_key=GEMINI_API_KEY
        )
        # Create the tool configuration
        config = GenerateContentConfig(
            temperature=0.0,
            tools=[Tool(google_search=GoogleSearch())],
            tool_config=FunctionCallingConfig(mode="ANY")
        )
        query = Part.from_text(
            """Find me the URLs for the most recent research papers about the lottery ticket hypothesis in neural networks. Search for: 'lottery ticket hypothesis' neural networks 2023..2024 filetype:pdf site:arxiv.org also do a separate query for the latest news in the UK and report on that"""
        )
        # Make the API call with a prompt that requires search
        response = await client.aio.models.generate_content(
            model=GEMINI_MODEL,
            contents=[Content(parts=[
                query
            ])],
            config=config,
        )
        print("\nResponse received. Processing parts...")
        # Process the response
        if response.candidates:
            candidate = response.candidates[0]
            # Check grounding metadata for search results
            if hasattr(candidate, 'grounding_metadata') and candidate.grounding_metadata:
                print("\nGrounding Metadata (Search Results):")
                print(candidate.grounding_metadata)
            # Print the actual response text
            if candidate.content and candidate.content.parts:
                for part in candidate.content.parts:
                    if part.text:
                        print("\nText Response:")
                        print(part.text)
        else:
            print("\nNo candidates in response")
            print(f"Full response: {response}")
    except Exception as e:
        print(f"Error during test: {str(e)}")
        traceback.print_exc()
        raise

if __name__ == "__main__":
    asyncio.run(test_gemini_with_search())

We can see the search took place from the grounding metadata, but the model chose to ignore it and hallucinate a bunch of URLs. I tried to use the DynamicRetrieval feature (no documentation):

config = GenerateContentConfig(
    temperature=0.0,
    tools=[Tool(
        google_search=GoogleSearch(),
        retrieval=DynamicRetrievalConfig(
            min_score=0.8,  # High confidence threshold
            max_results=5,  # Number of search results to use
            include_citations=True  # Include source citations
        )
    )],
    tool_config=FunctionCallingConfig(mode="ANY")
)

It looks like it's just not wired up in the API, or is broken somehow. I have to conclude that gemini-2.0-flash-exp is not yet a strong enough model for doing tool use, which is a real shame. Please point out if I have made a stupid mistake somewhere. Claude is amazing at all of this stuff. What makes it more of a shame is that the model is really good in other ways: I have been testing it in my pipeline and it's amazing for many other things. I love the native multimodal support, large context window, speed, cost, etc. Here is the output of the program for reference:
The URLs are hallucinated. It seems to be very hard for the model to latch onto the answer from the obfuscated Vertex redirects and the HTML encoding with all its CSS distractors (I'm not sure whether it is doing RAG over candidate.grounding_metadata or over the HTML). It did make 2 separate Google calls for the 2 queries in my prompt.
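One possible workaround for the hallucinated URLs is to ignore the links in the generated text and read them from the grounding metadata instead. The sketch below assumes the metadata exposes a `grounding_chunks` list whose items carry `web.uri` and `web.title`; those field names are my understanding of the `google.genai` types rather than something confirmed in this thread, and the candidate object here is a hypothetical stand-in, not real API output.

```python
from types import SimpleNamespace


def grounded_sources(candidate):
    """Return (title, uri) pairs from a candidate's grounding metadata.

    getattr is used throughout so a missing or None field yields an
    empty list instead of raising AttributeError.
    """
    metadata = getattr(candidate, "grounding_metadata", None)
    # Assumed field name: grounding_chunks (may be None when no search ran).
    chunks = getattr(metadata, "grounding_chunks", None) or []
    sources = []
    for chunk in chunks:
        web = getattr(chunk, "web", None)
        uri = getattr(web, "uri", None)
        if uri:
            sources.append((getattr(web, "title", None) or "", uri))
    return sources


# Hypothetical stand-in for response.candidates[0]; not real API output.
fake_candidate = SimpleNamespace(
    grounding_metadata=SimpleNamespace(
        grounding_chunks=[
            SimpleNamespace(
                web=SimpleNamespace(title="arxiv.org", uri="https://example.com/redirect/1")
            ),
            SimpleNamespace(web=None),  # chunk without a web source is skipped
        ]
    )
)
print(grounded_sources(fake_candidate))
```

In the test program above, this would be called as `grounded_sources(candidate)` in place of printing the raw metadata. The URIs it returns would still be redirect links if that is what the API hands back.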
@ecsplendid similar frustration here as well - a lot of issues using Gemini 2 + this new library + Google search as a tool. Documentation pretty much doesn't exist for this repo, and the only way is to read the source code (with a lot of confusion from the other few libraries, generative-ai and vertex-ai). According to its documentation, DynamicRetrieval doesn't work for gemini-2.0. However, just using DynamicRetrieval requires you to search the new Google genai code base (there are some examples in the unit tests). There is also no integration with langchain or other frameworks, so you would have to wrap it in compatible tools/LLMs to use it in langchain.
Your prompt really confused the model. If you remove "also do a separate query for the latest news in the UK and report on that" (there is also a missing period before that), the model is able to behave correctly. Below is what I got:
Description of the bug:
(This is more of a gemini model bug than a genai API bug so please let me know if I should be posting this somewhere else)
I have written an example of a complex tool chain using Google search which has been working perfectly on Claude since their June Sonnet release, but it fails on gemini-2.0-flash-exp. Here is the code example:
You can change the prompt string to:
Actual vs expected behavior:
It is supposed to use the GoogleSearchTool but doesn't, although it does when using the trivial prompt example given.
Any other information you'd like to share?
I was not able to use your built-in GoogleSearch tool as there are lots of problems with it: the configuration around sensitivity only seems to work on Vertex, it insists on returning URLs which are Google redirects (which would be impossible for my application), and it can't be used with other features, e.g. adding audio and, I think, response schema validation.
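If the redirect URLs are the blocker, one option might be to resolve them client-side before using them. This is only a sketch: it assumes the redirect links answer with ordinary HTTP redirects, which I have not verified for every case, and `resolve_redirect` is a hypothetical helper name, not part of any Google library.

```python
import urllib.request


def resolve_redirect(url, opener=urllib.request.urlopen, timeout=10):
    """Follow HTTP redirects for `url` and return the final URL.

    `opener` is injectable so the function can be exercised without
    network access. If the request fails, the original URL is returned
    unchanged so callers always get something usable.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            # geturl() reports the URL actually retrieved after redirects.
            return response.geturl()
    except OSError:  # urllib.error.URLError and HTTPError are subclasses
        return url
```

Usage would look like `final = resolve_redirect(redirect_uri)` for each link taken from the search results. Some servers reject HEAD requests, in which case this falls back to the original link.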
It took me absolutely ages even to get a trivial example like this working; the docs and examples need a lot of work - I had to trawl through the genai source code. It was especially confusing that there are now 3 Gemini APIs which I know of, i.e. generativeai, vertex and genai!
Am I doing something dumb here? Surely this should work?
I hope this code example at least helps others looking for some code which actually works doing function calling with the OpenAI specification and using CSE for Google search. For reference, I was forced to use this because in multi-agent systems where different Python processes are calling the functions, or where functions are in classes or used async, the automated function calling doesn't seem to work.
I should note that the Google built-in GoogleSearch tool did work for this setting, but a) I couldn't steer it sufficiently, b) it returned Vertex redirects, and c) it was subject to other restrictions in the Gemini API, e.g. not being able to combine it with schema validation.
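For readers who land here without the original code, a minimal sketch of the general approach described above (an OpenAI-style function declaration plus manual dispatch to Google Custom Search) could look like the following. It is not the code from this issue: `SEARCH_TOOL`, `cse_search` and `dispatch_tool_call` are hypothetical names, and the API key and search engine ID are placeholders you would supply yourself.

```python
import json

# Tool description in the OpenAI function-calling format (illustrative).
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "google_search",
        "description": "Search the web and return result titles and URLs.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query."},
                "num_results": {
                    "type": "integer",
                    "description": "How many results to return (1-10).",
                },
            },
            "required": ["query"],
        },
    },
}


def cse_search(query, num_results=5, api_key=None, engine_id=None):
    """Query Google Custom Search and return a list of title/link dicts."""
    # Imported lazily so the rest of the module loads without the package.
    from googleapiclient.discovery import build

    service = build("customsearch", "v1", developerKey=api_key)
    result = service.cse().list(q=query, cx=engine_id, num=num_results).execute()
    return [
        {"title": item.get("title"), "link": item.get("link")}
        for item in result.get("items", [])
    ]


def dispatch_tool_call(name, arguments_json, registry):
    """Run the function the model asked for and return its result as JSON."""
    if name not in registry:
        return json.dumps({"error": f"unknown tool: {name}"})
    arguments = json.loads(arguments_json)
    return json.dumps(registry[name](**arguments))
```

Dispatching by hand like this keeps the function lookup in your own process, which is the property that matters when the automated function calling cannot reach functions living in classes, coroutines or other processes. The links returned by CSE are direct URLs rather than redirects.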