Complex tool use broken on gemini-2.0-flash-exp (with repo template) google.genai #54

Open
ecsplendid opened this issue Dec 28, 2024 · 3 comments
Labels
api: gemini-api priority: p2 Moderately-important priority. Fix may not be included in next release. type: question Request for information or clarification. Not an issue.

Comments


ecsplendid commented Dec 28, 2024

Description of the bug:
(This is more of a Gemini model bug than a genai API bug, so please let me know if I should be posting this somewhere else.)

I have written an example of a complex tool chain using Google search. It has been working perfectly on Claude since their June Sonnet release, but it fails on gemini-2.0-flash-exp. Here is the code example:

from googleapiclient.discovery import build
from google import genai
from google.genai.types import (
    GenerateContentConfig,
    Tool,
    Part,
    FunctionCallingConfig,
    Content
)
import os
import asyncio
import traceback

# API key is read from the GOOGLE_GENAI_API_KEY environment variable
GEMINI_API_KEY = os.getenv("GOOGLE_GENAI_API_KEY")
GEMINI_MODEL = "gemini-2.0-flash-exp"

def GoogleSearchingTool(query: str) -> str:
    """Performs a Google search using the Google Search API.

    Args:
        query: The search query string to execute

    Returns:
        str: The search results formatted as a string
        
    Raises:
        ValueError: If API credentials are missing
        Exception: For other API or search errors
    """
    api_key = os.environ.get("GOOGLE_SEARCH_API_KEY")
    cse_id = os.environ.get("GOOGLE_SEARCH_CSE_ID")
    
    if not api_key or not cse_id:
        raise ValueError(
            "API key or CSE ID not found. Please set the GOOGLE_SEARCH_API_KEY "
            "and GOOGLE_SEARCH_CSE_ID environment variables."
        )

    service = build("customsearch", "v1", developerKey=api_key)
    
    try:
        result = service.cse().list(
            q=query,
            cx=cse_id,
            num=10,
            # Add fields to optimize response
            fields="items(title,link,snippet)"
        ).execute()
        
        search_results = result.get('items', [])
        if not search_results:
            return "No results found"
            
        # Format results as string with better structure
        formatted_results = []
        for item in search_results:
            formatted_results.append(
                f"Title: {item.get('title', 'No title')}\n"
                f"Link: {item.get('link', 'No link')}\n"
                f"Summary: {item.get('snippet', 'No summary')}\n"
            )
        
        return "\n---\n".join(formatted_results)
        
    except Exception as e:
        print(f"Error performing Google search: {str(e)}")
        return f"Search error: {str(e)}"
   
async def test_gemini_with_search():
    try:
        # Initialize the client
        client = genai.Client(
            vertexai=False,
            api_key=GEMINI_API_KEY
        )

        # Define the function declaration for Google Search
        google_search_function = {
            "name": "GoogleSearchingTool",
            "description": "Performs a Google search to find relevant information. ",
            "parameters": {
                "type": "OBJECT",
                "properties": {
                    "query": {
                        "type": "STRING",
                        "description": "The search query to execute"
                    }
                },
                "required": ["query"]
            }
        }

        # Create the tool configuration
        config = GenerateContentConfig(
            temperature=0.0,
            tools=[Tool(
                function_declarations=[google_search_function]
            )],
            tool_config=FunctionCallingConfig(mode="ANY")
        )

        # Example prompt that does work:
        # "Find me the URLs for the most recent research papers about the lottery ticket hypothesis in neural networks. Search for: 'lottery ticket hypothesis' neural networks 2023..2024 filetype:pdf site:arxiv.org also do a separate query for the latest news in the UK and report on that"

        query = Part.from_text(
            """Analyze this merged conversation fragment:\nFragment start time: 0.19920634\nFragment end time: 223.07758\nFragment duration: 222.87837366000002\n\nSpeaker: Tim Scarfe\nText: Jeff, it's amazing to have your Ml. So I was assigned to you before. I first discovered you (5) two thousand and eighteen, Ic see when you're doing your workshop, and you're talking about (10) open ended. And and, of course, we are huge fans of Kenneth Stanley on the show we've had Joel Lemon. (15) We've had Tim Rock Ta. We've I've had loads of people on open ended ness is an absolutely (20) fascinating area to me so to have you on the show. One of the pioneers in this field me means so much to me. So thank (25) you for coming on Jeff.\n\nSpeaker: Jeff CLune\nText: My pleasure. It's great to be here and you know, you've named a lot of people that I deeply risk respect (30) and have worked with for years. So it's great to finally be here.\n\nSpeaker: Tim Scarfe\nText: So, one of your life (35) goals you've said is to create an algorithm that keeps running forever with no end. (40) You know, we think there's something fundamental about how intelligence works in the natural world. But rather than (45) trying to capture it down at the metal. You know, like there these biologically inspired folks talk about biometric (50) intelligence. You've got this very interesting approach where you kind of (55) bootstrap intelligence like in the natural world. In such a way that doesn't (60) lose important characteristics of the natural world. Can you tell us about (65) that?\n\nSpeaker: Jeff CLune\nText: Yeah. Sure. So I think one of the grand challenges of computer science, and also (70) you could think about it from a biological perspective, grand challenges of biology is to try to (75) understand how did evolution produce the explosion of amazing things that (80) we see on earth. You look out of the natural world and you see Jaguar hawk, the human (85) mind, three toad sloth, birds of paradise. All these things just kind of (90) popping up over time. This amazing men of engineering marvel, (95) diversity. It's so fascinating. And, actually, that has been the central quest (100) of my career from the beginning that? How did evolution produce this complexity and how does intelligence happen (105) and they relate. Like, how did evolution produce the intelligence that you have in your brain, which is the most (110) impressive bit so, you know, a learning machine that we know in the universe. And you (115) could try to do it and many people have tried to say, hey, We're gonna go all the way down to the lowest possible (120) level we can imagine. We're gonna create, you know, self replicating machine (125) code or even, like, self replicating artificial cells, and then we're gonna, like, hope (130) that that kind of bubbles and per up into some open ended process that eventually produces (135) like an entirely new form of interesting intelligent life, and that will teach us about the process. (140) But as Josh Tan and Bob told me when I was telling him about some of these goals of mind. He said, you know, (145) you don't have a planet sized computer to work with. So how are you gonna accomplish this within, you know, (150) the lifetime of you as a scientist or, like, us as a community. And to do that, the key gonna be (155) abstraction. You know? We don't need to recreate every single detail in (160) biology in order to study and understand the core principles. And in sometimes sense nor would we want to. You (165) know? 
We want to and at an abstract level, what are the key ingredients that make this process (170) work and that produces endless marvel. And so in some sense, the more (175) abstract you can make it if it still has the properties you want? Like a (180) complexity explosion? Then actually, you've done the best job because you've figured out the (185) difference between what was necessary and what was incidental. Another way to think about this is your own brain. (190) I don't believe in the blue brain project philosophy which is let's simulate every single (195) chemical and every cork in your brain in order to try to produce an intelligent machine. (200) Actually, we wanna say a lot of that chemistry. A lot of that detail is probably not necessary. It's not part of the secret (205) abstract recipe for intelligence. Let's figure out how can we abstract it (210) and still get intelligence. And so you could apply that analogy to the study of open as well. What (215) are the abstract principles that create a thing that literally you could run for billions (220) of years and would still continue to surprise and delight you?\n\n\nAudio Analysis Metadata:\n{\n  \"attention_needed\": [],\n  \"audio_quality\": {\n    \"issues\": [],\n    \"note\": \"No significant audio quality issues detected.\",\n    \"overall_quality\": 8\n  },\n  \"background_noise\": {\n    \"level\": 2,\n    \"note\": \"Very slight background hum, not disruptive to the interview.\",\n    \"types\": [\n      \"hum\"\n    ]\n  },\n  \"emotion\": {\n    \"negative_minutes\": 0.0,\n    \"neutral_minutes\": 3.71,\n    \"note\": \"No significant emotional changes detected.\",\n    \"positive_minutes\": 0.0,\n    \"variability\": 1\n  },\n  \"laughter\": {\n    \"frequency\": 1,\n    \"intensity\": 2,\n    \"note\": \"Occasional light laughter, mostly at the start of the clip around 00:00:10.\"\n  },\n  \"non_verbal_vocalizations\": {\n    \"frequency\": 1,\n    \"note\": \"Occasional throat clearing, not disruptive.\",\n    \"types\": [\n      \"throat_clearing\"\n    ]\n  },\n  \"sentiment_score\": 0.4,\n  \"social_virality\": {\n    \"factors\": [\n      \"insight\",\n      \"clarity\",\n      \"relevance\",\n      \"dynamic_conversation\"\n    ],\n    \"note\": \"Good viral potential for ML and adjacent communities due to the insightful discussion on open-endedness and its implications for AI.\",\n    \"score\": 7\n  },\n  \"speaker_engagement\": {\n    \"consistency\": 8,\n    \"level\": 7,\n    \"note\": \"Both speakers maintain a good level of engagement throughout the conversation.\"\n  },\n  \"speech_patterns\": {\n    \"average_pause_duration\": 0.4,\n    \"fluency\": 8,\n    \"pause_frequency\": 4,\n    \"speech_rate\": 6\n  },\n  \"tone_prosody\": {\n    \"description\": \"Conversational tone with consistent prosody.\",\n    \"pitch_variability\": 4,\n    \"rhythm_consistency\": 7,\n    \"stress_emphasis\": 5\n  },\n  \"vocal_characteristics\": {\n    \"breathiness\": 3,\n    \"clarity\": 9,\n    \"resonance\": 7,\n    \"vocal_strain\": 2\n  },\n  \"original_start_time\": 0.19920634,\n  \"original_end_time\": 223.07758\n}\n\nNote: The text contains time markers in the format <seconds> every 5 seconds. Use these markers for more accurate time estimation when creating summary points, references, questions, and edits.\n\nProvide a comprehensive analysis of this fragment. 
Your response should be in the following JSON schema:\n\n{\n  \"basic_summary\": \"A concise one-sentence summary of the fragment with no filler words at all, always use actual speaker names not 'speakers' and use name abbreviations i.e. 'Tim Scarfe'->'TS', use other abbreviations if needed. Include any crucial audio-based information (e.g., significant emotional changes, voice characteristics, notable background noises, edits, audio quality issues etc) that editors should be aware of.\",\n  \"subject_matter\": \"The primary field of inquiry (e.g., cognitive science, computer science, mathematics, etc.)\",\n  \"conciseness_score\": <integer between 1 and 10>,\n  \"technical_depth_score\": <integer between 1 and 10>,\n  \"interestingness_score\": <integer between 1 and 10>,\n  \"social_virality\": {\n    \"score\": <integer between 0 and 10>,\n    \"explanation\": \"A very brief explanation of why this content might go viral on social media in ML and adjacent communities. Use the Audio Analysis Metadata to inform your judgement based on features which were detected from the audio.\"\n  },\n  \"summary_points\": [\n    {\n      \"time\": <float between 0.19920634 and 223.07758>,\n      \"speaker\": \"<speaker name>\",\n      \"point\": \"A concise summary point\",\n      \"type\": \"summary\"\n    }\n  ],\n  \"references\": [\n    {\n      \"time\": <float between 0.19920634 and 223.07758>,\n      \"speaker\": \"<speaker name>\",\n      \"url\": \"ALWAYS USE GoogleSearchingTool TO FIND THE URL, otherwise 'Unknown' - avoid Unknown at all costs though, always try to find something using GoogleSearchingTool. The URLs in your base knowledge are almost always wrong, don't trust them. Make sure the URL is authoritative and matches the reference and author. It should always be the single most relevant URL from the search results, never more than one.\",\n      \"reference\": \"<text -- include plenty of supporting detail in structured form, including the context of the citation so we can improve later -- must match the url field>\",\n      \"author\": \"otherwise 'Unknown' - make sure the author matches the reference and the URL\",\n      \"short_description\": \"A concise one-line summary of the reference that captures its key contribution or relevance to the discussion\",\n      \"type\": \"reference\"\n    }\n  ],\n  \"questions\": [\n    {\n      \"time\": <float between 0.19920634 and 223.07758>,\n      \"speaker\": \"<speaker name>\",\n      \"question\": \"A relevant question based on the fragment\",\n      \"type\": \"The type of question (e.g., clarification, expansion, challenge, etc.)\"\n    }\n  ],\n  \"edits\": [\n    {\n      \"time\": <float between 0.19920634 and 223.07758>,\n      \"issue\": \"Reason for edit\",\n      \"type\": \"content_edit\"\n    }\n  ],\n  \"quotation_clip\": {\n    \"speaker\": \"<speaker name>\",\n    \"start_time\": <float between 0.19920634 and 223.07758>,\n    \"end_time\": <float between 0.19920634 and 223.07758>,\n    \"quotation\": \"[few word succinct explanation of quote with no filler words] - Verbatim quotation from the fragment\",\n    \"interestingness_score\": <integer between 9 and 10>\n  }\n}\n\nGuidelines:\n1. Use the provided audio analysis metadata to enhance your understanding of the fragment's content and context.\n2. In the basic_summary, include any crucial audio-based information that editors should be aware of, such as significant emotional changes, notable background noises, or important voice characteristics.\n3. 
Ensure that the summary_points array contains at least one item of each type (summary, key_point, and theme) from each sub-fragment (if merged) or the main fragment.\n4. The basic_summary should be a single sentence with no filler words, capturing the essence of the entire fragment, including key audio-based insights.\n5. The subject_matter should reflect the main topic discussed across all sub-fragments.\n6. All scores (conciseness_score, technical_depth_score, interestingness_score, social_virality.score) should be integers between 0 and 10, inclusive, tailored for a technical audience interested in AI, computer science, cognitive science, software engineering, mathematics, and philosophy. The concision score shouldn't just be how short the fragment is, but rather how well articulated a point is, and how high the information content is and whether it would make a good clip.\n7. For references, if none are found, provide an empty list.The references should be specific about technical things, don't reference general knowledge to a technical audience. Reflect on the references and make sure they are correct, the original text might be mistranscribed. Remember to use GoogleSearchingTool search in almost all cases unless you are certain. Include a lot of detail and URLs/papers etc. You will need to do GoogleSearchingTool searches for each reference, so think step by step and do them one by one. Make sure you always deliver a URL, avoid showing \"Unknown\" at all costs.\n7.1 References should always be source technical material rather than anything from a social network, blog, or similar. We want technical papers from arXiv (PDF link), for books prefer english Amazon links, for philisophical concepts, prefer stanford encyclopedia of philosophy. Wikipedia is sometimes OK. The links we give should always be authoritative.\n7.2 You might need to search GoogleSearchingTool with several iterations and different queries to find the right reference, the most relevant, most authoritative, most accurate source reference. If you are unsure, you should include more search results in your GoogleSearchingTool search.\n7.3 When you find a reference, always include the context of the citation so we can improve later.\n7.4 Always include the author and make sure it matches the reference and the URL.\n7.5 Include quotations i.e. when the speaker is quoting from a manuscript or paper, include the quotation in the reference.\n7.6 Include references to notable academics\n8. Generate questions which are technically relevant to the conversation. Only include substantive questions related to the technical content. If none are relevant, provide an empty list.\n9. For edits, only include parts that should obviously be edited out of the interview (e.g., off-topic remarks, technical issues, private information). If there are no such parts, provide an empty list. Ignore filler words, repetition, linguistic expression, or grammar issues. Focus only on content that should be edited. We sometimes provide important information in the audio analysis section which you should always use if present.\n10. Ensure that the 'time' field in summary_points, references, questions, and edits is a float value between 0.19920634 and 223.07758.\n11. Use the provided time markers (<seconds>) in the text to accurately estimate times for each point, reference, question, and edit.\n12. The social_virality score should predict how viral the clip would go on social media (Twitter, LinkedIn, Facebook) in the ML and adjacent communities. 
Provide a very brief explanation for the score. If the score is below 8 don't include the explanation field.\n13. The quotation_clip should only be extracted for the very best content, typically scoring 9 or 10 on interestingness. It should be from one person, aim for about 60 seconds in length, and should only be included roughly 10% of the time to avoid database spam. If no suitable quotation is found, omit this field entirely. Use the word-level timing information to accurately set the start_time and end_time for the quotation. Use the information from the audio analysis included to inform your judgement of the best quotations.\n14. Consider the audio analysis metadata when determining the interestingness, social virality, and overall importance of the content.\n15. When including audio-based information in the basic_summary, be very concise and only mention truly significant aspects that would impact editing decisions.\n16. You may use GoogleSearchingTool to fill in any blanks in your knowledge. When you use GoogleSearchingTool, use the additional information we have in our context to construct a relevant query.\n17. Assume that the interview took place in 2024."""
        )

        # Make the API call with a prompt that requires search
        response = await client.aio.models.generate_content(
            model=GEMINI_MODEL,
            contents=[Content(parts=[
                query
            ])],
            config=config
        )

        # Process the response
        if response.candidates:
            candidate = response.candidates[0]
            if candidate.content and candidate.content.parts:
                for part in candidate.content.parts:
                    if part.function_call:
                        print("\nFunction call detected:")
                        print(f"Name: {part.function_call.name}")
                        print(f"Args: {part.function_call.args}")
                        
                        # Execute search
                        search_results = GoogleSearchingTool(**part.function_call.args)

                        print(f"Search results: {search_results}")
                        
                        # Get final response with search results
                        final_response = await client.aio.models.generate_content(
                            model=GEMINI_MODEL,
                            contents=[Content(parts=[Part.from_text(search_results), query])],
                            config=GenerateContentConfig(
                                temperature=0.0,
                                response_mime_type="application/json",  # response_schema requires a JSON MIME type
                                response_schema=schema
                            )
                        )
                        
                        if final_response.text:
                            print("\nFinal response with search results:")
                            print(final_response.text)
                    elif part.text:
                        print("\nDirect text response:")
                        print(part.text)

    except Exception as e:
        print(f"Error during test: {str(e)}")
        traceback.print_exc()
        raise

schema = {
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": [
    "basic_summary",
    "subject_matter",
    "conciseness_score",
    "technical_depth_score",
    "interestingness_score",
    "social_virality",
    "summary_points",
    "references",
    "questions",
    "edits",
    "quotation_clip"
  ],
  "properties": {
    "basic_summary": {
      "type": "string",
      "description": "A brief overview of the content"
    },
    "subject_matter": {
      "type": "string",
      "description": "Main topics covered in the content"
    },
    "conciseness_score": {
      "type": "integer",
      "minimum": 0,
      "maximum": 10,
      "description": "Rating of how concise the content is"
    },
    "technical_depth_score": {
      "type": "integer",
      "minimum": 0,
      "maximum": 10,
      "description": "Rating of the technical depth of the content"
    },
    "interestingness_score": {
      "type": "integer",
      "minimum": 0,
      "maximum": 10,
      "description": "Rating of how interesting the content is"
    },
    "social_virality": {
      "type": "object",
      "required": ["score", "explanation"],
      "properties": {
        "score": {
          "type": "integer",
          "minimum": 0,
          "maximum": 10,
          "description": "Rating of viral potential"
        },
        "explanation": {
          "type": "string",
          "description": "Explanation of the viral potential"
        }
      }
    },
    "summary_points": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["time", "speaker", "point", "type"],
        "properties": {
          "time": {
            "type": "number",
            "description": "Timestamp in seconds"
          },
          "speaker": {
            "type": "string",
            "description": "Name of the speaker"
          },
          "point": {
            "type": "string",
            "description": "Content of the summary point"
          },
          "type": {
            "type": "string",
            "enum": ["summary"],
            "description": "Type of the point"
          }
        }
      }
    },
    "references": {
      "type": "array",
      "items": {
        "type": "object",
        "required": [
          "time",
          "speaker",
          "url",
          "reference",
          "author",
          "short_description",
          "type"
        ],
        "properties": {
          "time": {
            "type": "number",
            "description": "Timestamp in seconds"
          },
          "speaker": {
            "type": "string",
            "description": "Name of the speaker"
          },
          "url": {
            "type": "string",
            "format": "uri",
            "description": "URL of the reference"
          },
          "reference": {
            "type": "string",
            "description": "Detailed reference information"
          },
          "author": {
            "type": "string",
            "description": "Author of the referenced work"
          },
          "short_description": {
            "type": "string",
            "description": "Brief description of the reference"
          },
          "type": {
            "type": "string",
            "enum": ["reference"],
            "description": "Type of the item"
          }
        }
      }
    },
    "questions": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["time", "speaker", "question", "type"],
        "properties": {
          "time": {
            "type": "number",
            "description": "Timestamp in seconds"
          },
          "speaker": {
            "type": "string",
            "description": "Name of the speaker"
          },
          "question": {
            "type": "string",
            "description": "Content of the question"
          },
          "type": {
            "type": "string",
            "enum": ["clarification", "expansion"],
            "description": "Type of question"
          }
        }
      }
    },
    "edits": {
      "type": "array",
      "description": "Array of edit records"
    },
    "quotation_clip": {
      "type": "object",
      "required": [
        "speaker",
        "start_time",
        "end_time",
        "quotation",
        "interestingness_score"
      ],
      "properties": {
        "speaker": {
          "type": "string",
          "description": "Name of the speaker"
        },
        "start_time": {
          "type": "number",
          "description": "Start timestamp in seconds"
        },
        "end_time": {
          "type": "number",
          "description": "End timestamp in seconds"
        },
        "quotation": {
          "type": "string",
          "description": "Content of the quotation"
        },
        "interestingness_score": {
          "type": "integer",
          "minimum": 0,
          "maximum": 10,
          "description": "Rating of how interesting the quotation is"
        }
      }
    }
  }
}


if __name__ == "__main__":
    asyncio.run(test_gemini_with_search())

You can change the prompt string to:

Find me the URLs for the most recent research papers about the lottery ticket hypothesis in neural networks.
Search for: 'lottery ticket hypothesis' neural networks 2023..2024 filetype:pdf site:arxiv.org also
do a separate query for the latest news in the UK and report on that

Actual vs expected behavior:
The model is expected to call GoogleSearchingTool, but it never does - although it does call it when given the trivial prompt example above.

Any other information you'd like to share?
I was not able to use your built-in GoogleSearch tool because it has several problems: the sensitivity configuration only seems to work on Vertex, it insists on returning URLs that are Google redirects (which would be unusable for my application), and it can't be combined with other features, e.g. adding audio and, I believe, response schema validation.

It took me absolutely ages even to get a trivial example like this working; the docs and examples need a lot of work, and I had to trawl through the genai source code. It was especially confusing that there are now three Gemini APIs that I know of: generativeai, vertex and genai!

Am I doing something dumb here? Surely this should work?

I hope this code example at least helps others looking for working function-calling code that uses OpenAPI-style declarations and a CSE for Google search. For reference, I was forced into manual function calling because in multi-agent systems where different Python processes call the functions, or where the functions live in classes or are used async, the automatic function calling doesn't seem to work.
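For completeness, here is roughly what the automatic path looks like - a minimal sketch, assuming the GoogleSearchingTool function and client defined above and the SDK's default automatic function calling (this is the mode that broke down for me in multi-agent/async settings):

# Minimal sketch of automatic function calling: pass the Python callable
# itself as a tool; the SDK introspects the signature and docstring, executes
# the call, and feeds the result back to the model. Assumes the
# GoogleSearchingTool function and a client as in the script above.
response = client.models.generate_content(
    model=GEMINI_MODEL,
    contents="Find recent arXiv papers on the lottery ticket hypothesis",
    config=GenerateContentConfig(
        temperature=0.0,
        tools=[GoogleSearchingTool],  # callable passed directly
    ),
)
print(response.text)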

I should note that the built-in GoogleSearch tool did work in this setting, but (a) I couldn't steer it sufficiently, (b) it returned Vertex redirects, and (c) it was subject to other Gemini API restrictions, e.g. it cannot be combined with schema validation.

@ecsplendid ecsplendid added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Dec 28, 2024

ecsplendid commented Dec 28, 2024

I decided to go back to basics and do a super simple "canonical" example of GoogleSearch just using the built-in functionality - it doesn't work at all.

from google import genai
from google.genai.types import (
    GenerateContentConfig,
    Tool,
    Part,
    FunctionCallingConfig,
    Content,
    GoogleSearch,
    DynamicRetrievalConfig,
    GoogleSearchRetrieval
)
import os
import asyncio
import traceback

# API key is read from the GOOGLE_GENAI_API_KEY environment variable
GEMINI_API_KEY = os.getenv("GOOGLE_GENAI_API_KEY")
GEMINI_MODEL = "gemini-2.0-flash-exp"

async def test_gemini_with_search():
    try:
        # Initialize the client
        client = genai.Client(
            vertexai=False,
            api_key=GEMINI_API_KEY
        )

        # Create the tool configuration
        config = GenerateContentConfig(
            temperature=0.0,
            tools=[Tool(google_search=GoogleSearch())],
            tool_config=FunctionCallingConfig(mode="ANY")
        )

        query = Part.from_text(
            """Find me the URLs for the most recent research papers about the lottery ticket hypothesis in neural networks. Search for: 'lottery ticket hypothesis' neural networks 2023..2024 filetype:pdf site:arxiv.org also do a separate query for the latest news in the UK and report on that"""
        )

        # Make the API call with a prompt that requires search
        response = await client.aio.models.generate_content(
            model=GEMINI_MODEL,
            contents=[Content(parts=[
                query
            ])],
            config=config
        )

        print("\nResponse received. Processing parts...")

        # Process the response
        if response.candidates:
            candidate = response.candidates[0]
            
            # Check grounding metadata for search results
            if hasattr(candidate, 'grounding_metadata') and candidate.grounding_metadata:
                print("\nGrounding Metadata (Search Results):")
                print(candidate.grounding_metadata)
            # Print the actual response text
            if candidate.content and candidate.content.parts:
                for part in candidate.content.parts:
                    if part.text:
                        print("\nText Response:")
                        print(part.text)
        else:
            print("\nNo candidates in response")
            print(f"Full response: {response}")

    except Exception as e:
        print(f"Error during test: {str(e)}")
        traceback.print_exc()
        raise

if __name__ == "__main__":
    asyncio.run(test_gemini_with_search())

We can see from the grounding metadata that the search took place, but the model chose to ignore it and hallucinated a bunch of URLs.

I tried to use the DynamicRetrieval feature (which has no documentation):

        config = GenerateContentConfig(
            temperature=0.0,
            tools=[Tool(
                google_search=GoogleSearch(),
                retrieval=DynamicRetrievalConfig(
                    min_score=0.8,  # High confidence threshold
                    max_results=5,  # Number of search results to use
                    include_citations=True  # Include source citations
                )
            )],
            tool_config=FunctionCallingConfig(mode="ANY")
        )
It looks like it's just not wired up in the API, or it's broken somehow.
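For what it's worth, the type definitions in google.genai.types suggest dynamic retrieval is meant to hang off GoogleSearchRetrieval rather than GoogleSearch, with different field names. Here is a sketch of what I would expect based on reading the source - untested on gemini-2.0-flash-exp:

        # Sketch inferred from the google.genai.types source: DynamicRetrievalConfig
        # appears to expose `mode` and `dynamic_threshold`, nested under
        # GoogleSearchRetrieval. Field names are my reading of the types; untested.
        config = GenerateContentConfig(
            temperature=0.0,
            tools=[Tool(
                google_search_retrieval=GoogleSearchRetrieval(
                    dynamic_retrieval_config=DynamicRetrievalConfig(
                        mode="MODE_DYNAMIC",
                        dynamic_threshold=0.8,  # only ground when retrieval confidence is high
                    )
                )
            )],
        )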

I have to conclude that gemini-2.0-flash-exp is not yet a strong enough model for tool use, which is a real shame. Please point out if I have made a stupid mistake somewhere; Claude is amazing at all of this. What makes it more of a shame is that the model is really good in other ways: I have been testing it in my pipeline and it's amazing for many other tasks. I love the native multimodal support, large context window, speed, cost, etc.

Here is the output of the program for reference:


Response received. Processing parts...

Grounding Metadata (Search Results):
grounding_chunks=[GroundingChunk(retrieved_context=None, web=GroundingChunkWeb(title='arxiv.org', uri='https://vertexaisearch.cloud.google.com/grounding-api-redirect/AYygrcQuddJwbNCmHR5LE1dchMAL00FjdM7dq-hSduou2Sl8ueLVdzqjgygcGVSq1_f-xINwQ86GSDcTJha9PbUrW4jkxa3DX_lW6kfxRD-5xuZkligEpJqxmmT_y8WV')), GroundingChunk(retrieved_context=None, web=GroundingChunkWeb(title='arxiv.org', uri='https://vertexaisearch.cloud.google.com/grounding-api-redirect/AYygrcTBiBHn3gMmXmQ5dqB1tU-WOU4-7Zz9UOb0vjQA9gldrmti7q2Gt-IDKpME0gSyG1bhuBOmmiQZEfR53nQrvUhaw0q0CEnmfTJ8Ad7AMLZXYxpRoWj1P3ItAx9J')), GroundingChunk(retrieved_context=None, web=GroundingChunkWeb(title='independent.co.uk', uri='https://vertexaisearch.cloud.google.com/grounding-api-redirect/AYygrcSHURfaik962DF2wPhFFUQGWXXzuEs6sNNYrDmGIaUED81fed5qpuKI48wNt-uln7SzdXB6ye7hBzaxfENEcYAviurgu6477aQQgqK2mQXY8mNzccWLqtENzA==')), GroundingChunk(retrieved_context=None, web=GroundingChunkWeb(title='bbc.co.uk', uri='https://vertexaisearch.cloud.google.com/grounding-api-redirect/AYygrcTtOtlX366QfljsniDDbzNCkJID3ZwOoUSS1iq-GsTJkKRWblf_IXgAUyrdqdesTI80q2NsSF-xLrn5Nl2xhjGVeHRvq4JDv3HU3rkkunOexSuKNPbyHHtP')), GroundingChunk(retrieved_context=None, web=GroundingChunkWeb(title='bbc.com', uri='https://vertexaisearch.cloud.google.com/grounding-api-redirect/AYygrcR-6eswbfY_MbCj9-NeLkXevkvOtlyk0bPH1MxRrP51HWaDlPp7s8yDft-77f-g-FTITqM6hZlC0raGLMX6DPGzTWf9WWsgTJckD6R4zPsPX7JPa_mPKw=='))] grounding_supports=[GroundingSupport(confidence_scores=[0.85341215], grounding_chunk_indices=[0], segment=Segment(end_index=608, part_index=None, start_index=415, text='*   This paper provides a comprehensive survey of the Lottery Ticket Hypothesis, examining previous research, discussing open issues, and suggesting potential directions for future exploration.')), GroundingSupport(confidence_scores=[0.9520402], grounding_chunk_indices=[0], segment=Segment(end_index=727, part_index=None, start_index=609, text='It also highlights the lack of open-source frameworks and consensual experimental settings as challenges in the field.')), GroundingSupport(confidence_scores=[0.6774473], grounding_chunk_indices=[1], segment=Segment(end_index=819, part_index=None, start_index=729, text='2.  
**Gaining the Sparse Rewards by Exploring Lottery Tickets in Spiking Neural Networks**')), GroundingSupport(confidence_scores=[0.7534214], grounding_chunk_indices=[1], segment=Segment(end_index=1090, part_index=None, start_index=938, text='*   This paper explores the application of the Lottery Ticket Hypothesis to Spiking Neural Networks (SNNs), which are known for their energy efficiency.')), GroundingSupport(confidence_scores=[0.97518075], grounding_chunk_indices=[1], segment=Segment(end_index=1256, part_index=None, start_index=1091, text='It investigates the properties of "spiking-based lottery tickets" and proposes a sparse algorithm for spiking transformer structures to achieve multi-level sparsity.')), GroundingSupport(confidence_scores=[0.9793397], grounding_chunk_indices=[2], segment=Segment(end_index=1509, part_index=None, start_index=1372, text='*   **Travel Disruptions:** Fog has caused significant travel disruptions, with numerous flights being delayed or canceled across the UK.')), GroundingSupport(confidence_scores=[0.98881626], grounding_chunk_indices=[2], segment=Segment(end_index=1688, part_index=None, start_index=1564, text='*   **Weather:** The fog is expected to turn into snow and heavy rain, and there are flood warnings issued for the New Year.')), GroundingSupport(confidence_scores=[0.9451483, 0.97554725], grounding_chunk_indices=[3, 4], segment=Segment(end_index=1771, part_index=None, start_index=1689, text='*   **Crime:** A woman has been charged with the murder of a man on Christmas Day.')), GroundingSupport(confidence_scores=[0.7764542], grounding_chunk_indices=[3], segment=Segment(end_index=1852, part_index=None, start_index=1772, text='Additionally, a man has appeared in court charged with the murders of two women.')), GroundingSupport(confidence_scores=[0.9944389, 0.9938976], grounding_chunk_indices=[4, 3], segment=Segment(end_index=1981, part_index=None, start_index=1877, text='*   The actress Olivia Hussey, known for her role in Romeo and Juliet, has passed away at the age of 73.')), GroundingSupport(confidence_scores=[0.62604964], grounding_chunk_indices=[4], segment=Segment(end_index=2052, part_index=None, start_index=1986, text='*   Dame Judi Dench has revealed an apple tribute to Maggie Smith.')), GroundingSupport(confidence_scores=[0.9837083], grounding_chunk_indices=[3], segment=Segment(end_index=2117, part_index=None, start_index=2057, text='*   1,329 tiny snails have been released on a remote island.')), GroundingSupport(confidence_scores=[0.98902404], grounding_chunk_indices=[4], segment=Segment(end_index=2186, part_index=None, start_index=2122, text='*   Nessa will read the old Shipping Forecast for its centenary.'))] retrieval_metadata=None retrieval_queries=None search_entry_point=SearchEntryPoint(rendered_content='<style>\n.container {\n  align-items: center;\n  border-radius: 8px;\n  display: flex;\n  font-family: Google Sans, Roboto, sans-serif;\n  font-size: 14px;\n  line-height: 20px;\n  padding: 8px 12px;\n}\n.chip {\n  display: inline-block;\n  border: solid 1px;\n  border-radius: 16px;\n  min-width: 14px;\n  padding: 5px 16px;\n  text-align: center;\n  user-select: none;\n  margin: 0 8px;\n  -webkit-tap-highlight-color: transparent;\n}\n.carousel {\n  overflow: auto;\n  scrollbar-width: none;\n  white-space: nowrap;\n  margin-right: -12px;\n}\n.headline {\n  display: flex;\n  margin-right: 4px;\n}\n.gradient-container {\n  position: relative;\n}\n.gradient {\n  position: absolute;\n  transform: translate(3px, -9px);\n  height: 36px;\n  
width: 9px;\n}\n@media (prefers-color-scheme: light) {\n  .container {\n    background-color: #fafafa;\n    box-shadow: 0 0 0 1px #0000000f;\n  }\n  .headline-label {\n    color: #1f1f1f;\n  }\n  .chip {\n    background-color: #ffffff;\n    border-color: #d2d2d2;\n    color: #5e5e5e;\n    text-decoration: none;\n  }\n  .chip:hover {\n    background-color: #f2f2f2;\n  }\n  .chip:focus {\n    background-color: #f2f2f2;\n  }\n  .chip:active {\n    background-color: #d8d8d8;\n    border-color: #b6b6b6;\n  }\n  .logo-dark {\n    display: none;\n  }\n  .gradient {\n    background: linear-gradient(90deg, #fafafa 15%, #fafafa00 100%);\n  }\n}\n@media (prefers-color-scheme: dark) {\n  .container {\n    background-color: #1f1f1f;\n    box-shadow: 0 0 0 1px #ffffff26;\n  }\n  .headline-label {\n    color: #fff;\n  }\n  .chip {\n    background-color: #2c2c2c;\n    border-color: #3c4043;\n    color: #fff;\n    text-decoration: none;\n  }\n  .chip:hover {\n    background-color: #353536;\n  }\n  .chip:focus {\n    background-color: #353536;\n  }\n  .chip:active {\n    background-color: #464849;\n    border-color: #53575b;\n  }\n  .logo-light {\n    display: none;\n  }\n  .gradient {\n    background: linear-gradient(90deg, #1f1f1f 15%, #1f1f1f00 100%);\n  }\n}\n</style>\n<div class="container">\n  <div class="headline">\n    <svg class="logo-light" width="18" height="18" viewBox="9 9 35 35" fill="none" xmlns="http://www.w3.org/2000/svg">\n      <path fill-rule="evenodd" clip-rule="evenodd" d="M42.8622 27.0064C42.8622 25.7839 42.7525 24.6084 42.5487 23.4799H26.3109V30.1568H35.5897C35.1821 32.3041 33.9596 34.1222 32.1258 35.3448V39.6864H37.7213C40.9814 36.677 42.8622 32.2571 42.8622 27.0064V27.0064Z" fill="#4285F4"/>\n      <path fill-rule="evenodd" clip-rule="evenodd" d="M26.3109 43.8555C30.9659 43.8555 34.8687 42.3195 37.7213 39.6863L32.1258 35.3447C30.5898 36.3792 28.6306 37.0061 26.3109 37.0061C21.8282 37.0061 18.0195 33.9811 16.6559 29.906H10.9194V34.3573C13.7563 39.9841 19.5712 43.8555 26.3109 43.8555V43.8555Z" fill="#34A853"/>\n      <path fill-rule="evenodd" clip-rule="evenodd" d="M16.6559 29.8904C16.3111 28.8559 16.1074 27.7588 16.1074 26.6146C16.1074 25.4704 16.3111 24.3733 16.6559 23.3388V18.8875H10.9194C9.74388 21.2072 9.06992 23.8247 9.06992 26.6146C9.06992 29.4045 9.74388 32.022 10.9194 34.3417L15.3864 30.8621L16.6559 29.8904V29.8904Z" fill="#FBBC05"/>\n      <path fill-rule="evenodd" clip-rule="evenodd" d="M26.3109 16.2386C28.85 16.2386 31.107 17.1164 32.9095 18.8091L37.8466 13.8719C34.853 11.082 30.9659 9.3736 26.3109 9.3736C19.5712 9.3736 13.7563 13.245 10.9194 18.8875L16.6559 23.3388C18.0195 19.2636 21.8282 16.2386 26.3109 16.2386V16.2386Z" fill="#EA4335"/>\n    </svg>\n    <svg class="logo-dark" width="18" height="18" viewBox="0 0 48 48" xmlns="http://www.w3.org/2000/svg">\n      <circle cx="24" cy="23" fill="#FFF" r="22"/>\n      <path d="M33.76 34.26c2.75-2.56 4.49-6.37 4.49-11.26 0-.89-.08-1.84-.29-3H24.01v5.99h8.03c-.4 2.02-1.5 3.56-3.07 4.56v.75l3.91 2.97h.88z" fill="#4285F4"/>\n      <path d="M15.58 25.77A8.845 8.845 0 0 0 24 31.86c1.92 0 3.62-.46 4.97-1.31l4.79 3.71C31.14 36.7 27.65 38 24 38c-5.93 0-11.01-3.4-13.45-8.36l.17-1.01 4.06-2.85h.8z" fill="#34A853"/>\n      <path d="M15.59 20.21a8.864 8.864 0 0 0 0 5.58l-5.03 3.86c-.98-2-1.53-4.25-1.53-6.64 0-2.39.55-4.64 1.53-6.64l1-.22 3.81 2.98.22 1.08z" fill="#FBBC05"/>\n      <path d="M24 14.14c2.11 0 4.02.75 5.52 1.98l4.36-4.36C31.22 9.43 27.81 8 24 8c-5.93 0-11.01 3.4-13.45 8.36l5.03 3.85A8.86 8.86 0 0 1 24 14.14z" 
fill="#EA4335"/>\n    </svg>\n    <div class="gradient-container"><div class="gradient"></div></div>\n  </div>\n  <div class="carousel">\n    <a class="chip" href="https://vertexaisearch.cloud.google.com/grounding-api-redirect/AYygrcTvZ-25fq2wSSdUXnfYfAkvVE9ArpTx8Sk6_Z5I3OqpBFNXUmIuZhAUnxXEmenOwHJ41ep8sCfrg8GnswD6H7AZCJgosPqrA97ergy0j0L5Gzwool6M4RVe3u8E4zTVALG7zOLBikgbFOJ9Vov2_3h8MU2ooh8L6hSUhj-GPdVaVvL-8Hr0j4kwXLJ7eLYVNDoSeNz4JBWz3Q==">latest news in the UK</a>\n    <a class="chip" href="https://vertexaisearch.cloud.google.com/grounding-api-redirect/AYygrcRV-tZpqoA5nyO_5Q8PtFS-WKINW8bmPPHMDLNl-V7X7sb5Tin1pNwDaVm9ReAoqtSNhnq4wEg2m1DJ22TnQzyaK0566CgP4eRgjNTST_EE1AYBus2tFg7-C1kXlZrbMMH2y6_005CDxJK_tIwHKmWbqvN4GI-IyxWxo6N0WzedrX6xuo-PpuC7B2sX5NI4GM5KeAmReVDAMvawXVW3rEZQI6pmsmlxOrEFbh_KPsBLBw2u1IeGdEcUsY-_kqvj2BQGcYgXuY_ziqVAOMrtP2Ok6mPsM8xymcoByg==">&#39;lottery ticket hypothesis&#39; neural networks 2023..2024 filetype:pdf site:arxiv.org</a>\n  </div>\n</div>\n', sdk_blob=None) web_search_queries=["'lottery ticket hypothesis' neural networks 2023..2024 filetype:pdf site:arxiv.org", 'latest news in the UK']

Text Response:
Okay, here's the information you requested:

**Research Papers on Lottery Ticket Hypothesis (2023-2024):**

Based on the search results, here are the URLs for recent research papers on the Lottery Ticket Hypothesis in neural networks from arXiv.org:

1.  **A Survey of Lottery Ticket Hypothesis**
    *   URL: [https://arxiv.org/abs/2403.08279](https://arxiv.org/abs/2403.08279)
    *   Published: (2024-03-12)
    *   This paper provides a comprehensive survey of the Lottery Ticket Hypothesis, examining previous research, discussing open issues, and suggesting potential directions for future exploration. It also highlights the lack of open-source frameworks and consensual experimental settings as challenges in the field.

2.  **Gaining the Sparse Rewards by Exploring Lottery Tickets in Spiking Neural Networks**
    *   URL: [https://arxiv.org/abs/2409.13989](https://arxiv.org/abs/2409.13989)
    *   Published: (2024-09-20)
    *   This paper explores the application of the Lottery Ticket Hypothesis to Spiking Neural Networks (SNNs), which are known for their energy efficiency. It investigates the properties of "spiking-based lottery tickets" and proposes a sparse algorithm for spiking transformer structures to achieve multi-level sparsity.

**Latest News in the UK:**

Here's a summary of the latest news in the UK, based on the provided search results:

*   **Travel Disruptions:** Fog has caused significant travel disruptions, with numerous flights being delayed or canceled across the UK. There are also reports of potential rail disruptions.
*   **Weather:** The fog is expected to turn into snow and heavy rain, and there are flood warnings issued for the New Year.
*   **Crime:** A woman has been charged with the murder of a man on Christmas Day. Additionally, a man has appeared in court charged with the murders of two women.
*   **Other News:**
    *   The actress Olivia Hussey, known for her role in Romeo and Juliet, has passed away at the age of 73.
    *   Dame Judi Dench has revealed an apple tribute to Maggie Smith.
    *   1,329 tiny snails have been released on a remote island.
    *   Nessa will read the old Shipping Forecast for its centenary.

Please note that news is constantly evolving, so for the very latest updates, it's best to refer to the live news sources directly.

The URLs are hallucinated. It seems very hard for the model to latch onto the answer via the obfuscated Vertex redirects and the HTML with all the CSS distractors, etc. (not sure whether it is RAGing over candidate.grounding_metadata or over the HTML). It did make two separate Google calls for the two queries in my prompt.
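A workaround I may try is to skip asking the model for URLs altogether: pull the redirect URIs out of candidate.grounding_metadata myself and resolve them with a plain HTTP request to recover the real destination. A rough sketch, assuming the vertexaisearch redirect URLs answer a standard HTTP redirect (I haven't verified that):

import requests  # workaround sketch, not part of the repro above

def resolve_grounding_urls(candidate):
    """Follow each grounding redirect to its final destination and
    return (title, final_url) pairs."""
    resolved = []
    for chunk in candidate.grounding_metadata.grounding_chunks or []:
        if chunk.web and chunk.web.uri:
            # HEAD request: follow redirects without fetching the body.
            resp = requests.head(chunk.web.uri, allow_redirects=True, timeout=10)
            resolved.append((chunk.web.title, resp.url))
    return resolved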


oliverhu commented Jan 3, 2025

@ecsplendid similar frustration here as well - a lot of issues using Gemini 2 + this new library + Google search as a tool. Documentation pretty much doesn't exist for this repo, and the only way forward is to read the source code (with a lot of confusion coming from the other libraries, generative-ai and vertex-ai). DynamicRetrieval doesn't work for gemini-2.0 according to its documentation, and even just using DynamicRetrieval requires you to search the new Google genai code base (there are some examples in the unit tests). There is no integration with langchain or other frameworks either, so you have to wrap everything in compatible tools/LLMs to use it with langchain.
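As a stopgap for the langchain gap, wrapping the manual search function from the original post as a langchain tool is at least straightforward - a rough sketch, assuming langchain-core is installed:

from langchain_core.tools import tool  # assumes langchain-core is installed

@tool
def google_search(query: str) -> str:
    """Run a Google Custom Search and return formatted results."""
    # Delegates to the GoogleSearchingTool function from the original post.
    return GoogleSearchingTool(query)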


windmaple commented Jan 9, 2025

> The URLs are hallucinated. It seems very hard for the model to latch onto the answer via the obfuscated Vertex redirects and the HTML with all the CSS distractors, etc. (not sure whether it is RAGing over candidate.grounding_metadata or over the HTML). It did make two separate Google calls for the two queries in my prompt.

Your prompt really confused the model. If you remove "also do a separate query for the latest news in the UK and report on that" (there is also a missing period before it), the model is able to behave correctly.
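That is, with the prompt reduced to the single search task:

        query = Part.from_text(
            """Find me the URLs for the most recent research papers about the lottery ticket hypothesis in neural networks. Search for: 'lottery ticket hypothesis' neural networks 2023..2024 filetype:pdf site:arxiv.org"""
        )

Below is what I got: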

Response received. Processing parts...

Grounding Metadata (Search Results):
[GroundingChunk(retrieved_context=None, web=GroundingChunkWeb(title='arxiv.org', uri='https://vertexaisearch.cloud.google.com/grounding-api-redirect/AYygrcSHbs5IXGeZiFnEL1CY_TAW946S81DTK6MSvnZoiIzUffZBbNEkklaG_ts7rwnYhtSsAGfmMQDUTLuoB5tm0Zb869AuOlYtS_qsxkOE47Z383-p2ZaPItedYTrS')), GroundingChunk(retrieved_context=None, web=GroundingChunkWeb(title='arxiv.org', uri='https://vertexaisearch.cloud.google.com/grounding-api-redirect/AYygrcT0UhJ_YBA4hU6ZqAsdAcTGhCANnt044nV2nA5hCxMbCf8nHj0smzkKpidXKvraYzI5QxE1Y9fiythdUGvhfpLcVZmge7XQ4YvC_Vx2MUYZHSOUgXg2wIhYQXFW'))]

Text Response:
Here are the URLs for recent research papers on the lottery ticket hypothesis in neural networks, as found on arXiv.org:

1.  **A Survey of Lottery Ticket Hypothesis:** This paper provides a comprehensive overview of the lottery ticket hypothesis, examining various aspects of the research, including different perspectives, open issues, and potential directions for future exploration. It also discusses the challenges in the field, such as the lack of open-source frameworks and consensual experimental settings.
    *   **URL:** [https://vertexaisearch.cloud.google.com/grounding-api-redirect/AYygrcSHbs5IXGeZiFnEL1CY_TAW946S81DTK6MSvnZoiIzUffZBbNEkklaG_ts7rwnYhtSsAGfmMQDUTLuoB5tm0Zb869AuOlYtS_qsxkOE47Z383-p2ZaPItedYTrS](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AYygrcSHbs5IXGeZiFnEL1CY_TAW946S81DTK6MSvnZoiIzUffZBbNEkklaG_ts7rwnYhtSsAGfmMQDUTLuoB5tm0Zb869AuOlYtS_qsxkOE47Z383-p2ZaPItedYTrS)

2.  **On the Sparsity of the Strong Lottery Ticket Hypothesis:** This paper delves into the theoretical aspects of the strong lottery ticket hypothesis, focusing on the existence of subnetworks that can achieve impressive performance without any training. It also discusses the limitations of current research and the potential impact of this work.
    *   **URL:** [https://vertexaisearch.cloud.google.com/grounding-api-redirect/AYygrcQVKmZ3CoMWkSssZiskSSha5zXIZ7bM8g8E1TFP7v9Dmq6MqJGg2X4htFRDmiEVe56JO9cOsn39qA41VgUzAeLyyEInN_dp-7MzJYqyH08tsva00r9C6PTRwTo1](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AYygrcQVKmZ3CoMWkSssZiskSSha5zXIZ7bM8g8E1TFP7v9Dmq6MqJGg2X4htFRDmiEVe56JO9cOsn39qA41VgUzAeLyyEInN_dp-7MzJYqyH08tsva00r9C6PTRwTo1)

3.  **WORM: Finding Lottery Tickets by Truncating Gradients of Unimportant Neurons:** This paper introduces a new method called WORM, which improves the efficiency of finding lottery tickets by truncating the gradients of unimportant neurons. It demonstrates that WORM achieves faster ticket identification and improves the robustness of pruned models.
    *   **URL:** [https://vertexaisearch.cloud.google.com/grounding-api-redirect/AYygrcT0UhJ_YBA4hU6ZqAsdAcTGhCANnt044nV2nA5hCxMbCf8nHj0smzkKpidXKvraYzI5QxE1Y9fiythdUGvhfpLcVZmge7XQ4YvC_Vx2MUYZHSOUgXg2wIhYQXFW](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AYygrcT0UhJ_YBA4hU6ZqAsdAcTGhCANnt044nV2nA5hCxMbCf8nHj0smzkKpidXKvraYzI5QxE1Y9fiythdUGvhfpLcVZmge7XQ4YvC_Vx2MUYZHSOUgXg2wIhYQXFW)

These papers cover various aspects of the lottery ticket hypothesis, from theoretical foundations to practical applications and new methods for finding winning tickets.

@sasha-gitg sasha-gitg added type: question Request for information or clarification. Not an issue. api: gemini-api and removed type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Jan 13, 2025