Memory context limit exceeded #957

Closed
dlaliberte opened this issue Feb 6, 2024 · 3 comments · Fixed by #977
Labels
bug Something isn't working

Comments


dlaliberte commented Feb 6, 2024

Describe the bug
Several people have run into the error about the memory context limit being exceeded, especially with local LLMs.

Please describe your setup

  • MemGPT version
    • 0.3.0
  • How did you install memgpt?
    • git clone
  • Describe your setup
    • What's your OS: Windows 11.
    • How are you running memgpt? pwsh

Screenshots
Similar to this:

  File "C:\Users\danie\memgpt\memgpt\local_llm\chat_completion_proxy.py", line 155, in get_chat_completion
    result, usage = get_koboldcpp_completion(endpoint, auth_type, auth_key, prompt, context_window, grammar=grammar)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\danie\memgpt\memgpt\local_llm\koboldcpp\api.py", line 17, in get_koboldcpp_completion
    raise Exception(f"Request exceeds maximum context length ({prompt_tokens} > {context_window} tokens)")
Exception: Request exceeds maximum context length (8465 > 8192 tokens)

Additional context
It turns out there are about 3 bugs that need to be fixed to fully resolve this.

Here is my chat stream, copied from a thread in the Discord Support channel.

The error about exceeding the maximum context length occurs while trying to summarize the messages, after it has already been determined that an overflow will occur.

Attempting to summarize 18 messages [1:19] of 24
Using model koboldcpp, endpoint: http://172.18.128.1:5002
unsetting function_call because functions is None
An exception occurred when running agent.step():
...
  File "C:\Users\danie\memgpt\memgpt\local_llm\koboldcpp\api.py", line 17, in get_koboldcpp_completion
    raise Exception(f"Request exceeds maximum context length ({prompt_tokens} > {context_window} tokens)")
Exception: Request exceeds maximum context length (4219 > 4096 tokens)

So the summarizer needs a different way of calling the LLM to do the summarizing, one that avoids this problem.
Why is it trying to summarize 75% (== 0.75) of all messages, not counting the system message? That seems far too aggressive, and a likely cause of sending a longer request than intended just to get some text summarized.
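For concreteness, the numbers in the log match a message-count-based cutoff like the following. This is only a sketch with assumed names (not MemGPT's actual identifiers), and the real code may derive the cutoff from token counts instead:

    # Sketch of the 0.75 cutoff arithmetic (constant and function names are
    # assumptions, not MemGPT's actual identifiers).
    MESSAGE_SUMMARY_FRACTION = 0.75

    def summary_slice(messages, fraction=MESSAGE_SUMMARY_FRACTION):
        # Keep the system message at index 0 and summarize the next `fraction`
        # of the total message count. With 24 messages this selects
        # messages[1:19], i.e. 18 messages, matching the log above.
        cutoff = 1 + int(len(messages) * fraction)
        return messages[1:cutoff]

    messages = ["system"] + [f"message {i}" for i in range(23)]  # 24 messages total
    assert len(summary_slice(messages)) == 18

If most of those 18 messages are long, the summarization request itself can exceed the context window, which is exactly the failure in the traceback above.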

Tracking this down further: in chat_completion_proxy.py, the get_chat_completion function is called with the function_call parameter set to None, since functions is None. And this raises a ValueError:

    if function_call != "auto":
        printd(f"[DEBUG] Raising ValueError since {function_call} is not 'auto'")
        raise ValueError(f"function_call == {function_call} not supported (auto only)")

I commented out that if check, since it doesn't seem to matter what function_call was in that function. Then I got a summary from the LLM!
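Rather than deleting the guard outright, a gentler version of the same change would be to also accept None. This is a sketch of the idea, not the merged fix:

    # Relaxed guard: the summarization call passes functions=None, so
    # function_call arrives as None rather than "auto". Accept both instead of
    # raising.
    if function_call not in (None, "auto"):
        printd(f"[DEBUG] Raising ValueError since {function_call} is not 'auto'")
        raise ValueError(f"function_call == {function_call} not supported (auto only)")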

But then I immediately got another exception about exceeding the maximum context length. So ... I'll keep tracking this down. I suspect there are more of these stumbling blocks on the way back to using the summary.

The next exception occurs in persistence_manager.py, in the trim_messages method:

    def trim_messages(self, num):
        printd(f"InMemoryStateManager.trim_messages {num} messages of length {len(self.messages)}")
        self.messages = [self.messages[0]] + self.messages[num:]

The debug output indicates that len(self.messages) is 0!
But Agent._trim_messages is called when agent.messages has a length of 27.
It looks like the persistence_manager was not given the messages from the agent.

Taking a wild guess about how to fix this problem with the persistence_manager, I am tempted to call its init method with the agent, which at least sets self.messages to agent.messages.

Well, that sorta worked! At least it avoided the exception in the persistence_manager trim_messages method.
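In miniature, the workaround looks like this (stand-in classes only; the real Agent/InMemoryStateManager wiring in MemGPT differs):

    # Stand-in sketch of the workaround: re-sync the state manager with the
    # agent's messages before trimming, instead of trimming an empty list.
    class InMemoryStateManager:
        def __init__(self):
            self.messages = []

        def init(self, agent):
            # Copy the agent's current message list into the manager.
            self.messages = list(agent.messages)

        def trim_messages(self, num):
            # Keep the system message at index 0, drop messages 1..num-1.
            self.messages = [self.messages[0]] + self.messages[num:]

    class Agent:
        def __init__(self, messages):
            self.messages = messages
            self.persistence_manager = InMemoryStateManager()

        def _trim_messages(self, num):
            # The workaround: call init(agent) first so the manager's messages
            # are populated before trimming.
            self.persistence_manager.init(self)
            self.persistence_manager.trim_messages(num)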

But then the resulting summary maybe wasn't small enough, and the system got into an apparent infinite loop, summarizing repeatedly without making enough progress. I had reduced the amount to summarize from 75% down to just 25%, so maybe that is not enough, especially with the artificially small maximum context length of only 4096 tokens. I'll try 50%.
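One way to surface that failure mode instead of looping forever would be a progress check around the summarization step. This is purely hypothetical, not existing MemGPT code:

    # Hypothetical guard against the repeated-summarization loop: abort if a
    # summarization pass does not actually shrink the prompt.
    def summarize_until_fit(count_prompt_tokens, summarize_once, context_window, max_passes=5):
        """count_prompt_tokens() -> int; summarize_once() does one in-place summarization."""
        prev = count_prompt_tokens()
        for _ in range(max_passes):
            if prev <= context_window:
                return prev
            summarize_once()
            current = count_prompt_tokens()
            if current >= prev:
                raise RuntimeError(f"Summarization made no progress ({current} >= {prev} tokens)")
            prev = current
        raise RuntimeError(f"Still over the context window after {max_passes} passes")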

Looking good! This may be fixed. I have not had any more memory context exceeded errors.


Local LLM details

If you are trying to run MemGPT with local LLMs, please provide the following information:

  • The exact model you're trying to use: TheBloke/dolphin-2.2.1-mistral-7B-GGUF/dolphin-2.2.1-mistral-7b.Q5_K_S.gguf
  • The local LLM backend you are using: Kobold
  • Your hardware for the local LLM backend: local computer with GPU
sarahwooders added the bug (Something isn't working) label Feb 6, 2024
cpacker added and removed the bug (Something isn't working) label Feb 6, 2024
@dlaliberte
Author

I can create a PR with my fix, but my changes feel more like hacks, since I didn't figure out why the code is the way it is, so there might be a better fix.

cpacker moved this from To triage to Ready in 🐛 MemGPT issue tracker Feb 8, 2024
cpacker moved this from Ready to In progress in 🐛 MemGPT issue tracker Feb 8, 2024
cpacker linked a pull request Feb 8, 2024 that will close this issue
github-project-automation bot moved this from In progress to Done in 🐛 MemGPT issue tracker Feb 9, 2024
Collaborator

cpacker commented Feb 9, 2024

Hi @dlaliberte , thank you so much for such a detailed bug report!

I just merged a patch in #977 that fixes some of the bugs you mentioned. It doesn't change the % of messages that are sent to the summarizer - I have a feeling the other fixes will make that unnecessary. However, please let me know if you think this still needs patching, or if you still see this or a similar bug persisting.

@dlaliberte
Author

Thanks for the quick fix! I look forward to trying it.

Regarding the % of messages sent to the summarizer: at the time I was questioning the 75% number, I hadn't yet figured out the underlying causes of the failure(s), and I suspected everything. But even though that turned out not to be relevant, it still got me thinking about the effect of replacing 75% of the messages with a very brief summary. I also wondered whether it had been set to such a high percentage as a way of dealing with the memory context limit being exceeded; at least the summarizing would not have to be done as frequently as it would with a much smaller percentage.

One more thing I hope you will consider. In the past, I experimented with creating a summary of the entire conversation at every step of a conversation, and this summary was then added to the beginning of each reply by the LLM. Although it effectively shortened the context window for new content, it extended the memory of earlier parts of the conversation. My hope for MemGPT is that we could instead store all those summaries in the archival memory, so they would not consume the limited context window space. Ideally, each summary should connect with earlier and later summaries, and more distantly related summaries and documents.
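To sketch the idea (attribute and function names here are assumptions, not existing MemGPT APIs):

    # Sketch of the suggestion: push each step's conversation summary into
    # archival memory instead of keeping it in the context window.
    # All names below are hypothetical.
    def store_step_summary(agent, summary_text, step_index):
        agent.persistence_manager.archival_memory.insert(
            f"[conversation summary @ step {step_index}] {summary_text}"
        )

Retrieving those summaries later via archival search would then restore the long-range memory of the conversation without consuming context window space.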
