Memory context limit exceeded #957
Comments
I can create a PR with my fix, but my changes feel more like hacks: I didn't figure out why the code is the way it is, so there may be a better fix.
Hi @dlaliberte, thank you so much for such a detailed bug report! I just merged a patch in #977 that fixes some of the bugs you mentioned. It doesn't change the % of messages that are sent to the summarizer - I have a feeling the other fixes will make that unnecessary. However, please let me know if you think this still needs patching, or if you still see this or a similar bug persisting.
Thanks for the quick fix! I look forward to trying it.

Regarding the % of messages sent to the summarizer: at the time I was questioning the 75% number, I hadn't yet figured out the underlying causes of the failure(s), and I suspected everything. Even though that turned out not to be relevant, it got me thinking about what the effect would be of replacing 75% of messages with a very brief summary. I also wondered if it had been set to such a high percentage as a way of dealing with the memory context limit being exceeded; at least the summarizing would not have to be done as frequently as it would with a much smaller percentage.

One more thing I hope you will consider. In the past, I experimented with creating a summary of the entire conversation at every step, and this summary was then added to the beginning of each reply by the LLM. Although it effectively shortened the context window for new content, it extended the memory of earlier parts of the conversation. My hope for MemGPT is that we could instead store all those summaries in the archival memory, so they would not consume the limited context window space. Ideally, each summary would connect with earlier and later summaries, and with more distantly related summaries and documents.
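As a rough illustration of that last idea (entirely hypothetical; this is not MemGPT's actual archival memory API), each step's summary could be written to archival storage and linked to its neighbours instead of occupying the context window:

```python
from dataclasses import dataclass

@dataclass
class SummaryNode:
    """One per-step conversation summary, kept outside the context window."""
    step: int
    text: str
    prev_step: int | None = None  # link to the preceding summary
    next_step: int | None = None  # link to the following summary

class ArchivalSummaryStore:
    """Toy archival store for per-step summaries, keyed by step number."""
    def __init__(self) -> None:
        self.nodes: dict[int, SummaryNode] = {}

    def add(self, step: int, text: str) -> None:
        node = SummaryNode(step, text)
        prev = step - 1
        if prev in self.nodes:  # chain neighbouring summaries together
            node.prev_step = prev
            self.nodes[prev].next_step = step
        self.nodes[step] = node
```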
Describe the bug
Several people have experienced the "memory context limit exceeded" error, especially with local LLMs.
Please describe your setup
How did you install memgpt? git clone
How are you running memgpt? pwsh
Screenshots
Similar to this: (screenshot omitted)
Additional context
It turns out there are about 3 bugs that need to be fixed to fully resolve this.
Here is my chat stream, copied from a thread in the Discord Support channel.
The error about exceeding the maximum context length occurs while trying to summarize the messages, after it has already been determined that an overflow will occur.
So the summarizer needs to invoke the LLM in a way that avoids triggering the same overflow.
Why is it trying to summarize 75% (== 0.75) of all messages, not counting the system message? That seems way too severe, and a likely cause of sending a longer request than intended just to get some text summarized.
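For reference, the selection logic presumably looks something like this minimal sketch; the constant and function names here are assumptions, with only the 0.75 fraction taken from the behavior described above:

```python
SUMMARY_FRACTION = 0.75  # fraction of non-system messages handed to the summarizer

def select_messages_to_summarize(messages: list[dict]) -> list[dict]:
    """Pick the oldest ~75% of non-system messages for summarization."""
    non_system = [m for m in messages if m["role"] != "system"]
    cutoff = int(len(non_system) * SUMMARY_FRACTION)
    return non_system[:cutoff]  # oldest messages come first
```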
Tracking this down further: in chat_completion_proxy.py, the get_chat_completion function is called with the function_call parameter set to None (since functions is None), and this raises a ValueError.
I commented out that check, since it doesn't seem to matter what function_call is in that function. Then I got a summary from the LLM!
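Reconstructed from the description above (not the exact source), the guard and the workaround look roughly like this; relaxing it so that function_call may be None when no functions are passed is what lets the summarization call through:

```python
def get_chat_completion(messages, functions=None, function_call=None):
    # Original (reconstructed) guard, which also rejected the summarizer's
    # plain-text call:
    #     if function_call is None:
    #         raise ValueError("function_call cannot be None")
    #
    # Relaxed guard: only insist on function_call when functions are supplied.
    if functions is not None and function_call is None:
        raise ValueError("function_call must be set when functions are given")
    # ... build the prompt and call the local LLM here ...
    return {"role": "assistant", "content": "<summary text from the LLM>"}
```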
But then I immediately got another exception about exceeding the maximum context length. So ... I'll keep tracking this down. I suspect there are more of these stumbling blocks on the way back to using the summary.
The next exception occurs in persistence_manager.py, in the trim_messages method:
The debug output indicates that len(self.messages) is 0!
But Agent._trim_messages is called when agent.messages has a length of 27.
It looks like the persistence_manager was never given the messages from the agent.
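A plausible shape of the failure, inferred from the symptoms rather than copied from the source: if the persistence manager's message list is never synced from the agent, any trim operates on an empty list.

```python
class PersistenceManager:
    """Hypothetical sketch of the state mismatch described above."""

    def __init__(self, agent=None):
        # Unless the agent is passed in, self.messages stays empty even
        # though agent.messages may hold (here) 27 entries.
        self.messages = list(agent.messages) if agent else []

    def trim_messages(self, num: int):
        """Drop the oldest `num` messages."""
        assert len(self.messages) >= num, (
            f"cannot trim {num} of {len(self.messages)} messages"
        )
        self.messages = self.messages[num:]
```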
Taking a wild guess at how to fix this problem with the persistence_manager, I was tempted to call its init method with the agent, which at least sets self.messages to agent.messages.
Well, that sorta worked! At least it avoided the exception in the persistence_manager's trim_messages method.
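In code, the hack amounts to something like the following, reusing the hypothetical PersistenceManager sketch above (a proper fix would keep the two message lists in sync rather than re-running the constructor):

```python
class Agent:
    """Minimal stand-in with a message list."""
    def __init__(self, messages):
        self.messages = messages

agent = Agent([{"role": "user", "content": f"msg {i}"} for i in range(27)])

pm = PersistenceManager()  # constructed without the agent: pm.messages == []
pm.__init__(agent)         # the hack: re-init with the agent to copy its messages
pm.trim_messages(5)        # now trimming succeeds
print(len(pm.messages))    # -> 22
```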
But then the resulting summary may not have been small enough, and the system got into an apparent infinite loop, summarizing repeatedly without making enough progress. I had reduced the amount to summarize from 75% down to just 25%, so maybe that is not enough, especially with the artificially small maximum context length of only 4096. I'll try 50%.
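That failure mode suggests the retry loop needs an explicit progress check; here is a minimal sketch of the idea (the function names are assumed, and this is not MemGPT's actual loop):

```python
def shrink_until_fits(messages, count_tokens, summarize, max_tokens, frac=0.5):
    """Repeatedly replace the oldest `frac` of messages with one summary
    message, bailing out if a pass fails to shrink the token count."""
    while count_tokens(messages) > max_tokens:
        before = count_tokens(messages)
        cutoff = max(1, int(len(messages) * frac))
        summary = summarize(messages[:cutoff])  # a single short summary message
        messages = [summary] + messages[cutoff:]
        if count_tokens(messages) >= before:
            raise RuntimeError("summarization made no progress; aborting")
    return messages
```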
Looking good! This may be fixed. I have not had any more memory context exceeded errors.
Local LLM details
If you are trying to run MemGPT with local LLMs, please provide the following information: