llms: add caching functionality for Models #564
Conversation
First draft, comments are welcome! I've included an example to demonstrate the usage, output:
Force-pushed from d9c0c15 to df2bb0c.
Todo:
Force-pushed from f662d1d to 8fb1d28.
@tmc this is ready for review
Ping @tmc 😉
Looking.
Did this drop off the radar?
Force-pushed from 8fb1d28 to adab41f.
Rebased on latest main. @tmc are you still interested in taking this?
Lovely! Thanks for your contribution.
New caching functionality has been added to langchaingo to boost
execution speed for repetitive tasks. Specifically, we implemented
in-memory caching for Large Language Models (LLMs): once a model
generates content from a sequence of messages, the result is stored
in the cache. Any subsequent call with the same sequence of messages
reads the result from the cache instead of having the LLM regenerate
it. This optimization is intended to reduce response time for
recurring requests.
This caching functionality is generic: different cache backends can be
used when creating the wrapper. To accomplish this, the 'llms/cache'
package was created with a generic wrapper that adds caching to an
'llms.Model'. The 'llms/cache/inmemory' package was also created to
provide the in-memory implementation of the cache.
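For orientation, here is a rough sketch of what such a generic wrapper could look like. The Backend interface, the New constructor, and the JSON-hash key derivation below are illustrative assumptions, not the PR's actual code:

```go
// Package cache: a sketch of a generic caching wrapper around an
// llms.Model. Backend, New, and the keying scheme are hypothetical.
package cache

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"

	"github.com/tmc/langchaingo/llms"
)

// Backend is a minimal pluggable cache interface (hypothetical).
type Backend interface {
	Get(ctx context.Context, key string) (*llms.ContentResponse, bool)
	Put(ctx context.Context, key string, value *llms.ContentResponse)
}

// Cacher wraps an llms.Model and serves repeated requests from a Backend.
type Cacher struct {
	llms.Model // embedded Model; non-overridden methods pass through unchanged
	backend    Backend
}

// New returns a caching wrapper around model.
func New(model llms.Model, backend Backend) *Cacher {
	return &Cacher{Model: model, backend: backend}
}

// GenerateContent consults the cache before delegating to the wrapped model.
func (c *Cacher) GenerateContent(ctx context.Context, messages []llms.MessageContent, options ...llms.CallOption) (*llms.ContentResponse, error) {
	key := hashMessages(messages)
	if resp, ok := c.backend.Get(ctx, key); ok {
		return resp, nil // cache hit: no LLM call
	}
	resp, err := c.Model.GenerateContent(ctx, messages, options...)
	if err != nil {
		return nil, err
	}
	c.backend.Put(ctx, key, resp)
	return resp, nil
}

// hashMessages derives a cache key by hashing the JSON encoding of the
// message sequence (one possible keying scheme).
func hashMessages(messages []llms.MessageContent) string {
	b, _ := json.Marshal(messages)
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}
```

Keying on the full message sequence means any change to roles, content, or ordering yields a different cache entry, so only identical requests are served from the cache.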
Additionally, a caching example was included to demonstrate the usage
of the implemented caching mechanism.
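As a hedged sketch of how such an example might wire things together (the cache.New and inmemory.New constructors and their import paths are assumptions inferred from the package layout described above, not necessarily the example's actual code):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/cache"          // assumed path from the description above
	"github.com/tmc/langchaingo/llms/cache/inmemory" // assumed path from the description above
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	ctx := context.Background()

	base, err := ollama.New(ollama.WithModel("llama2"))
	if err != nil {
		log.Fatal(err)
	}

	// cache.New and inmemory.New are assumed constructors; the example
	// merged with the PR is the authoritative reference for the real API.
	backend, err := inmemory.New(ctx)
	if err != nil {
		log.Fatal(err)
	}
	model := cache.New(base, backend)

	msgs := []llms.MessageContent{
		llms.TextParts(llms.ChatMessageTypeHuman, "Name three Go proverbs."),
	}

	// The first call hits the LLM; the second identical call should be
	// answered from the cache and return almost immediately.
	for i := 1; i <= 2; i++ {
		start := time.Now()
		if _, err := model.GenerateContent(ctx, msgs); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("call %d took %s\n", i, time.Since(start))
	}
}
```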
Minor fix: the typo 'TotalTokesn' was corrected to 'TotalToken' in
'ollama/ollamallm.go'.
Resolves #395
PR Checklist
- PR title follows the "package: short description" convention, e.g.
  "memory: add interfaces for X, Y" or "util: add whizzbang helpers".
- The description references the issue it addresses (e.g. "Fixes #123").
- golangci-lint checks pass.