
llms: add caching functionality for Models #564

Merged
merged 1 commit from corani/llm-cacher into tmc:main on Mar 20, 2024

Conversation

@corani (Contributor) commented Jan 26, 2024

New caching functionality has been added to langchaingo to speed up
repetitive tasks. Specifically, this adds an in-memory cache for Large
Language Models (LLMs): once a model generates content for a sequence of
messages, the result is stored in the cache. Subsequent calls with the same
sequence of messages read the result from the cache instead of having the
LLM regenerate it, which reduces response time for recurring requests.

The caching functionality is generic: different cache backends can be
plugged in when creating the wrapper. To accomplish this, the 'llms/cache'
package provides a generic wrapper that adds caching to any 'llms.Model',
and the 'llms/cache/inmemory' package provides the in-memory cache
implementation.
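
To illustrate the intended usage, here is a minimal sketch of wrapping a
model with the cache. The constructor names ('cache.New', 'inmemory.New')
and the use of the Ollama backend are assumptions for illustration and may
differ from the merged API.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/cache"
	"github.com/tmc/langchaingo/llms/cache/inmemory"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	ctx := context.Background()

	// Any llms.Model implementation can be wrapped; Ollama is just an example.
	llm, err := ollama.New(ollama.WithModel("llama2"))
	if err != nil {
		log.Fatal(err)
	}

	// In-memory cache backend (constructor name is an assumption).
	backend, err := inmemory.New(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// Wrap the model; cached results are keyed by the sequence of input messages.
	cached := cache.New(llm, backend)

	// The first call hits the LLM; identical follow-up calls are served from the cache.
	resp, err := llms.GenerateFromSinglePrompt(ctx, cached, "Who was the first man to walk on the moon?")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp)
}
```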

Additionally, a caching example was included to demonstrate the usage
of the implemented caching mechanism.

Minor fix: the typo 'TotalTokesn' was corrected to 'TotalToken' in
'ollama/ollamallm.go'.

Resolves #395

PR Checklist

  • Read the Contributing documentation.
  • Read the Code of conduct documentation.
  • Name your Pull Request title clearly, concisely, and prefixed with the name of the primarily affected package you changed according to Good commit messages (such as memory: add interfaces for X, Y or util: add whizzbang helpers).
  • Check that there isn't already a PR that solves the problem the same way to avoid creating a duplicate.
  • Provide a description in this PR that addresses what the PR is solving, or reference the issue that it solves (e.g. Fixes #123).
  • Describes the source of new concepts.
  • References existing implementations as appropriate.
  • Contains test coverage for new functions.
  • Passes all golangci-lint checks.

@corani (Contributor, Author) commented Jan 26, 2024

First draft, comments are welcome!

I've included an example to demonstrate the usage. Output:

Iteration #0

The first man to walk on the moon was Neil Armstrong. He stepped foot onto the lunar surface on July 20, 1969, during the Apollo 11 mission. Armstrong famously declared, "That's one small step for man, one giant leap for mankind," as he became the first person to walk on the moon.

(took 4.36054737s)
========================
Iteration #1

The first man to walk on the moon was Neil Armstrong. He stepped foot onto the lunar surface on July 20, 1969, during the Apollo 11 mission. Armstrong famously declared, "That's one small step for man, one giant leap for mankind," as he became the first person to walk on the moon.

(took 53.469µs)
========================
Iteration #2

The first man to walk on the moon was Neil Armstrong. He stepped foot onto the lunar surface on July 20, 1969, during the Apollo 11 mission. Armstrong famously declared, "That's one small step for man, one giant leap for mankind," as he became the first person to walk on the moon.

(took 3.977µs)
========================
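
For reference, output like the above could come from a simple timing loop
along these lines (a sketch; 'cached' stands for the cache-wrapped model
described above, and this is not necessarily the exact example code in the PR):

```go
import (
	"context"
	"fmt"
	"time"

	"github.com/tmc/langchaingo/llms"
)

// runDemo sends the same prompt several times and prints how long each call
// takes; only the first call should be slow, the rest are served from the cache.
func runDemo(ctx context.Context, cached llms.Model) error {
	prompt := "Who was the first man to walk on the moon?"
	for i := 0; i < 3; i++ {
		start := time.Now()
		resp, err := llms.GenerateFromSinglePrompt(ctx, cached, prompt)
		if err != nil {
			return err
		}
		fmt.Printf("Iteration #%d\n\n%s\n\n(took %s)\n", i, resp, time.Since(start))
		fmt.Println("========================")
	}
	return nil
}
```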

@corani (Contributor, Author) commented Jan 26, 2024

Todo:

  • Patch streaming function
  • Documentation

@corani force-pushed the corani/llm-cacher branch 5 times, most recently from f662d1d to 8fb1d28 on January 29, 2024 04:12
@corani marked this pull request as ready for review on January 29, 2024 04:14
@corani (Contributor, Author) commented Jan 29, 2024

@tmc this is ready for review

@corani (Contributor, Author) commented Feb 4, 2024

Ping @tmc 😉

@tmc (Owner) commented Feb 4, 2024

Looking.

@corani (Contributor, Author) commented Feb 22, 2024

Did this drop off the radar?

@corani (Contributor, Author) commented Mar 20, 2024

Rebased on latest main. @tmc are you still interested in taking this?

@tmc (Owner) left a comment

Lovely! Thanks for your contribution.

@tmc merged commit 5635461 into tmc:main on Mar 20, 2024
3 checks passed