
llms: add caching functionality for Models #564

Merged
merged 1 commit from corani/llm-cacher into tmc:main on Mar 20, 2024

Conversation

@corani (Contributor) commented Jan 26, 2024

New caching functionality has been added to langchaingo to speed up
repetitive tasks. Specifically, this adds an in-memory cache for Large
Language Models (LLMs): once a model generates content for a sequence of
messages, the result is stored in the cache. Subsequent calls with the same
sequence of messages read the result from the cache instead of having the
LLM regenerate it, which reduces response time for recurring requests.

The caching functionality is generic: different cache backends can be
plugged in when creating the wrapper. To accomplish this, the 'llms/cache'
package provides a generic wrapper that adds caching to any 'llms.Model',
and the 'llms/cache/inmemory' package provides the in-memory cache
implementation.
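
To illustrate the intended usage, here is a minimal sketch of wrapping a
model with the cache. The constructor names ('cache.New', 'inmemory.New')
and the use of the Ollama backend are assumptions for illustration and may
differ from the merged API.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/cache"
	"github.com/tmc/langchaingo/llms/cache/inmemory"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	ctx := context.Background()

	// Any llms.Model implementation can be wrapped; Ollama is just an example.
	llm, err := ollama.New(ollama.WithModel("llama2"))
	if err != nil {
		log.Fatal(err)
	}

	// In-memory cache backend (constructor name is an assumption).
	backend, err := inmemory.New(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// Wrap the model; cached results are keyed by the sequence of input messages.
	cached := cache.New(llm, backend)

	// The first call hits the LLM; identical follow-up calls are served from the cache.
	resp, err := llms.GenerateFromSinglePrompt(ctx, cached, "Who was the first man to walk on the moon?")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp)
}
```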

Additionally, a caching example was included to demonstrate the usage
of the implemented caching mechanism.

Minor fix: the typo 'TotalTokesn' was corrected to 'TotalToken' in
'ollama/ollamallm.go'.

Resolves #395

PR Checklist

  • Read the Contributing documentation.
  • Read the Code of conduct documentation.
  • Name your Pull Request title clearly, concisely, and prefixed with the name of the primarily affected package you changed according to Good commit messages (such as memory: add interfaces for X, Y or util: add whizzbang helpers).
  • Check that there isn't already a PR that solves the problem the same way to avoid creating a duplicate.
  • Provide a description in this PR that addresses what the PR is solving, or reference the issue that it solves (e.g. Fixes #123).
  • Describes the source of new concepts.
  • References existing implementations as appropriate.
  • Contains test coverage for new functions.
  • Passes all golangci-lint checks.

@corani (Contributor, Author) commented Jan 26, 2024

First draft, comments are welcome!

I've included an example to demonstrate the usage. Output:

Iteration #0

The first man to walk on the moon was Neil Armstrong. He stepped foot onto the lunar surface on July 20, 1969, during the Apollo 11 mission. Armstrong famously declared, "That's one small step for man, one giant leap for mankind," as he became the first person to walk on the moon.

(took 4.36054737s)
========================
Iteration #1

The first man to walk on the moon was Neil Armstrong. He stepped foot onto the lunar surface on July 20, 1969, during the Apollo 11 mission. Armstrong famously declared, "That's one small step for man, one giant leap for mankind," as he became the first person to walk on the moon.

(took 53.469µs)
========================
Iteration #2

The first man to walk on the moon was Neil Armstrong. He stepped foot onto the lunar surface on July 20, 1969, during the Apollo 11 mission. Armstrong famously declared, "That's one small step for man, one giant leap for mankind," as he became the first person to walk on the moon.

(took 3.977µs)
========================
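
For reference, output like the above could come from a simple timing loop
along these lines (a sketch; 'cached' stands for the cache-wrapped model
described above, and this is not necessarily the exact example code in the PR):

```go
import (
	"context"
	"fmt"
	"time"

	"github.com/tmc/langchaingo/llms"
)

// runDemo sends the same prompt several times and prints how long each call
// takes; only the first call should be slow, the rest are served from the cache.
func runDemo(ctx context.Context, cached llms.Model) error {
	prompt := "Who was the first man to walk on the moon?"
	for i := 0; i < 3; i++ {
		start := time.Now()
		resp, err := llms.GenerateFromSinglePrompt(ctx, cached, prompt)
		if err != nil {
			return err
		}
		fmt.Printf("Iteration #%d\n\n%s\n\n(took %s)\n", i, resp, time.Since(start))
		fmt.Println("========================")
	}
	return nil
}
```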

@corani (Contributor, Author) commented Jan 26, 2024

Todo:

  • Patch streaming function
  • Documentation

@corani force-pushed the corani/llm-cacher branch 5 times, most recently from f662d1d to 8fb1d28 on January 29, 2024 04:12
@corani marked this pull request as ready for review on January 29, 2024 04:14
@corani (Contributor, Author) commented Jan 29, 2024

@tmc this is ready for review

@corani (Contributor, Author) commented Feb 4, 2024

Ping @tmc 😉

@tmc (Owner) commented Feb 4, 2024

Looking.

@corani (Contributor, Author) commented Feb 22, 2024

Did this drop off the radar?

@corani (Contributor, Author) commented Mar 20, 2024

Rebased on latest main. @tmc are you still interested in taking this?

@tmc (Owner) left a comment

Lovely! Thanks for your contribution.

@tmc merged commit 5635461 into tmc:main on Mar 20, 2024
3 checks passed