initial promptfoo commit #508

Merged · 2 commits · May 25, 2024
15 changes: 15 additions & 0 deletions eval/promptfoo/README.md
@@ -0,0 +1,15 @@
# LLM Evaluation with Promptfoo

We are using the [Promptfoo.dev](https://www.promptfoo.dev/) project for LLM model evaluation. First, build the promptfoo container image:

```
podman build -t promptfoo eval/promptfoo/base
```

Make sure you are running an LLM before starting the promptfoo container.

```
podman run -it -p 15500:15500 -v <LOCAL/PATH/TO/>/locallm/eval/promptfoo/evals/:/promptfoo/evals:ro promptfoo
```

Go to `http://0.0.0.0:15500/setup/` to set up your tests.
8 changes: 8 additions & 0 deletions eval/promptfoo/base/Containerfile
@@ -0,0 +1,8 @@
FROM registry.access.redhat.com/ubi9/nodejs-20-minimal:1-47.1715773198
WORKDIR /promptfoo
RUN npm install promptfoo
ENV PROMPTFOO_DISABLE_TELEMETRY=1
RUN mkdir evals
ENV PROMPTFOO_CONFIG_DIR=/promptfoo/evals
COPY promptfooconfig.yaml /promptfoo
ENTRYPOINT [ "npx", "promptfoo@latest", "view", "--yes" ]
31 changes: 31 additions & 0 deletions eval/promptfoo/base/promptfooconfig.yaml
@@ -0,0 +1,31 @@
# This configuration compares LLM output of 2 prompts x 2 GPT models across 3 test cases.
# Learn more: https://promptfoo.dev/docs/configuration/guide
description: 'My first eval'

prompts:
- "Write a tweet about {{topic}}"
- "Write a very concise, funny tweet about {{topic}}"

providers:
- openai:gpt-3.5-turbo-0613
- openai:gpt-4

tests:
  - vars:
      topic: bananas

  - vars:
      topic: avocado toast
    assert:
      # For more information on assertions, see https://promptfoo.dev/docs/configuration/expected-outputs
      - type: icontains
        value: avocado
      - type: javascript
        value: 1 / (output.length + 1) # prefer shorter outputs

  - vars:
      topic: new york city
    assert:
      # For more information on model-graded evals, see https://promptfoo.dev/docs/configuration/expected-outputs/model-graded
      - type: llm-rubric
        value: ensure that the output is funny
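The `javascript` assertion above returns a numeric score rather than a pass/fail: `1 / (output.length + 1)` grows as the output shrinks. A minimal Python sketch of the same formula (the function name and sample strings here are illustrative, not part of the config):

```python
def length_score(output: str) -> float:
    # Same formula as the config's javascript assertion: 1 / (output.length + 1)
    return 1 / (len(output) + 1)

short_tweet = "Avocado toast: overpriced bread."
long_tweet = short_tweet * 10

# Shorter outputs receive strictly higher scores.
print(length_score(short_tweet) > length_score(long_tweet))  # True
```

This is why the comment in the config reads "prefer shorter outputs": when promptfoo compares candidate completions, the terser one wins on this assertion.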
1 change: 1 addition & 0 deletions eval/promptfoo/evals/README.md
@@ -0,0 +1 @@
Directory to store evaluation runs locally