Prompt engineering is an extremely iterative process. Even when we manage to settle down on a prompt, it's so difficult to test it against test cases and other possible prompts to make sure we're giving the best instructions to the model.
Prompts Royale is an application that allows you to really easily create many prompt candidates, write your own ones, and make them battle until a clear winner emerges. It makes the process of iterating on a prompt feel much more easy and reliable.
You give a description of what you want to accomplish, give it a few scenarios and expected outputs, and then just sit back and watch them battle for #1.
prompts.royale.mp4
- Automatic prompt generation: Allows for the creation of prompt candidates from the user's description and test case scenarios. The user can also input their own.
- Automatic test cases generation: Enables automatically creating test cases from the description to get the juices flowing!
- Monte Carlo Matchmaking + ELO Rating: It uses the Monte Carlo method for matchmaking to ensure you get as much information with the least amount of battles, and ELO Rating to properly rank candidates based on their wins and who they win against
- Everything is customizable: The settings page allow you to tinker with every single parameter of the application
- Local & Secure: Everything is stored locally and requests are made from your browser to the LLMs API.
You can use it at promptsroyale.com, or you can clone the repository and run it locally.
Requirements:
- Node v16+
Clone the repository:
git clone [email protected]:meistrari/prompts-royale.git
Install the dependencies. We're using Bun for managing dependencies, it's awesome.
bun i
Run the server:
bun run dev
Here's an example of what a use case would look like:
Write a prompt that creates a headline for a website.
Scenario
Website of a car dealership
Expected Output
Find the car of your dreams at the best price
Scenario
Website of a SaaS for data analytics
Expected Output
Your data, your insights, your way
Scenario
Website for a zoo
Expected Output
Animals of the world, in one place
With the information of the task to be accomplished and the test cases, the user can then Generate prompt candidates, that take those into account and outputs N candidates that will be ranked by the system through a combat system.
The user can also write the prompt candidates themselves if they have a specific idea of what they want. If you have something specific in mind, writing your prompts is always the best option to fit what you intend.
Click to see an example of the generated prompts
Prompt A
Design a compelling headline for a distinct category of website. Understand the website's mission, its intended audience, and the solutions it provides. The headline should attract attention, be pertinent, and distill the website's core concept into a succinct statement. Leverage your linguistic proficiency, promotional tactics, and domain-specific knowledge to create an enticing headline."
Prompt B
Your directive is to formulate an attractive headline for a specific kind of website. Pay attention to the website's objectives, its targeted demographics, and the services it offers. The headline should be captivating, germane, and capable of summarizing the website's primary proposition in a brief line. Use your language comprehension, advertising strategies, and industry-specific insights to create a compelling headline.
Each prompt is represented as a normal distribution with an initial mean
We then repeat the following cycle:
We use a monte carlo sampler to make duels where each of the 2 selected prompts battle each other. The chances of being drawn for the duel are proportional to the chance that that prompt is the best, or in more mathematical terms, we use the weighted distribution of their probability density to define the likelihood that that curve has the highest true mean within the group.
In each duel, both prompts answer each of the test cases and a separate prompt evaluates which answer was the best. The ELO scores of the prompts are then updated according to the following formulas:
Where:
-
$LR$ is the learning rate. The lower the learning rate the faster the "learning". By decreasing the standard deviation, you make results more meaningful. This means that the system will converge faster to a winner, but it also means that it will be less likely to find the best result. We recommend$0.96$ . -
$K$ is the number of points each battle has at stake. -
$N$ is the number of test cases -
$\dfrac{K}{N}$ is the amount of points each round has at stake- If two prompts with the same
$\mu$ battle and one prompt wins all of the rounds it will gain$\dfrac{K}{2}$ points and the other will lose$\dfrac{K}{2}$ points.
- If two prompts with the same
-
$score$ is$1$ if$A$ wins and$0$ if$B$ wins, -
$expected\_score(A,B)$ is the expected chance of$A$ winning versus$B$ given their normal distributions. This is calculated by the following formula:$$expected\_score(A,B) = \frac{1}{1 + 10^{\frac{\mu_{B} - \mu_{A}}{400}}}$$
The reason why we update the
We were heavily inspired by https://github.com/mshumer/gpt-prompt-engineer, thanks for your work!