Advanced Evals: Combining Multiple Annotators of Varying Quality #1379

nelsonauner · 2024-08-15T00:16:07Z

Summary

This PR adds a notebook on handling multiple annotators. It partially replicates the seminal MT-Bench paper while using OpenAI's new structured outputs and token probability features, which are unavailable in other LLMs.

Motivation

There is currently no cookbook on combining a large number of human annotators with LLM-As-Judge, even though this practice is quite common.

For new content

When contributing new content, read through our contribution guidelines, and mark the following action items as completed:

I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
I have conducted a self-review of my content based on the contribution guidelines:
- Relevance: This content is related to building with OpenAI technologies and is useful to others.
- Uniqueness: I have searched for related examples in the OpenAI Cookbook, and verified that my content offers new insights or unique information compared to existing documentation.
- Spelling and Grammar: I have checked for spelling or grammatical mistakes.
- Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
- Correctness: The information I include is correct and all of my code executes successfully.
- Completeness: I have explained everything fully, including all necessary references and citations.

We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.

nelsonauner · 2024-09-10T14:29:13Z

@pap-openai Anything I can do to help get this reviewed (or close it out?)

pap-openai · 2024-09-17T16:57:45Z

> @pap-openai Anything I can do to help get this reviewed (or close it out?)

Hi @nelsonauner, we're currently working on a contributor license agreement and will come back to this a bit later, once we've figured that out.

github-actions · 2024-11-17T02:08:42Z

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 10 days.

Advanced Evals: Combining LLM-As-Judge, Multiple Annotators with Structured Outputs, Token Probabilities and CROWDLAB algorithm

813bf8c

nelsonauner marked this pull request as ready for review August 15, 2024 02:56

nelsonauner mentioned this pull request Aug 16, 2024

Fix clearly broken link in Cookbook (addresses #1370) #1381

Merged

QWolfp3 mentioned this pull request Aug 25, 2024

[FEATURE] #1392

Open

github-actions bot added the Stale label Nov 17, 2024
