Add library of tasks for "intelligence tests" to quickly assess model capabilities #92

Open
MischaPanch opened this issue Jul 1, 2024 · 0 comments
Collaborator

MischaPanch commented Jul 1, 2024

The goal is to have an automatic evaluation of model capabilities in the design space without having the developer look at the output.

We can develop the library of (binary) tests in tandem with e.g. #89.

Some things we might test:
Pixel-perfection of refactoring comes to mind, but other things are possible. The tests could be executed on demand with pytest (marked as skipped based on a condition) or in a dedicated script.
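
To make the "dedicated script" option concrete, here is a minimal sketch of what a library of binary tasks could look like. Everything in it is an assumption for illustration: the `BinaryTask` and `run_tasks` names, the `RUN_INTELLIGENCE_TESTS` environment variable, the example task, and the reading of "pixel-perfection" as an exact string match. The model is represented as any callable from prompt to output text.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: any callable that maps a prompt to the model's text output.
Model = Callable[[str], str]


@dataclass(frozen=True)
class BinaryTask:
    """One task with a pass/fail outcome (all names here are illustrative)."""
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the output passes


def exact_match(expected: str) -> Callable[[str], bool]:
    """Strict check: the output must equal `expected` character for character."""
    return lambda output: output == expected


def run_tasks(model: Model, tasks: List[BinaryTask]) -> Dict[str, bool]:
    """Run every task once and record whether its check passed."""
    return {task.name: task.check(model(task.prompt)) for task in tasks}


def enabled() -> bool:
    """On-demand switch; the variable name is an assumption, not project config."""
    return os.environ.get("RUN_INTELLIGENCE_TESTS") == "1"


# Illustrative task only; a real library would hold many of these.
TASKS = [
    BinaryTask(
        name="rename_variable",
        prompt="Rename x to total in: x = a + b",
        check=exact_match("total = a + b"),
    ),
]

if __name__ == "__main__":
    if enabled():
        # Stand-in model returning a fixed string; replace with a real model call.
        results = run_tasks(lambda prompt: "total = a + b", TASKS)
        for name, passed in results.items():
            print(f"{name}: {'PASS' if passed else 'FAIL'}")
    else:
        print("Intelligence tests skipped (set RUN_INTELLIGENCE_TESTS=1 to run).")
```

The same check functions could also be wrapped in pytest tests and gated with `pytest.mark.skipif` on the same condition, so that they only run when explicitly requested.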

opcode81 changed the title from 'Add "intelligence-tests" to quickly assess model capabilities' to 'Add "intelligence tests" to quickly assess model capabilities' on Jul 1, 2024
opcode81 changed the title from 'Add "intelligence tests" to quickly assess model capabilities' to 'Add library of tasks for "intelligence tests" to quickly assess model capabilities' on Jul 1, 2024
opcode81 added the planned label and removed the backlog label on Jul 1, 2024

2 participants