Add library of tasks for "intelligence tests" to quickly assess model capabilities #92

MischaPanch · 2024-07-01T11:43:12Z

The goal is to evaluate model capabilities in the design space automatically, without requiring the developer to inspect the output.

We can develop the library of (binary) tests in tandem with e.g. #89.

Some things we might test:
Pixel-perfection of refactoring comes to mind, but other things are possible. The tests could be executed on demand with pytest (marked as skipped based on a condition) or in a dedicated script.
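
As a rough illustration of the pytest option, here is a minimal sketch of a binary test that is skipped unless explicitly enabled. The environment variable `RUN_INTELLIGENCE_TESTS`, the helper `is_pixel_perfect`, and the placeholder pixel data are all hypothetical and not part of the project; the sketch also assumes that both renderings are available as raw pixel buffers of the same size.

```python
import os

import pytest

# Hypothetical switch: the capability tests only run when this variable is set to "1".
RUN_INTELLIGENCE_TESTS = os.environ.get("RUN_INTELLIGENCE_TESTS") == "1"


def is_pixel_perfect(before: bytes, after: bytes) -> bool:
    """Binary check: both raw pixel buffers must be byte-for-byte identical."""
    return before == after


@pytest.mark.skipif(
    not RUN_INTELLIGENCE_TESTS,
    reason="set RUN_INTELLIGENCE_TESTS=1 to run model capability tests",
)
def test_refactoring_is_pixel_perfect():
    # Placeholder data (assumption): a real test would render the design before
    # and after the model's refactoring and compare the resulting pixel buffers.
    before = b"\x00\x01\x02"
    after = b"\x00\x01\x02"
    assert is_pixel_perfect(before, after)
```

With this setup, a plain `pytest` run would report the test as skipped, while `RUN_INTELLIGENCE_TESTS=1 pytest` would execute it. A dedicated script could call the same helper directly.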

opcode81 changed the title ~~Add "intelligence-tests" to quickly assess model capabilities~~ Add "intelligence tests" to quickly assess model capabilities Jul 1, 2024

opcode81 added the backlog label Jul 1, 2024

opcode81 changed the title ~~Add "intelligence tests" to quickly assess model capabilities~~ Add library of tasks for "intelligence tests" to quickly assess model capabilities Jul 1, 2024

opcode81 added planned and removed backlog labels Jul 1, 2024

MischaPanch commented Jul 1, 2024 • edited by opcode81
