The goal is to have an automatic evaluation of model capabilities in the design space, without the developer having to look at the output.
We can develop the library of (binary) tests in tandem with, e.g., #89.
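One way such a library of binary tests might be structured is sketched below. It assumes each test pairs a prompt with a pass/fail check on the model's output, and a runner aggregates the results so nobody has to inspect outputs by hand. `BinaryTask`, `evaluate`, `pass_rate`, the example tasks and `echo_model` are all hypothetical names. The model is assumed to be any callable mapping a prompt string to an output string.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BinaryTask:
    """One "intelligence test": a prompt plus a pass/fail check on the model's output."""
    name: str
    prompt: str
    check: Callable[[str], bool]


def evaluate(model: Callable[[str], str], tasks: list[BinaryTask]) -> dict[str, bool]:
    """Send each task's prompt to the model and record pass/fail automatically."""
    return {task.name: task.check(model(task.prompt)) for task in tasks}


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of tasks passed; 0.0 for an empty library."""
    return sum(results.values()) / len(results) if results else 0.0


# Hypothetical example tasks; real ones would target design-space capabilities.
TASKS = [
    BinaryTask("keeps_width", "Set the box width to 100", lambda out: "100" in out),
    BinaryTask("non_empty", "Describe the layout", lambda out: out.strip() != ""),
]


def echo_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call: returns the prompt unchanged."""
    return prompt
```

With this shape, `evaluate(echo_model, TASKS)` yields `{"keeps_width": True, "non_empty": True}`. A dedicated script could call `evaluate` with the real model and print `pass_rate(results)` as a quick capability summary.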
Some things we might test: pixel-perfection of refactoring comes to mind, but other things are possible.
The tests could be executed on demand with pytest (marked as skipped based on a condition) or in a dedicated script.
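A minimal sketch of an on-demand pixel-perfection test follows. It assumes rendered images are available as rows of RGB tuples. `render_before`, `render_after` and the `RUN_INTELLIGENCE_TESTS` environment variable are hypothetical placeholders, not existing code. The sketch uses the standard library's `unittest.skipUnless` so the test is skipped unless the variable is set. pytest collects `unittest.TestCase` classes and reports them as skipped, and `@pytest.mark.skipif` is the pytest-native equivalent.

```python
import os
import unittest

# Assumed image representation: rows of (R, G, B) tuples.
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def is_pixel_perfect(expected: Image, actual: Image) -> bool:
    """Binary check: True only if both images have the same size and identical pixels."""
    if len(expected) != len(actual):
        return False
    # Comparing whole rows also catches rows of different widths.
    return all(row_e == row_a for row_e, row_a in zip(expected, actual))


def render_before() -> Image:
    """Hypothetical stand-in: render the design before the model's refactoring."""
    return [[(255, 255, 255), (0, 0, 0)], [(0, 0, 0), (255, 255, 255)]]


def render_after() -> Image:
    """Hypothetical stand-in: render the design after the model's refactoring."""
    return [[(255, 255, 255), (0, 0, 0)], [(0, 0, 0), (255, 255, 255)]]


# Hypothetical opt-in switch so model-dependent tests only run on demand.
RUN_INTELLIGENCE_TESTS = os.environ.get("RUN_INTELLIGENCE_TESTS") == "1"


@unittest.skipUnless(RUN_INTELLIGENCE_TESTS, "set RUN_INTELLIGENCE_TESTS=1 to run")
class RefactoringPixelPerfectionTest(unittest.TestCase):
    def test_refactoring_is_pixel_perfect(self) -> None:
        self.assertTrue(is_pixel_perfect(render_before(), render_after()))
```

Under these assumptions, `RUN_INTELLIGENCE_TESTS=1 python -m pytest` runs the check, while a plain `python -m pytest` lists it as skipped.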
Jul 1, 2024: opcode81 changed the title from 'Add "intelligence-tests" to quickly assess model capabilities' to 'Add "intelligence tests" to quickly assess model capabilities'.
Jul 1, 2024: opcode81 changed the title from 'Add "intelligence tests" to quickly assess model capabilities' to 'Add library of tasks for "intelligence tests" to quickly assess model capabilities'.