Address sporadic hanging of evals on certain samples #1482
Merged
As has been brought up before (#1384, #1292, #270), evals suffer from a hanging issue, where an evaluation run will hang for a very long time (if not indefinitely) at the end of a run (say, on the 99th sample out of 100).
This PR addresses the issue by replacing a seemingly redundant piece of thread creation: each request spawned its own single-worker thread, nested inside the already multi-threaded eval loop. My impression is that this nested multithreading introduced overhead that caused the observed hanging.
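To illustrate the general shape of the problem, here is a minimal sketch (with hypothetical function names; the actual `request_with_timeout` implementation in evals differs in its details). The first version spawns a fresh single-worker thread pool for every request; note that on timeout the pool's shutdown can still block on the worker thread, which is the kind of nested-threading overhead described above. The second version lets the request run on the caller's own thread and delegates the timeout to the underlying client:

```python
import concurrent.futures

def request_with_timeout_nested(fn, timeout, *args, **kwargs):
    # Anti-pattern (simplified): a brand-new single-worker pool (i.e. a new
    # thread) per request, inside an already multi-threaded eval loop.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        # result() raises TimeoutError after `timeout` seconds, but the
        # context-manager exit still waits for the worker to finish.
        return pool.submit(fn, *args, **kwargs).result(timeout=timeout)

def request_with_timeout_flat(fn, timeout, *args, **kwargs):
    # Flatter alternative: no nested thread; rely on the request function's
    # own timeout support (e.g. the HTTP client's timeout parameter).
    return fn(*args, timeout=timeout, **kwargs)
```

The key design difference is where the timeout is enforced: in the nested version it lives in an extra thread layer that the eval loop must wait on, while in the flat version it lives in the request call itself.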
I had also noticed this hanging issue in `EVALS_SEQUENTIAL=1` mode, where it no longer occurs at the end but instead randomly in the middle of the run. I was able to identify the source of the issue through debugging print statements, which ultimately pointed to the `request_with_timeout` function as the culprit.

We have tested the new `request_with_timeout` code on a fork, where we have run multiple new and pre-existing evals (including with 3rd party solvers), and found no change in behaviour or errors, and a clear improvement on the hanging issue.