
Address sporadic hanging of evals on certain samples #1482

Merged (1 commit) on Mar 25, 2024

Conversation

thesofakillers (Contributor) commented on Mar 13, 2024

As has been brought up before (#1384, #1292, #270), evals suffer from a hanging issue, where an evaluation run hangs for a very long time (if not indefinitely) at the end of a run (say, on the 99th sample out of 100).

This PR addresses the issue by removing a seemingly redundant single-use thread that was created for every request, nested inside the already multi-threaded eval loop. My impression is that this nested multithreading caused overhead that resulted in the hanging we experienced.

I had also noticed this hanging issue in EVALS_SEQUENTIAL=1 mode, where it no longer occurs at the end of the run but instead randomly in the middle.

I was able to identify the source of this issue through debugging print statements that ultimately pointed to the request_with_timeout function as the culprit.

We have tested the new request_with_timeout code on a fork, running multiple new and pre-existing evals (including with third-party solvers), and found no change in behaviour or errors, and a clear improvement on the hanging issue.
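To illustrate the pattern the PR describes (this is a hedged sketch, not the actual evals code — the real `request_with_timeout` signature and internals differ), the pre-fix approach spawned a throwaway thread per request purely to enforce a timeout, while the fix passes the timeout straight to the underlying call:

```python
import threading
from typing import Any, Callable


def request_with_timeout_old(fn: Callable[..., Any], *args, timeout: float = 40, **kwargs) -> Any:
    """Pre-fix pattern (illustrative): wrap every request in its own
    single-use thread just to enforce a timeout. Nested inside an
    already multi-threaded eval loop, this adds per-request overhead."""
    result: dict = {}

    def target() -> None:
        result["value"] = fn(*args, **kwargs)

    t = threading.Thread(target=target, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        # The worker thread is abandoned; daemon=True keeps it from
        # blocking interpreter shutdown.
        raise TimeoutError(f"request exceeded {timeout}s")
    return result["value"]


def request_with_timeout_new(fn: Callable[..., Any], *args, timeout: float = 40, **kwargs) -> Any:
    """Post-fix pattern (illustrative): let the client enforce the
    timeout itself (e.g. an HTTP client's timeout parameter) instead
    of wrapping the call in an extra thread."""
    return fn(*args, timeout=timeout, **kwargs)
```

The key change is structural: no extra thread is created per request, so the eval loop's own thread pool is the only layer of concurrency.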

@etr2460 (Collaborator) left a comment:


Thanks for the fix!

@etr2460 etr2460 merged commit bfe3925 into openai:main Mar 25, 2024
1 check passed
JunShern pushed a commit that referenced this pull request Mar 26, 2024
We're now implementing solvers for new APIs we're calling (Anthropic, Gemini, ...). Each solver was implementing the same logic for backing off and retrying when the API query limit was hit. This PR creates a generic create_retrying function, which retries when specific exceptions are raised; these exceptions are passed as arguments.

This uses the changes from #1482
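A minimal sketch of the pattern this commit message describes — a generic retry helper parameterized by the exceptions to retry on. The signature, parameter names, and backoff formula here are assumptions for illustration; the actual `create_retrying` in the evals repo may differ:

```python
import random
import time
from typing import Any, Callable, Tuple, Type


def create_retrying(
    fn: Callable[..., Any],
    retry_exceptions: Tuple[Type[Exception], ...],
    max_retries: int = 5,
    base_delay: float = 1.0,
) -> Callable[..., Any]:
    """Wrap fn so that it is retried with exponential backoff (plus
    jitter) whenever one of retry_exceptions is raised, e.g. a
    provider-specific rate-limit error. Any other exception is
    re-raised immediately."""

    def wrapped(*args, **kwargs) -> Any:
        for attempt in range(max_retries):
            try:
                return fn(*args, **kwargs)
            except retry_exceptions:
                if attempt == max_retries - 1:
                    raise  # out of retries: surface the original error
                # Exponential backoff with jitter, scaled by base_delay.
                time.sleep(base_delay * (2 ** attempt + random.random()))

    return wrapped
```

Each solver can then pass its own provider's rate-limit exception types instead of re-implementing the backoff loop.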
JunShern added a commit that referenced this pull request Mar 26, 2024
Adds a solver for Gemini 1.5 Pro. Stacked on #1501 and #1482. Using the
solver requires the `GEMINI_API_KEY` environment variable.

Test with:
```
oaieval generation/direct/gemini-pro bugged_tools
```

---------

Co-authored-by: Chan Jun Shern <[email protected]>