Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timeout parameter. #621

Merged
merged 8 commits into from
Sep 21, 2022
Merged

Add timeout parameter. #621

merged 8 commits into from
Sep 21, 2022

Conversation

tillahoffmann
Copy link
Contributor

@tillahoffmann tillahoffmann commented Sep 21, 2022

Submission Checklist

  • Run unit tests (on macOS; will have to wait for Windows and Linux in CI)
  • Declare copyright holder and open-source license: see below

Summary

This PR adds a timeout parameter to optimize, sample, generate_quantities, and variational including unit tests. I was running hyperparameter sweeps for a model and some configurations took much longer than others. Setting a timeout means that "bad" configurations can be aborted if they take too long.

I used a threading.Timer to terminate the cmdstan subprocess. Checking the elapsed time in _run_cmdstan didn't work because the readline call blocks until the next line of stdout becomes available (which can take a while if the model is sampling inefficiently).

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):

Harvard University

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

Copy link
Member

@WardBrian WardBrian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice idea, thanks for looking into it.

Do you have the black formatter installed? Some of the changes look like they need to be formatted

test/test_variational.py Outdated Show resolved Hide resolved
cmdstanpy/model.py Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Sep 21, 2022

Codecov Report

Merging #621 (a71a36d) into develop (1c761ff) will increase coverage by 0.15%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop     #621      +/-   ##
===========================================
+ Coverage    80.16%   80.31%   +0.15%     
===========================================
  Files           69       69              
  Lines        10338    10386      +48     
===========================================
+ Hits          8287     8342      +55     
+ Misses        2051     2044       -7     
Impacted Files Coverage Δ
cmdstanpy/cmdstanpy/stanfit/runset.py 97.33% <0.00%> (+0.11%) ⬆️
...rk/cmdstanpy/cmdstanpy/cmdstanpy/stanfit/runset.py 97.33% <0.00%> (+0.11%) ⬆️
a/cmdstanpy/cmdstanpy/cmdstanpy/stanfit/runset.py 96.00% <0.00%> (+0.16%) ⬆️
cmdstanpy/cmdstanpy/model.py 88.66% <0.00%> (+0.22%) ⬆️
...runner/work/cmdstanpy/cmdstanpy/cmdstanpy/model.py 88.66% <0.00%> (+0.22%) ⬆️
a/cmdstanpy/cmdstanpy/cmdstanpy/model.py 90.45% <0.00%> (+0.80%) ⬆️
cmdstanpy/cmdstanpy/utils/command.py 85.41% <0.00%> (+4.16%) ⬆️
...ork/cmdstanpy/cmdstanpy/cmdstanpy/utils/command.py 85.41% <0.00%> (+4.16%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@tillahoffmann
Copy link
Contributor Author

I just raised RuntimeError for the time being. Thoughts on raising TimeoutError instead so it's easier to catch the exception?

@WardBrian
Copy link
Member

That would certainly be clearer, but I assume it has some difficulties with our check_returncode/get_err_msg functionality?

@tillahoffmann
Copy link
Contributor Author

I'm currently manually patching the return code to 60 as os.strerror(60) == 'Operation timed out' if the timer terminates the subprocess. So we could raise a TimeoutError based on the return code?

@WardBrian
Copy link
Member

Sounds good to me

@WardBrian
Copy link
Member

I suppose the other thing you could do besides patch the return code is have the Timer function both terminate the proc and indicate it did this by setting an attribute in runset. I'm mostly indifferent, especially if we think that -15 as a return code would never show up normally

@tillahoffmann
Copy link
Contributor Author

tillahoffmann commented Sep 21, 2022

I've tried it with storing state in runset which might also fix the failing Windows tests (not sure whether Windows also has -15 as a response to a process being terminated).

@tillahoffmann
Copy link
Contributor Author

Alright, these seem to be passing now.

As a timing-related thing, what are your thoughts on adding a time.sleep(1e-3) in the while proc.poll() is None loop here? That would reduce the CPU overhead from the busy waiting a bit. The subprocess.Popen.wait method uses a sleep of 1 ms, for example.

@WardBrian
Copy link
Member

Seems reasonable, but looking at that code it seems like it will do the wrong thing if more than one line of output is buffered at a time (which is bad)? So we would also definitely want to do readlines instead of readline, and process any potential output in a loop, I think.

Copy link
Member

@WardBrian WardBrian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Do you want to look at the proc.poll issue in a different PR? If so I will merge this

@tillahoffmann
Copy link
Contributor Author

Makes sense; maybe I'll have a look at that in a separate PR.

@WardBrian WardBrian merged commit c2bab85 into stan-dev:develop Sep 21, 2022
@tillahoffmann tillahoffmann deleted the timeout branch September 21, 2022 19:59
@tillahoffmann
Copy link
Contributor Author

As a follow up, the issue of busy-waiting is not an issue: readline blocks when there's no content so the loop is only busy when there's actually stuff to do.

@WardBrian WardBrian mentioned this pull request Sep 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants