-
-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optional profiling #1137
Comments
As you've identified, this is a change that would have to be made in |
Hi, @adrian-lison. Is there a performance hit in Stan runs that don't include profiling statements, and if so, how much? If it's beyond a couple percent, then I'd be in favor of only conditionally including anything to do with profiling. |
@bob-carpenter profiling is done with an object that uses the constructor and destructor to registers some callbacks on the autodiff stack for timing, so it should be 0 cost if you don't use it (minus the cost of creating an empty @adrian-lison do you have an example model where profiling itself is a non trivial overhead compared to the evaluation of the density itself? It seems like it should be very cheap, glancing at the code, but I'm happy to be proven wrong and look into this feature request more. |
Hey @WardBrian, thanks for your reply. I would agree that in many cases, the overhead is probably negligible compared to the density evaluation. It should only become an issue when lots of profiling callbacks are made. A very contrived example would be the following linear regression model.
With this model, I get an approx. 5-6x runtime increase per chain when profiling (tested for 1000, 2500 and 5000 data points). Obviously, this model implementation is very stupid. The question is if there are many realistic scenarios in which people would be forced to use a lengthy loop and want to profile a specific subsection of that loop. This could then be a relevant case where the profiling overhead starts to matter. One potential example from my domain is infectious disease modeling, where we often require loops to model infection and ascertainment processes, e.g. here, and if the individual computations within the loop become more complicated, it might be interesting to profile them (we haven't specifically assessed profiling overhead there however). I also wonder how profiling within map(-reduce) function calls would play out. |
I see two potential solutions to this:
The first seems more in line with how we currently do things (e.g. $ make clean
$ make examples/bernoulli/bernoulli
# builds
$ make STAN_THREADS=true examples/bernoulli/bernoulli
# triggers a re-build
$ make examples/bernoulli/bernoulli
make: 'examples/bernoulli/bernoulli' is up to date. A compiler option would skip the object creation entirely, and in theory avoids this re-compilation issue (which can actually be even more subtle than the above if |
Profiling is a great tool for debugging and performance optimization, but it also adds a bit of overhead to the execution of the Stan program. So it would be nice to skip the profiling statements whenever you don't need them and want the model to run as fast as possible - without the need to manually remove them from the stan program or to always keep two version of the same program (one with and one without profiling statements).
Ideal solution
Ideally, there would be an additonal
profile = FALSE/TRUE
argument to thesample
function of a compiled stan model that would be passed on toCmdStan
.Workaround
As a workaround, one can automatically create a temporary copy of the stan program with all profiling statements removed, compile this temporary copy, and sample from this model without profiling. This requires no changes to
CmdStan
. The disadvantage is of course that this requires a second compiling of the stan model.I have implemented this workaround as a wrapper of
cmdstan_model
in this repo. We have been using it during the development work for epinowcast and it has been quite robust. It also supports included files. It would also be possible to use this workaround for aprofile = FALSE/TRUE
argument in thesample
function.Context
Regarding the ideal solution of integrating this functionality with
CmdStan
, I lack the expertise to judge how difficult that would be and probably could not do it myself. If people think a workaround like above is acceptable, I'm happy to help implementing it forcmdstanr
, but would need some guidance (and a few details still need to be sorted out).The text was updated successfully, but these errors were encountered: