Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable STAN_THREADS by default #1179

Open
WardBrian opened this issue Aug 10, 2023 · 9 comments
Open

Enable STAN_THREADS by default #1179

WardBrian opened this issue Aug 10, 2023 · 9 comments

Comments

@WardBrian
Copy link
Member

Summary:

Following #1176 and with the addition of Pathfinder, a lot of the primary usages of cmdstan models are able to use threading, but only if STAN_THREADS is True.

I believe this has some downsides for single threaded performance (though I have never seen hard numbers), but it may be worth discussing moving this from an opt-in to an opt-out

Current Version:

v2.32.2

@wds15
Copy link
Contributor

wds15 commented Aug 10, 2023

Last time we checked Linux and macOS did have a negligible slowdown ~2% while windows was at ~10%. This was the case for certain models which have to dereference a lot the thread local pointer to the ad tape. This is a few years back and newer compilers may have improved things. From my view we should have gone to always threading long ago as the advantages outweigh the downsides … from my view (Linux & macOS user)…

@rok-cesnovar
Copy link
Member

I would hope the Windows slowdown is much less nowadays, given that we no longer support Rtools 3.5 and the 4.9.3 compiler it used.

@WardBrian
Copy link
Member Author

What would be the least intrusive way to enable this in a way that still lets users disable it? Most of our makefile logic just checks if STAN_THREADS is defined, not that it is set to a "truthy" value

@WardBrian
Copy link
Member Author

One possible way I thought of is giving the opt-out a new name, and then we can do something like

ifndef STAN_SINGLETHREADED
STAN_THREADS = 1
endif

near the top of the CmdStan makefile

@wds15
Copy link
Contributor

wds15 commented Aug 18, 2023

If my memory serves me right, then ode models using the rk45 integrator are performance sensitive to this threading thing on windows (because the ad tape is used a lot in relative terms to what else is going on).

An opt out option is a good start….but all in threading would make things simpler going forward.

@andrjohns
Copy link
Contributor

@wds15 do you know where I can find the original testing that showed the 30% hit? Hopefully it wouldn't need too much updating to check against the current version

@wds15
Copy link
Contributor

wds15 commented Aug 30, 2023

Sorry…don’t know that any longer. I would use from the performance test repo the ode example. That uses the rk45 integrator which is sensitive to this as I recall.

@WardBrian
Copy link
Member Author

I noticed RStan appears to unconditionally enable threading on Windows: https://github.com/stan-dev/rstan/blob/develop/StanHeaders/src/Makevars.win#L3

@WardBrian
Copy link
Member Author

Here is a summary of what I know on Windows:

Naively enabling STAN_THREADS while building with mingw's GCC leads to about a 2.5x slowdown on the schools-4 model (which has been known for quite some time) even when running on a single thread. There are known issues with the winpthread library used my mingw: msys2/MINGW-packages#13259, which we suspect are the cause

Alternatives:

  • There is no slowdown observed when using WSL
  • There is little (~10%) to no slowdown observed when using clang-cl (supporting this requires significant work on our build system)
  • I'm having some difficulty using a GCC built to use mcfgthread instead of winpthread, but it does appear that this has a lesser, but still pretty significant, slowdown (the model gets ~50% slower, not 100%+)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants