Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set default blas num threads to Sys.CPU_THREADS / 2 in absence of OPENBLAS_NUM_THREADS #45412

Merged
merged 5 commits into from
May 22, 2022

Conversation

Moelf
Copy link
Contributor

@Moelf Moelf commented May 21, 2022

@Moelf
Copy link
Contributor Author

Moelf commented May 21, 2022

Maybe the right thing is to let you set any number explicitly as JULIA_NUM_THREADS, but if you don't specify then cap at 8 by default like we did before?

is OPENBLAS_NUM_THREADS still the ENV variable we want to use moving forward?

@ViralBShah
Copy link
Member

ViralBShah commented May 21, 2022

This needs to not be tied to JULIA_NUM_THREADS, because even in single-threaded Julia, you may still want multi-threaded BLAS. We also don't want the cap to 8.

How about we set OPENBLAS_NUM_THREADS to Threads.nthreads() and if the number of julia threads is 1, then use (Sys.CPU_THREADS/2) for BLAS threads to match the physical CPU cores. Of course if the user has provided the openblas threading environment variable, we respect that.

@ViralBShah ViralBShah added linear algebra Linear algebra performance Must go faster backport 1.8 Change should be backported to release-1.8 labels May 21, 2022
@Moelf
Copy link
Contributor Author

Moelf commented May 21, 2022

yeah this is not tied to JULIA_NUM_THREADS. I'm just unsure what @JeffBezanson 's comment was saying, I think he was saying do something like what we do with JULIA_NUM_THREADS, not actually using it for BLAS number of threads

@ViralBShah
Copy link
Member

The problem with Julia threading defaults (-t auto) is that it also picks the hyperthreaded and not the physical cores.

@Moelf
Copy link
Contributor Author

Moelf commented May 21, 2022

yes. Well, this PR only sets OPENBLAS_NUM_THREADS to 8 when it's not set by the user.

I always wanted -t auto to set # of threads to # of physical cores but alas that's a separate issue.

@ViralBShah
Copy link
Member

Right, but that setting is almost always wrong - like on my laptop where I have only 2 physical and 4 hyperthreaded cores, it will pick 8. And on a big machine where we have more CPUs, it will pick too few. That's why my preference is just Julia's reported cores divided by 2.

Would be great to have @chriselrod and @staticfloat chime in.

@Moelf
Copy link
Contributor Author

Moelf commented May 21, 2022

reusing:

max(1, Sys.CPU_THREADS ÷ 2),

@ViralBShah
Copy link
Member

I think we should we do this directly through the BLAS.set_num_threads() API and that LinearAlgebra.__init__() is the right place to do it.

@Moelf
Copy link
Contributor Author

Moelf commented May 21, 2022

like this?

@ViralBShah ViralBShah changed the title cap default blas num threads to 8 again set default blas num threads to Sys.CPU_THREADS / 2 in absence of OPENBLAS_NUM_THREADS May 21, 2022
@chriselrod
Copy link
Contributor

chriselrod commented May 21, 2022

I think that's fine.

But with Apple M1 and Intel Alder Lake & future generations of both, we're going to have more and more cases where this is suboptimal.

We should use Julia's existing feature detection for this, but that's not something I've looked at yet.

@ViralBShah
Copy link
Member

There needs to be (or already exists) a separate issue for doing feature detection in Julia for all packages to use.

Copy link
Contributor

@chriselrod chriselrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There needs to be (or already exists) a separate issue for doing feature detection in Julia for all packages to use.

Yeah, no reason not to do this until then.

@ViralBShah
Copy link
Member

I'm merging this so that it gives us time to experiment and see how it is working out.

@ViralBShah ViralBShah merged commit 390503e into JuliaLang:master May 22, 2022
@Moelf Moelf deleted the default_blas_numthreads branch May 22, 2022 14:47
KristofferC pushed a commit that referenced this pull request May 23, 2022
Set default blas num threads to Sys.CPU_THREADS / 2 in absence of OPENBLAS_NUM_THREADS

Co-authored-by: SamuraiAku <[email protected]>
(cherry picked from commit 390503e)
@KristofferC KristofferC mentioned this pull request May 23, 2022
67 tasks
pchintalapudi pushed a commit that referenced this pull request May 25, 2022
Set default blas num threads to Sys.CPU_THREADS / 2 in absence of OPENBLAS_NUM_THREADS 

Co-authored-by: SamuraiAku <[email protected]>
@KristofferC KristofferC removed the backport 1.8 Change should be backported to release-1.8 label May 26, 2022
@ViralBShah ViralBShah added the needs news A NEWS entry is required for this change label Jul 18, 2022
@JeffBezanson JeffBezanson removed the needs news A NEWS entry is required for this change label Nov 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linear algebra Linear algebra performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

eigen is 20% slower on 1.8-beta and nightly than 1.7 release
6 participants