Concurrency turning out useless on codebase & machine #2525
Comments
Wow, that's incredible, thanks for reporting an issue. I wonder if the overhead of pickling the results back from the workers is too big at this point; we'll have to switch our approach for the parallel runner if that's the case.
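To make the pickling concern concrete, here is a minimal sketch (not pylint's actual implementation) of a multiprocessing-based parallel runner: each worker's per-file messages have to be pickled and shipped back to the parent process, so when the per-file work is cheap relative to the size of the results, the serialization and IPC overhead can eat the parallel speedup. `lint_one_file` and the message shape are placeholders.

```python
import multiprocessing


def lint_one_file(path):
    # Placeholder for a real per-file check: returns a (possibly large)
    # list of message dicts that has to cross the process boundary.
    return [{"path": path, "line": 1, "symbol": "example-message"}]


def lint_parallel(paths, jobs):
    # Each worker's return value is pickled in the child and unpickled
    # in the parent; for cheap per-file work, that round trip can cost
    # more than the linting itself.
    with multiprocessing.Pool(processes=jobs) as pool:
        results = pool.map(lint_one_file, paths)
    return [msg for per_file in results for msg in per_file]


if __name__ == "__main__":
    print(len(lint_parallel(["a.py", "b.py", "c.py"], jobs=2)))
```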
I'm observing this too (OS X, i7 processor).
I found Pylint got a lot faster by removing concurrency. A concrete project has 134k LOC across 1662 Python files, and a Pylint run across all of them dropped from 3m 33s to 1m 30s on average on a dual-core MBP (with HT). CPU utilisation also dropped to less than half according to CPU time. I wonder if there are any cases where running Pylint concurrently is helping, or if it would be better to disable the feature for now?
Some results on Windows 10 2004, Intel 9850H (6 cores/12 threads), 32-bit, pylinting matplotlib (`pylint --version`, `cloc matplotlib`). Running pylint with the default configuration:
@owillebo thanks for the data, I think intuitively it makes sense that the optimum is 6 threads on a 6-core machine. Apparently this bug is not affecting you.
Thanks, I think utilizing all (12) available threads and halving the time for running Pylint is a good thing. Burning threads is a waste of time and resources. I think this bug is affecting more than myself (which is indeed of less importance).
Indeed, hyperthreading *can* lead to better use of the underlying hardware, but if there are no significant stalls (or both hyperthreads are stalled in similar ways) and all the threads are competing for the same underlying units, the hyperthreads are just going to use the same resources sequentially. And that *can* is so conditional that, given the security issues of their implementation, Intel is actually moving away from HT: the 9th gen only uses it at the very high (i9) and very low (Celeron) ends; none of the 9th-gen i3, i5 and i7 parts support hyperthreading.
If I run two pylint sessions concurrently, each with 6 jobs and each on half of the matplotlib files, the wall-clock duration drops from 65 seconds (for all files in 1 session) down to 60 seconds. Indeed my extra threads don't bring much.
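For reference, a minimal sketch of that two-session experiment (the matplotlib path, the even/odd split and the job count are assumptions, not the exact commands used above):

```python
import pathlib
import subprocess
import sys

# Assumed layout: lint the .py files of a matplotlib checkout, split into
# two halves, with one pylint process per half and 6 jobs each.
files = sorted(str(p) for p in pathlib.Path("lib/matplotlib").rglob("*.py"))
halves = (files[::2], files[1::2])

procs = [
    subprocess.Popen(
        [sys.executable, "-m", "pylint", "--jobs=6", *half],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    for half in halves
]
for proc in procs:
    proc.wait()
```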
#6978 (comment) is quite an interesting result.
That is true but I think it's a different issue: in the original post the CPU% does grow pretty linearly with the number of workers, which indicates that the core issue isn't a startup stall (#6978 shows clear CPU usage dips).
Also FWIW I've re-run pylint on the original project, though only on a subset (as I think the old run was on 1.x, pylint has slowed a fair bit in the meantime, and the project has grown). This is on a 4-core, 8-thread Linux machine (not macOS this time), Python 3.8.12, pylint 2.14.5. The subsection I linted is 71kLOC in 400 files. The results are as follows:
On M1 Macs, on large codebases, …
@olivierlefloch I don't think that can be solved by pylint (or any other auto-worker program): a while back I tried to see if the stdlib had a way to know real cores (as opposed to vcores / hyperthreads) because of the comment preceding yours, and didn't find one. I don't remember seeing anything for efficiency/performance cores either. I think the best solution would be to run under a bespoke hardware layout (make it so pylint can only see an enumerated set of cores of your choice), but I don't know if macOS supports this locally (I don't remember something similar to Linux's …). Also it doesn't seem like e.g. …
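To illustrate the gap described above: the standard library only exposes a logical CPU count, distinguishing physical cores takes a third-party package such as psutil, and pinning a process to specific cores via the stdlib is Linux-only. A small sketch (psutil and the chosen core set are assumptions, not anything pylint does):

```python
import os

print("logical CPUs:", os.cpu_count())

try:
    import psutil  # third-party; not a pylint dependency
    print("physical cores:", psutil.cpu_count(logical=False))
except ImportError:
    print("physical cores: unknown without psutil")

if hasattr(os, "sched_setaffinity"):  # Linux-only; absent on macOS
    os.sched_setaffinity(0, {0, 1})   # restrict this process to cores 0 and 1
    print("now limited to", len(os.sched_getaffinity(0)), "CPUs")
```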
Seems like since we moved to the new builder setup (weaker machines, 4 executors), CI jobs were stuck on the pre-commit pylint stage, taking 100% CPU and timing out the 15 min we gave for the pre-commit stage. Seems like using `-j 1` removes most of the load on the CPU and even finishes faster. Ref: pylint-dev/pylint#2525
This is on a codebase with 260kLOC across ~3600 files (Python-only, according to `tokei`), on a 2010 MBP (2 cores, 2 HT) running OS X 10.11, under Python 3.6.6 from MacPorts. Using `-j` with a number of jobs different from 1 significantly increases CPU consumption (~90%/core), but yields no improvement in wall-clock time. Not sure what other information to provide.