-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
build: GNU Make jobserver to prevent resource glut (make_job_server rebased on master) #155
Conversation
@xqms I've rebased this on master, it appears to work, and feel free to copy this snapshot to your fork. |
@davetcoleman when you're feeling bored, give this branch a shot and see if you still run out of resources |
Rebase of #85 |
I just pulled that branch and am testing now. |
@xqms I modified the implementation so that it's possible to disable the jobserver with the
|
@davetcoleman Nice to hear it helped you as well! @jbohren okay, sounds good, though I think @wjwwood preferred an opt-in (like |
498df12
to
fa9dc71
Compare
@xqms also is there a difference between |
Something that might make sense is to add the jobserver options to the persistent config. This patch just augments the Alternatively, it'd be better to have a high-level This would display like:
|
@jbohren If the jobserver is enabled, it may make sense to replace Yes, I agree this thing should be integrated with the workspace config. |
Can you show the memory trace using catkin_make also? |
Thanks for everyone helping out on this one. I'll try to review it and try it out myself by the end of the week. |
+1 That is what most of us naively expected |
You mean the memory history plot? I'll try to record it next time I rebuild |
Yeah. On Thu, Mar 12, 2015, 16:58 Dave Coleman [email protected] wrote:
|
This still needs a bit of work to do what people "expect" it to do. Each job calls |
e11d3d2
to
551c94d
Compare
@xqms @davetcoleman @wjwwood This should be good to review. I re-worked it to make sure the jobserver singleton gets initialized in one place, but this could still be cleaned up. The behavior of this PR is as follows:
|
75bb565
to
b35694d
Compare
:rtype: dict | ||
""" | ||
|
||
regex = r'(?:^|\s)(?:-?(j|l)(\s*[0-9]+|\s|$))' + \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you probably wanted to use JOBS_FLAGS_REGEX from above here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah I meant to remove that in favor of just having two different patterns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment still unaddressed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's outdated. The referenced variable isn't in the PR any more.
@jbohren I had a look at the code, seems good to me. I'll test this in my setup today and let you know if anything breaks. |
b35694d
to
73e0f9b
Compare
# make sure we're observing load maximums | ||
if self.max_load is not None: | ||
try: | ||
max_load = 8.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is probably left over by accident?
ffc9fcd
to
c0ae6e0
Compare
I have just discovered an issue: Multiple invocations of
|
Okay, to clarify: the |
@xqms Yeah, also now that
|
@wjwwood Just updated. There was an issue previously where the make arg parsing wasn't properly extracting jobs args formatted like @xqms This removes the jobserver args from "additional make args". I was considering adding the jobserver max jobs / max load / max mem to the context, but we'd also have to add those options to the saved configs, which just involves a lot of additional changes. As it stands, I think that this is a pretty critical patch and if there are any other outstanding features, we should bump them to another PR. It would be nice to add a strong test harness around this thing, but I'm not sure what the best approach would be for that. Mostly you'd want to test:
|
I've been using, and keeping up with the updates, of this branch everyday. Seems to work for me. |
e881f0e
to
5892168
Compare
You've not added the dependency on
Also the function you're using from https://code.google.com/p/psutil/wiki/Documentation#Memory Please at least update the pull request to use I'm not happy about adding a dependency on Having said all of that, I'm really not convinced the memory limits is going to be a reliable way to manage how much memory the whole compile is using. Unlike CPU usage, the memory usage of each
You'll hand out all six job tokens (because for the first few seconds the memory usage is really low) and each of them will grow to use 2 GB and run your system out of memory anyways. However, it's possible that it won't work out like that and the memory limit will help, but I wouldn't call it reliable. This is a tough problem and we worked with the Tango project to work around this problem in a few places. The most reliable way to fix this kind of "perfect storm of parallel make jobs" was to instrument the build so we could figure out which process, when run at the same time, busted the system. Then they would manually adjust the inter-target dependencies to prevent those object files from being built at the same time. @tfoote was the instrumenting code ever pushed somewhere public? Based on that, I would say we should either remove the memory limit option or make it "experimental" and make the dependency on |
Of course it's not going to be reliable. But if some crazy person (@davetcoleman cough cough) is running around with a computer with no swap, he can probably set it to something like 65% and be reasonably confident it's not going to brick his machine.
"Perfect is the enemy of the good" -- some guy that got something done I'm not concerned enough about this problem to do any more about it than is in the experimental memory limit check.
It already is an unlisted "experimental" option. I'm happy putting a try-except around the psutil import and then throwing an error if someone tries using it without support. That being said, anyone with
I'm going to remove the dependency description completely (except for the travis config) because it's going to be an optional dependency.
Yeah I know it's deprecated. Unfortunately the |
I would appreciate making it optional. It's not going to be a problem for ROS users, but I put a lot of effort into minimizing the dependencies, so I'd like to keep it that way if possible.
I remember this now, it was quite a problem for If you can get the optional import in place tonight or tomorrow morning I can merge this and get it in the next release. Tomorrow is our ROS bug fix party, so I plan to spend some time on this tracker and making a release. Thanks for working on it and iterating with me. |
@wjwwood re: collecting build parameters of specific invocations. The tools is available at https://github.com/osrf/watchprocess It can generate a report of cpu and memory usage for each process using an instrumented program during a build. |
Thanks for the crazy person shout out @jbohren I've taken your advice: |
@wjwwood I added the lazy import of psutil and the version checking. I've tested it with version 0.4.1 but no other versions. |
5702aec
to
970d270
Compare
This makes it possible to enforce a global limit on the number of processes through the --jobserver-limit argument. build: jobserver: provide context manager interface ... and use it in executor.py build: jobserver: silent support test build: provide job information on status line nicely as suggested by NikolausDemmel.
- you can disable the job server and use it with other job servers via catkin_tools context config - the jobserver can be parameterized by the normal -j argument as well as -j stored in MAKEFLAGS - the jobserver can regulate jobs based on the current system load - the jobserver can regulate jobs based on available RAM
- simplifying enabling / supporting - adding parsing of `-l` / `--load-average` args - adding hidden experimental option for limiting based on memory
- Simplifying context summary to not show jobserver args as "additional make args" - Fixing failure to parse '-j N' where there is a space separating the '-j' and 'N'
I merged this with some adjustments. |
Also, a rebase which is why it doesn't show up as "merged". |
This is a rebase and extension of @xqms's jobserver prototype in PR #85 which was created to fix #84. Without the jobserver,
catkin build
will use too many resources on machines with multiple cores.Behavior using this PR:
catkin build --no-jobserver
-- current behavior (maxes out all jobs for all packages unless using another jobserver like distcc)catkin build
-- build using the jobserver and as many jobs as CPUscatkin build --jobserver
-- explicit version of the line abovecatkin build -jN
-- build using the jobserver andN
jobsMAKEFLAGS="-jN" catkin build
-- build using the jobserver andN
jobscatkin config --no-jobserver
-- disable the jobserver for future buildscatkin build -lV
-- don't create more than one job at a time if the system load is greater thanV
catkin build --mem-limit P
-- don't create more than one job at a time if more thanP
% of the system memory is usedOutstanding issues:
?/0 Jobs
when the jobserver is disabled, it should display something friendlierinternal_make_jobserver
switchesuse_internal_make_jobserver
configure_make_args
psutil
an optional dependency, and only load it if the memory control options are usedFuture work:
-jN
) and the number of packages (-pM
) are still decoupled. The jobserver will limit the number ofMake
instances, but maybe this is ok?