Suggestion: Improved memory use #568
Yes, agreed, this code snippet should definitely not require so much memory. Will look into this soon.
…first attempt to eliminate obvious sources of the memory leak.
@johannesulf - when I run your code snippet on my machine, I do not get what you get. I added the following line to the code snippet you emailed to me:
Probably we should be using a more sophisticated tool like heapy or even just pympler to do this with greater reproducibility, but in any case I'm not sure what the source of the discrepancy is. I started a new branch on my private fork to begin to address this. First commit is here: aphearin@0afe3f7 If I can solve this within the next day or so, I'll include the fix in the v0.3 release that I'll push up to pip on Wednesday. Otherwise, implementing this fix into a public release will have to wait for v0.4 (which would be perfectly fine by me).
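For reference, one simple way to log memory per iteration from the standard library (this is a hedged sketch, not necessarily the line that was added to the original snippet, which isn't shown in the thread) is to query the process's peak resident set size:

```python
import resource

def peak_rss():
    # Peak resident set size of this process so far.
    # Caution: units are platform-dependent (kilobytes on Linux,
    # bytes on macOS), which alone could explain discrepancies
    # between measurements taken on different machines.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

print(peak_rss())
```

Printing this once per loop iteration makes the sawtooth growth pattern easy to see without any third-party profiler.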
Thanks for following up on this @johannesulf. I won't be able to address this in the coming release, but will revisit this after finishing v0.4.
So it's really a problem with garbage collection. When I call gc.collect() after each iteration, the memory stays low the whole time. But this adds some small overhead (~10%).
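As a minimal sketch of this pattern (the repopulate function below is a stand-in for the halotools call, not the actual API usage from the snippet), forcing a full collection at the end of each iteration reclaims the freed tables immediately instead of waiting for Python's generational collector:

```python
import gc

def repopulate():
    # Stand-in for re-populating the mock: allocate a large
    # throwaway table that would otherwise linger until the
    # garbage collector decides to run on its own.
    return [0.0] * 100_000

for _ in range(5):
    table = repopulate()
    del table
    freed = gc.collect()  # force a full collection of all generations
```

gc.collect() returns the number of unreachable objects found, which can be handy for checking whether each iteration is actually leaving garbage behind.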
Using pympler apparently doesn't work: when using it, the increased memory use is gone.
Ahh, very interesting observation about gc.collect(). A 10% overhead in CPU runtime seems like a very small sacrifice for a ~7x improvement in memory efficiency. What do you think, would it be worth it to include an optional keyword that allows users to run in "low-memory mode", or something like that? This way, users like you who would prefer to run MCMCs on their laptops would have the option to do so. I think that any such user would not be bothered by a 10% runtime overhead. Let me know if you have an opinion on this.
Quick question: is garbage collection in Python dependent upon the memory available on the machine?
Garbage collection is definitely machine-dependent, yes. It is not a coincidence that @johannesulf sees the equilibrium value of the sawtooth pattern oscillate at a value near the maximum RAM of his laptop. But I don't think I quite follow your point. What is the downside of giving users the option to force Python to clean up after itself? This reduces the memory footprint by ~5x in @johannesulf's case.
Hi Andrew, the question would be where exactly you would put the garbage collection calls.
@johannesulf - where did you put your call to gc.collect()? Just at the end of mock.populate()? |
My guess is that he put it in the main loop. Makes no sense to put it at the end of mock.populate():

```python
from halotools.empirical_models import PrebuiltHodModelFactory
from halotools.sim_manager import CachedHaloCatalog

halocat = CachedHaloCatalog(simname='bolplanck', redshift=0)
for i in range(1000):
    # ... repopulate the mock ...
    # **** garbage collect here ****
```

Cheers, Surhud
It would be nice to limit the proliferation of keyword arguments. Maybe the best option is to simply add a …
I see, yes, good point, the namespace of mock.populate() would need to be exited in order for this to make sense. |
It actually is. For different model parameters, it will eventually start writing into swap and slow down the entire computation. So it seems that in this case Python doesn't know what's best. But given that this can easily be prevented with gc.collect(), I would close the issue and maybe add a note to mock.populate().
Ok then @johannesulf - in that case I think we agree that this is best handled with documentation rather than a new feature. Would you mind adding a few sentences to the part(s) of the code that typical users would most commonly go looking for information about reducing the memory footprint? If you think a dedicated doc page for this is warranted, then I can handle it, and we can just put hyperlinks to that page in a few conspicuous places. But if you think one or two docstring notes can do the trick, could you handle it? You can submit a PR if you're comfortable, or otherwise just email me with code modifications as you have before. |
The following code, which continuously repopulates the same halo catalog with galaxies, uses a lot of memory. Most importantly, the memory usage keeps increasing even after the first hundred iterations and only reaches a maximum after around 300 iterations. The amount of memory used after 300 iterations is roughly 3 times the memory used after one iteration. It's not obvious why halotools' memory usage should increase at all after each iteration.
Obviously, re-populating the same halo catalog over and over again is important for MCMC, and in that setting one commonly has a lot of memory anyway. However, re-populating the same halo catalog is also important when plotting results of an MCMC chain. It might be desirable that this could be done on a computer with less than 16GB of RAM. It would be great if the memory usage of halotools could be reduced in such scenarios.