[please help] Unmanaged memory only increases but does not decrease #6556
Comments
Perhaps the GC is not releasing that memory back to Windows. What are your GC settings? Is ServerGC on?
@ReubenBond Yes, ServerGC is turned on.
Turning off ServerGC may solve the issue.
It can be solved by turning off ServerGC. Does Orleans reserve memory space in advance for performance?
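For anyone comparing configurations: a minimal sketch of how server GC can be toggled for the silo host by adding a runtimeconfig.template.json next to the project file. The setting names are the documented .NET GC configuration keys; whether workstation GC is the right trade-off for your load is an assumption to test, not a recommendation.

```json
{
  "configProperties": {
    "System.GC.Server": false,
    "System.GC.Concurrent": true
  }
}
```

The MSBuild equivalent is `<ServerGarbageCollection>false</ServerGarbageCollection>` in the .csproj; both end up in the generated runtimeconfig.json.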
Orleans does not perform any special unmanaged memory allocations in order to reserve space. There are some buffer pools, but those are managed and grow dynamically.
@pipermatt What's the deployment environment here - Windows/Linux, .NET Core/Full Framework, which version? I don't see anything in the fixes between 3.1.0 and 3.1.6 that could obviously change the memory allocation profile. Did you upgrade anything else at the same time by chance? Have you tried taking and analyzing memory dumps to see what the memory is used by?
Linux, .NET Core 3.1... On average memory utilization seems to increase at a rate of about 15MB/hr... and there's just SO much allocated it's difficult to wade through it all via the command line in Linux. I'm not an expert in the profiling tools, that's for sure. I seem to be able to reproduce the behavior locally on my MBP as well... but dotnet-dump doesn't support Mac OS X. 😏 So I've been ssh'ing into a test Linux instance to try to diagnose. Tomorrow I may grab a Windows machine so I have the full benefit of PerfView, dotTrace, etc... but first, since I can reproduce locally, I'm methodically stripping down our configuration to as barebones as possible, one feature at a time. We did upgrade several other libraries that are called by our grain code, but the memory leak is apparent on an idle silo that isn't taking any traffic and doesn't have any of our grains instantiated yet. We'll get it figured out... and will report back. 👍
After stripping my silo of features until it was about as basic as possible, I came to the conclusion that what I was seeing locally was a red herring and not indicative of the problem I saw in production. On a whim, I rolled forward to the release that deployed just before the available memory tanked... You can see a dip where the deploy happened for each silo node, but it is humming along just fine. So now, without a real reproduction case, I'm going to have to shelve this investigation unless it rears its head again.
Interesting. Did you also upgrade to 3.1.7?
If memory does indeed leak over time, we need to look at memory dumps or GC profiles. @ReubenBond might have a suggestion for how to do that in a non-invasive manner.
Yeah, I'm working that angle now, though not on the production servers (yet). I think I'm seeing the exact same behavior with this build in my TEST environment, so I'm working on memory dumps there.
Update: there was another difference discovered. ;) The version that appeared to be leaking memory had its LogLevel set to Debug... we had been running at LogLevel.Information previously. We weren't actually seeing a memory leak... we were seeing Linux allocate more and more memory to disk caching to buffer the writes to the system journal. This memory was always reclaimed when the Silo needed it, though this process itself was slow enough that we would see a spike of errors while it happened. The tidbit that we didn't understand was why on redeploy, not ALL the memory that had been used was freed. Now, it makes perfect sense... because the silo process wasn't the one using it at all... Linux itself was. Eventually the OS decreased the cache allocation when we rolled back to the version that had LogLevel.Info and it no longer needed so much memory caching to keep up with the journal writes. Mystery solved!
Thank you for the update, @pipermatt! Makes perfect sense. This reminds me again how often misconfigured logging may cause non-obvious issues. @lfzm Have you resolved your problem? Can we close this issue now?
@pipermatt Haha, the mystery has not been solved.
@HermesNew I believe this is most likely a ServerGC (.NET Core) concern, rather than something specific to Orleans. It might be worth looking at the various GC settings in the documentation here: https://docs.microsoft.com/en-us/dotnet/core/run-time-config/garbage-collector#systemgcretainvmcomplus_gcretainvm, in particular, RetainVM might be of interest.
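For reference, a sketch of what the RetainVM knob looks like in a runtimeconfig.template.json (the key name comes from the linked docs; the default is already false, so this mostly matters if it was explicitly enabled somewhere):

```json
{
  "configProperties": {
    "System.GC.Server": true,
    "System.GC.RetainVM": false
  }
}
```

With RetainVM enabled, the GC keeps segments it would otherwise delete on a standby list for reuse instead of releasing them to the OS, which can make the process footprint look like it never shrinks.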
@ReubenBond Starting a simple Orleans silo and profiling it with JetBrains dotMemory shows unmanaged memory, so I suspect an Orleans problem.
Not necessarily. The GC deals with unmanaged memory; Orleans does not.
@ReubenBond
I don't recommend it. I recommend keeping server GC enabled if you are running in production. Are you running in a Linux container? You can set a limit on the maximum amount of memory used if you want. Note that ServerGC uses one heap per core by default, but you can reduce that using another setting.
With .NET Core 3.0, the runtime should just respect the cgroup limits. |
Yep, by default it will allow up to 75% of the cgroup memory limit. CPU limits also play a part in determining the number of heaps. In this case, I think it's probably running on Windows, but I'm not sure.
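A sketch of the heap-count setting mentioned above, again via runtimeconfig.template.json (the value 4 is purely illustrative; the setting only has an effect when server GC is enabled):

```json
{
  "configProperties": {
    "System.GC.Server": true,
    "System.GC.HeapCount": 4
  }
}
```

Without it, server GC creates roughly one heap per visible core (or per the container's CPU limit), which is one reason server GC has a higher memory baseline than workstation GC.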
@ReubenBond It is in production, running on Windows Server 2012 R2. BTW: Orleans version: v3.1.7, .NET Core: v3.1.
@ReubenBond I am now preparing to migrate to a Linux container, so I want to know the best settings. Based on current practice, which settings are optimal?
Is that unmanaged memory causing the application to terminate? Does it grow forever, or just for a few hours? I would imagine that things hit a steady state rather quickly?
The greater the load, the greater the memory consumption, and the memory does not decrease. It eventually causes the application to terminate with an OOM exception.
Are you saying that you are seeing OOM exceptions?
When the application terminates, it throws an OOM exception. I have analyzed the dump file, mainly
According to @lfzm's method, this problem can be reproduced.
Can you share the crash dump?
From what I can see here, it seems that it's because
Have you heard that there seems to be a problem where the CPU count is detected incorrectly?
The dump file is large.
@Cloud33 that advice no longer applies. The GC recognises CPU limits present in the container and adjusts heap count accordingly. Additionally, you can set the memory limit (and it's also detected from the container's cgroup). @HermesNew You can set a memory limit if you want. If you do, do you still see OOM exceptions? How long does the application run for before crashing with an OOM?
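If an explicit cap is preferred over the default 75% of the cgroup limit, a hedged sketch using the percentage-based hard limit (the 50 is illustrative, not a recommendation):

```json
{
  "configProperties": {
    "System.GC.Server": true,
    "System.GC.HeapHardLimitPercent": 50
  }
}
```

There is also an absolute-bytes variant of this limit; see the GC configuration docs linked earlier for the exact key and value format.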
@ReubenBond Ok |
EDIT: Oops, my bad, it wasn't the processes I thought that were eating the memory... I should learn how to read.
We are marking this issue as stale due to the lack of activity in the past six months. If there is no further activity within two weeks, this issue will be closed. You can always create a new issue based on the guidelines provided in our pinned announcement.
This issue has been marked stale for the past 30 days and is being closed due to lack of activity.
Orleans version: v3.1.7
.NET Core: v3.1
Can provide a DotMemory snapshot.