Container memory stats include filesystem cache usage #280
+1
@mikeybtn, thank you for reporting this issue. You are correct in pointing out that the ECS Agent does not subtract the memory cache value from usage while reporting memory stats. We also need to take into account whether subtracting the cache value is the proper way of handling this, or whether we should expose a different metric to deal with the ‘Cache’ memory stat (as was done in docker-archive/libcontainer#518, which fixed docker-archive/libcontainer#506). We will get back to you when we have an update for this issue. Thanks,
@aaithal thanks for acknowledging! I can see how this might extend to a wider product issue, where you need to surface both in AWS charts and metrics. I'd be happy to help however I can.
Thanks for identifying this issue. We also recently deployed a legacy Java app into ECS, and I was concerned to see the service reporting memory utilization at a level almost twice our JVM -Xmx setting. My own investigation also suggests that ECS is reporting memory cache as part of overall memory utilization. My only concern at this point is that a container could be improperly stopped by ECS because memory usage exceeds the container definition, when that usage is only due to file cache and not our application.
And now that I've done a little more reading to better understand what is really going on in Docker regarding memory usage, reporting, and limits, I realize my question above is invalid: the ECS agent is not responsible for killing the container on OOM; the kernel's cgroup enforcement is.
+1 I have a docker service that produces a lot of temporary files. ECS is killing it periodically even though our JVM metrics show the JVM usage is stable and well below the memory limit.
@marklieberman, thank you for your feedback. Please note that the memory limit is a parameter set by the docker daemon and eventually enforced as a memory cgroup limit by the kernel. This is also documented in the ECS documentation about task definitions.
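For reference, here is a minimal sketch (in Go) of where that limit ends up on the instance, assuming the default cgroup v1 layout; the `docker/<container-id>` path component is an assumption about the daemon's cgroup parent and may differ on your AMI:

```go
// Hypothetical sketch: reading the cgroup v1 memory limit that the docker
// daemon sets from the task definition's memory parameter.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func containerMemoryLimit(containerID string) (uint64, error) {
	// memory.limit_in_bytes holds the hard limit the kernel enforces;
	// exceeding it (with no reclaimable cache left) triggers the OOM killer.
	path := fmt.Sprintf("/sys/fs/cgroup/memory/docker/%s/memory.limit_in_bytes", containerID)
	raw, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(raw)), 10, 64)
}

func main() {
	limit, err := containerMemoryLimit(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("cgroup memory limit: %d bytes\n", limit)
}
```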
Just going to +1 this issue. It is important for us so that we can size our containers appropriately. Specifically, this prevents us from running a low-memory process with low memory limits because we cannot tell the difference between process memory and cache (we could build something that does this for us, but we'll just make do with higher memory limits for now).
Hi all, I wanted to provide an update on this with respect to the changes proposed in #582. Merging … I ran a bash command to consume as much cache as possible by simply reading all … Now, if the container is relaunched with a memory limit of … To summarize, merging this PR would result in a drop in the … I have also verified this behavior across both generations of AMIs that support … Please let me know if you have any concerns regarding the same. Test 1: Run container with …
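The "read everything to fill the page cache" step mentioned above can be approximated with a short program; this is only an illustrative sketch (the original comment used a bash command), and the directory path is an assumption:

```go
// Rough stand-in for reading a directory tree to populate the page cache.
package main

import (
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

func main() {
	root := "/var/log" // assumed directory with enough data to fill the cache
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return nil // skip unreadable entries and directories
		}
		f, openErr := os.Open(path)
		if openErr != nil {
			return nil
		}
		defer f.Close()
		// Reading the file pulls its pages into the kernel page cache,
		// which shows up under "cache" in the cgroup's memory.stat.
		_, _ = io.Copy(io.Discard, f)
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```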
+1 |
If I understand correctly, even after #582 the filesystem cache is still included when deciding whether to OOM-kill a process in a container? Just wanted to understand what is going on and what the expected behavior actually is.
@ibrahima The file cache should not affect OOM behavior as the kernel should be able to evict from the file cache to make more memory available to the processes within the container. |
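One way to check this on an instance is to compare the `cache` and `rss` counters in the container cgroup's `memory.stat`; the sketch below assumes the cgroup v1 path layout, which may differ per AMI:

```go
// Minimal sketch: how much of a container's charged memory is reclaimable
// page cache versus anonymous (rss) memory.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

func readMemoryStat(containerID string) (map[string]uint64, error) {
	path := fmt.Sprintf("/sys/fs/cgroup/memory/docker/%s/memory.stat", containerID)
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	stats := make(map[string]uint64)
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Each line looks like "cache 1234567" or "rss 7654321".
		fields := strings.Fields(scanner.Text())
		if len(fields) != 2 {
			continue
		}
		if v, parseErr := strconv.ParseUint(fields[1], 10, 64); parseErr == nil {
			stats[fields[0]] = v
		}
	}
	return stats, scanner.Err()
}

func main() {
	stats, err := readMemoryStat(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// "cache" can be evicted under memory pressure; "rss" is what actually
	// pushes the container toward its OOM limit.
	fmt.Printf("cache=%d rss=%d\n", stats["cache"], stats["rss"])
}
```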
Woah, thanks for the fast response! I'm a bit confused because, refreshing top while my main process is running, it appears the process is somehow getting killed while plenty of memory is still being used for cache at the time it dies; it may be due to something else, though. I should add that we're currently on a very old AMI; we're planning to upgrade soon but just haven't had the chance.
On ECS, we noticed the `MemoryUtilization` graph for one service steadily growing, and began looking for a memory leak in that service. On the VM, the graphed value was consistent with `docker stats <container>`. However, attaching to the container showed the RSS/VSIZE of all running processes was stable over time, and less than the reported value. The investigation led us to moby/moby#10824 and its eventual resolution in docker-archive/libcontainer#518. In short, the usage figure seems to include the page cache. It looks like aws-ecs-agent does not subtract the cache value when building/reporting container stats.

Would it make sense to report usage as `CgroupStats.MemoryStats.Usage - CgroupStats.MemoryStats.Cache`, to avoid confusing evictable memory with "real" usage? Or is there a technique we're neglecting that could avoid this situation altogether? Thanks!
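For illustration, a hedged sketch of the proposed adjustment; the struct below only mirrors the relevant cgroup memory stat fields and is not the agent's actual type:

```go
// Sketch: report usage with the page cache excluded.
package main

import "fmt"

type memoryStats struct {
	Usage uint64 // total bytes charged to the cgroup, page cache included
	Cache uint64 // bytes of page cache, reclaimable under memory pressure
}

// effectiveUsage subtracts the evictable cache so the reported figure
// tracks memory the processes actually hold (rss and friends).
func effectiveUsage(m memoryStats) uint64 {
	if m.Cache > m.Usage {
		return 0 // guard against transient accounting skew
	}
	return m.Usage - m.Cache
}

func main() {
	m := memoryStats{Usage: 900 << 20, Cache: 600 << 20} // illustrative numbers
	fmt.Printf("reported: %d MiB, effective: %d MiB\n",
		m.Usage>>20, effectiveUsage(m)>>20)
}
```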