-
Notifications
You must be signed in to change notification settings - Fork 616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enable hierarchy memory limit, config cgroup root #1195
Conversation
200ad66
to
081449f
Compare
081449f
to
d46ad8d
Compare
d46ad8d
to
0299a3a
Compare
133102a
to
79bf8f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may also want to update the header of all the touched file to 2018.
agent/config/config.go
Outdated
@@ -280,6 +280,8 @@ func environmentConfig() (Config, error) { | |||
|
|||
dockerStopTimeout := getDockerStopTimeout() | |||
|
|||
cgroupMemorySubsystemPath := os.Getenv("ECS_CGROUP_MEMORY_SUBSYSTEM_PATH") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be updated in the readme file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
agent/config/config_unix.go
Outdated
@@ -27,6 +27,8 @@ const ( | |||
defaultCredentialsAuditLogFile = "/log/audit.log" | |||
// Default cgroup prefix for ECS tasks | |||
DefaultTaskCgroupPrefix = "/ecs" | |||
|
|||
defaultCgroupMemorySubsystemPath = "/sys/fs/cgroup/memory/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment for this?
agent/resources/resources_linux.go
Outdated
// echo 1 > memory.use_hierarchy | ||
memorySubsystem := c.memorySubsystemPath | ||
|
||
err = c.ioutil.WriteFile(filepath.Join(memorySubsystem, cgroupRoot, memoryUseHierarchy), []byte(strconv.Itoa(1)), os.FileMode(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make the 1
and 0
constant and add some comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless you feel strongly about it, I don't think this needs to have its own constants since they're only being used in this specific instance, also I think the comment echo 1 > ...
conveys context. Let me know though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we do this for each task? Isn't it enough if we did this once at the top level ecs
cgroup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aaithal you're right, ill move this up a level and make it part of cgroupInit()
@richardpen righttt the header dates. will do. |
@@ -105,7 +105,7 @@ func TestPerformUpdateWithUpdatesDisabled(t *testing.T) { | |||
Reason: ptr("Updates are disabled").(*string), | |||
}}) | |||
|
|||
u.performUpdateHandler(statemanager.NewNoopStateManager(), engine.NewTaskEngine(cfg, nil, nil, nil, nil, nil, nil))(&ecsacs.PerformUpdateMessage{ | |||
u.performUpdateHandler(statemanager.NewNoopStateManager(), engine.NewTaskEngine(cfg, nil, nil, nil, nil, nil, nil, nil))(&ecsacs.PerformUpdateMessage{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: this is an opportunity for you to split these long lines into multiple lines :)
agent/app/agent.go
Outdated
@@ -308,13 +311,13 @@ func (agent *ecsAgent) newTaskEngine(containerChangeEventStream *eventstream.Eve | |||
if !agent.cfg.Checkpoint { | |||
seelog.Info("Checkpointing not enabled; a new container instance will be created each time the agent is run") | |||
return engine.NewTaskEngine(agent.cfg, agent.dockerClient, | |||
credentialsManager, containerChangeEventStream, imageManager, state, agent.metadataManager), "", nil | |||
credentialsManager, containerChangeEventStream, imageManager, state, agent.metadataManager, agent.resource), "", nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: again, this is an opportunity for you to split these long lines into multiple lines :)
README.md
Outdated
@@ -176,6 +176,7 @@ additional details on each available environment variable. | |||
| `ECS_ENABLE_CONTAINER_METADATA` | `true` | When `true`, the agent will create a file describing the container's metadata and the file can be located and consumed by using the container enviornment variable `$ECS_CONTAINER_METADATA_FILE` | `false` | `false` | | |||
| `ECS_HOST_DATA_DIR` | `/var/lib/ecs` | The source directory on the host from which ECS_DATADIR is mounted. We use this to determine the source mount path for container metadata files in the case the ECS Agent is running as a container. We do not use this value in Windows because the ECS Agent is not running as container in Windows. | `/var/lib/ecs` | `Not used` | | |||
| `ECS_ENABLE_TASK_CPU_MEM_LIMIT` | `true` | Whether to enable task-level cpu and memory limits | `true` | `false` | | |||
| `ECS_CGROUP_MEMORY_SUBSYSTEM_PATH` | `/sys/fs/cgroup/memory/` | The root path of the cgroup memory subsystem | `/sys/fs/cgroup/memory/` | Not applicable | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the path on the host or in the agent mount? We should make that more clear here as well.
agent/app/agent.go
Outdated
@@ -145,6 +145,9 @@ func newAgent( | |||
metadataManager = containermetadata.NewManager(dockerClient, cfg) | |||
} | |||
|
|||
resource := resources.New() | |||
resource.ApplyConfigDependencies(cfg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be happening only if cgroup feature is enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That logic is within resources, I believe.
The idea was that 'resources' could hold on to more things than just cgroups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can put it after the agent.cfg.TaskCPUMemLimit.Enabled
check
agent/resources/resources_linux.go
Outdated
// echo 1 > memory.use_hierarchy | ||
memorySubsystem := c.memorySubsystemPath | ||
|
||
err = c.ioutil.WriteFile(filepath.Join(memorySubsystem, cgroupRoot, memoryUseHierarchy), []byte(strconv.Itoa(1)), os.FileMode(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: can you split this into multiple lines?
agent/resources/resources_linux.go
Outdated
// echo 1 > memory.use_hierarchy | ||
memorySubsystem := c.memorySubsystemPath | ||
|
||
err = c.ioutil.WriteFile(filepath.Join(memorySubsystem, cgroupRoot, memoryUseHierarchy), []byte(strconv.Itoa(1)), os.FileMode(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we do this for each task? Isn't it enough if we did this once at the top level ecs
cgroup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
README.md
Outdated
@@ -176,6 +176,7 @@ additional details on each available environment variable. | |||
| `ECS_ENABLE_CONTAINER_METADATA` | `true` | When `true`, the agent will create a file describing the container's metadata and the file can be located and consumed by using the container enviornment variable `$ECS_CONTAINER_METADATA_FILE` | `false` | `false` | | |||
| `ECS_HOST_DATA_DIR` | `/var/lib/ecs` | The source directory on the host from which ECS_DATADIR is mounted. We use this to determine the source mount path for container metadata files in the case the ECS Agent is running as a container. We do not use this value in Windows because the ECS Agent is not running as container in Windows. | `/var/lib/ecs` | `Not used` | | |||
| `ECS_ENABLE_TASK_CPU_MEM_LIMIT` | `true` | Whether to enable task-level cpu and memory limits | `true` | `false` | | |||
| `ECS_CGROUP_MEMORY_SUBSYSTEM_PATH` | `/sys/fs/cgroup/memory/` | The root path of the cgroup memory subsystem | `/sys/fs/cgroup/memory/` | Not applicable | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer ECS_CGROUP_PATH
and not include the memory subsystem at this level.
It makes it more open for reuse, and I think its important given that it is a configuration variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sounds reasonable.
agent/app/agent.go
Outdated
@@ -145,6 +145,9 @@ func newAgent( | |||
metadataManager = containermetadata.NewManager(dockerClient, cfg) | |||
} | |||
|
|||
resource := resources.New() | |||
resource.ApplyConfigDependencies(cfg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That logic is within resources, I believe.
The idea was that 'resources' could hold on to more things than just cgroups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code lgtm. I have stylistic and lint comments.
agent/config/types.go
Outdated
@@ -194,4 +194,6 @@ type Config struct { | |||
// OverrideAWSLogsExecutionRole is config option used to enable awslogs | |||
// driver authentication over the task's execution role | |||
OverrideAWSLogsExecutionRole bool | |||
|
|||
CgroupPath string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a comment
agent/resources/interface.go
Outdated
@@ -26,4 +27,6 @@ type Resource interface { | |||
Setup(task *api.Task) error | |||
// Cleanup removes the resource | |||
Cleanup(task *api.Task) error | |||
|
|||
ApplyConfigDependencies(cfg *config.Config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a comment
@@ -159,7 +160,7 @@ func TestFullUpdateFlow(t *testing.T) { | |||
|
|||
require.Equal(t, "update-tar-data", writtenFile.String(), "incorrect data written") | |||
|
|||
u.performUpdateHandler(statemanager.NewNoopStateManager(), engine.NewTaskEngine(cfg, nil, nil, nil, nil, nil, nil))(&ecsacs.PerformUpdateMessage{ | |||
u.performUpdateHandler(statemanager.NewNoopStateManager(), engine.NewTaskEngine(cfg, nil, nil, nil, nil, nil, nil, nil))(&ecsacs.PerformUpdateMessage{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also split this and other long lines as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor changes. FFTI the formatting comment.
@@ -105,7 +105,8 @@ func TestPerformUpdateWithUpdatesDisabled(t *testing.T) { | |||
Reason: ptr("Updates are disabled").(*string), | |||
}}) | |||
|
|||
u.performUpdateHandler(statemanager.NewNoopStateManager(), engine.NewTaskEngine(cfg, nil, nil, nil, nil, nil, nil, nil))(&ecsacs.PerformUpdateMessage{ | |||
u.performUpdateHandler(statemanager.NewNoopStateManager(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are trying to do a little too much in one line, even with line breaks. Maybe assign some variables above?
agent/config/config_unix.go
Outdated
@@ -30,7 +30,7 @@ const ( | |||
|
|||
// Default cgroup memory system root path, this is the default used if the | |||
// path has not been configured through ECS_CGROUP_MEMORY_SUBSYSTEM_PATH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change the comment
9523667
to
faa63ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code LGTM, some minor comments. Also please make sure the header of the modified file is updated to 2018.
func (c *cgroupWrapper) ApplyConfigDependencies(cfg *config.Config) { | ||
c.cgroupPath = cfg.CgroupPath | ||
} | ||
|
||
// cgroupInit is used to create the root '/ecs/ cgroup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update this comment?
@@ -98,6 +98,17 @@ func TestOOMContainer(t *testing.T) { | |||
assert.NoError(t, err) | |||
} | |||
|
|||
// TestOOMTask verifies that a task with a memory limit returns an error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the agent.RequireVersion(">=1.16.0")
for this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
agent/config/types.go
Outdated
@@ -194,4 +194,7 @@ type Config struct { | |||
// OverrideAWSLogsExecutionRole is config option used to enable awslogs | |||
// driver authentication over the task's execution role | |||
OverrideAWSLogsExecutionRole bool | |||
|
|||
// The expected Cgroup root path in the agent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the format CgroupPath is xxx
.
@@ -39,13 +42,18 @@ func TestInitHappyPath(t *testing.T) { | |||
defer ctrl.Finish() | |||
|
|||
mockControl := mock_cgroup.NewMockControl(ctrl) | |||
mockIO := mock_ioutilwrapper.NewMockIOUtil(ctrl) | |||
cgroupPath := fmt.Sprintf("/sys/fs/cgroup/memory/ecs/memory.use_hierarchy") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the fmt.Sprintf
here.
44bdd17
to
4e7b2ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, i have some minor comments.
agent/config/types.go
Outdated
@@ -194,4 +194,7 @@ type Config struct { | |||
// OverrideAWSLogsExecutionRole is config option used to enable awslogs | |||
// driver authentication over the task's execution role | |||
OverrideAWSLogsExecutionRole bool | |||
|
|||
// The expected Cgroup path in the agent is /sys/fs/cgroup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lint: CgroupPath is ...
agent/resources/resources_linux.go
Outdated
// echo 1 > memory.use_hierarchy | ||
cgroupRoot := c.cgroupPath | ||
|
||
err = c.ioutil.WriteFile(filepath.Join(cgroupRoot, memorySubsystem, config.DefaultTaskCgroupPrefix, memoryUseHierarchy), []byte(strconv.Itoa(1)), os.FileMode(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please split this into multiple lines. Also, please add documentation for all those magic numbers:
[]byte(strconv.Itoa(1)
os.FileMode(0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops missed this one for splits, and will doc numbers.
4e7b2ad
to
cc2a352
Compare
dfe8ba9
to
5ed88ec
Compare
agent/app/agent_test.go
Outdated
@@ -170,6 +170,8 @@ func TestDoStartTaskLimitsFail(t *testing.T) { | |||
saveableOptionFactory.EXPECT().AddSaveable(gomock.Any(), gomock.Any()).AnyTimes(), | |||
stateManagerFactory.EXPECT().NewStateManager(gomock.Any(), gomock.Any(), gomock.Any(), | |||
gomock.Any(), gomock.Any()).Return(statemanager.NewNoopStateManager(), nil), | |||
|
|||
resource.EXPECT().ApplyConfigDependencies(gomock.Any()).AnyTimes(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make this more strict? MinTimes()
if the exact number of times is hard to predict?
agent/engine/default.go
Outdated
return NewDockerTaskEngine(cfg, client, credentialsManager, containerChangeEventStream, imageManager, state, metadataManager) | ||
metadataManager containermetadata.Manager, | ||
resource resources.Resource) TaskEngine { | ||
return NewDockerTaskEngine(cfg, client, credentialsManager, containerChangeEventStream, imageManager, state, metadataManager, resource) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can probably split this into multiple lines as well
agent/resources/resources_linux.go
Outdated
@@ -93,6 +113,12 @@ func (c *cgroupWrapper) setupCgroup(task *api.Task) error { | |||
return errors.Wrapf(err, "resource: setup cgroup: unable to create cgroup at %s for task: %s", cgroupRoot, task.Arn) | |||
} | |||
|
|||
// echo 1 > memory.use_hierarchy | |||
err = c.ioutil.WriteFile(filepath.Join(c.cgroupPath, memorySubsystem, cgroupRoot, memoryUseHierarchy), []byte(strconv.Itoa(1)), os.FileMode(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this too, please split this into multiple lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do i keep missing this one. FIXED 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just have that testing comment. Otherwise, lgtm
f653300
to
4d84c8f
Compare
agent/resources/resources_linux.go
Outdated
@@ -93,6 +113,13 @@ func (c *cgroupWrapper) setupCgroup(task *api.Task) error { | |||
return errors.Wrapf(err, "resource: setup cgroup: unable to create cgroup at %s for task: %s", cgroupRoot, task.Arn) | |||
} | |||
|
|||
// echo 1 > memory.use_hierarchy | |||
memoryHierarchyPath := filepath.Join(c.cgroupPath, memorySubsystem, cgroupRoot, memoryUseHierarchy) | |||
err = c.ioutil.WriteFile(memoryHierarchyPath, []byte(strconv.Itoa(1)), os.FileMode(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are still magic numbers. you should be able to const both of these with descriptive names. sorry for missing this in the last round.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah it's fine. i made this change earlier and i think it got lost while trying to revert back to per task memory.use_hierarchy
4d84c8f
to
caa861b
Compare
agent/resources/resources_linux.go
Outdated
} | ||
|
||
// cgroupInit is used to create the root '/ecs/ cgroup and enable | ||
// memory.use_hierarchy at the '/ecs/' level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you edit this comment to reflect the new changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch.
agent/resources/resources_linux.go
Outdated
@@ -93,6 +114,13 @@ func (c *cgroupWrapper) setupCgroup(task *api.Task) error { | |||
return errors.Wrapf(err, "resource: setup cgroup: unable to create cgroup at %s for task: %s", cgroupRoot, task.Arn) | |||
} | |||
|
|||
// echo 1 > memory.use_hierarchy | |||
memoryHierarchyPath := filepath.Join(c.cgroupPath, memorySubsystem, cgroupRoot, memoryUseHierarchy) | |||
err = c.ioutil.WriteFile(memoryHierarchyPath, []byte(strconv.Itoa(setMemoryHierarchy)), os.FileMode(rootReadOnly)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for not being more clear about this. I think you can do something like to make it more readable:
const (
rootReadOnlyPermissions = os.FileMode(400)
)
var (
enableMemoryHierarchy = []byte(strconv.Itoa(1))
)
7f1c65d
to
cd79407
Compare
since we "Emit a couple of events for the task before cleanup finishes." We should also add corresponding client.EXPECT().StopContainer()
cd79407
to
0c883fa
Compare
Summary
Includes the following changes:
memory.use_hierarchy
scoped to the task's cgroup root/sys/fs/cgroup/memory/
. this addresses [WIP] Fix an error case with task limits #1147 (review)DockerTaskEngine
to receiveagent.resource
instead of creating it's own instance.We enforce the
memory.use_hierarchy
flag to the task level and not the/ecs
root because the kernel does not allow modifications to anything under the/ecs/
tree if child cgroups have already been created. This means that we will not be able to toggle thememory.use_hierarchy
flag for existing ECS instances. Reports of this issue are also tracked here lxc/lxc#236 and linuxkit/linuxkit#373.Implementation details
ApplyConfigDependencies(cfg *config.Config)
totype Resource interface
memorySubsystemPath string
totype cgroupWrapper struct
NewTaskEngine
andNewDockerTaskEngine
Testing
make release
)go build -out amazon-ecs-agent.exe ./agent
)make test
) passgo test -timeout=25s ./agent/...
) passmake run-integ-tests
) pass.\scripts\run-integ-tests.ps1
) passmake run-functional-tests
) pass.\scripts\run-functional-tests.ps1
) passNew tests cover the changes:
Description for the changelog
Licensing
This contribution is under the terms of the Apache 2.0 License: yes