Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable hierarchy memory limit, config cgroup root #1195

Merged
merged 11 commits into from
Jan 31, 2018

Conversation

adnxn
Copy link
Contributor

@adnxn adnxn commented Jan 19, 2018

Summary

Includes the following changes:

  1. enable memory.use_hierarchy scoped to the task's cgroup root
  2. make the cgroup root path configurable through ecs.config and resolved at runtime, let default be /sys/fs/cgroup/memory/. this addresses [WIP] Fix an error case with task limits #1147 (review)
  3. refactored DockerTaskEngine to receive agent.resource instead of creating it's own instance.

We enforce the memory.use_hierarchy flag to the task level and not the /ecs root because the kernel does not allow modifications to anything under the /ecs/ tree if child cgroups have already been created. This means that we will not be able to toggle the memory.use_hierarchy flag for existing ECS instances. Reports of this issue are also tracked here lxc/lxc#236 and linuxkit/linuxkit#373.

Implementation details

  1. echo 1 > memory.use_hierarchy scoped to task cgroup
  2. added ApplyConfigDependencies(cfg *config.Config) to type Resource interface
  3. added memorySubsystemPath string to type cgroupWrapper struct
  4. refactored function signatures for NewTaskEngine and NewDockerTaskEngine

Testing

  • Builds on Linux (make release)
  • Builds on Windows (go build -out amazon-ecs-agent.exe ./agent)
  • Unit tests on Linux (make test) pass
  • Unit tests on Windows (go test -timeout=25s ./agent/...) pass
  • Integration tests on Linux (make run-integ-tests) pass
  • Integration tests on Windows (.\scripts\run-integ-tests.ps1) pass
  • Functional tests on Linux (make run-functional-tests) pass
  • Functional tests on Windows (.\scripts\run-functional-tests.ps1) pass

New tests cover the changes:

Description for the changelog

Licensing

This contribution is under the terms of the Apache 2.0 License: yes

@adnxn adnxn changed the title [wip] config cgroup root [wip] config cgroup root, enable hierarchy memory limit Jan 19, 2018
@adnxn adnxn force-pushed the config-cgroup-root branch from 200ad66 to 081449f Compare January 19, 2018 18:51
@adnxn adnxn changed the title [wip] config cgroup root, enable hierarchy memory limit [wip] enable hierarchy memory limit, config cgroup root Jan 19, 2018
@adnxn adnxn force-pushed the config-cgroup-root branch from 081449f to d46ad8d Compare January 19, 2018 20:05
@adnxn adnxn force-pushed the config-cgroup-root branch from d46ad8d to 0299a3a Compare January 23, 2018 21:51
@adnxn adnxn force-pushed the config-cgroup-root branch 4 times, most recently from 133102a to 79bf8f4 Compare January 24, 2018 01:01
Copy link

@richardpen richardpen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may also want to update the header of all the touched file to 2018.

@@ -280,6 +280,8 @@ func environmentConfig() (Config, error) {

dockerStopTimeout := getDockerStopTimeout()

cgroupMemorySubsystemPath := os.Getenv("ECS_CGROUP_MEMORY_SUBSYSTEM_PATH")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be updated in the readme file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@@ -27,6 +27,8 @@ const (
defaultCredentialsAuditLogFile = "/log/audit.log"
// Default cgroup prefix for ECS tasks
DefaultTaskCgroupPrefix = "/ecs"

defaultCgroupMemorySubsystemPath = "/sys/fs/cgroup/memory/"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment for this?

// echo 1 > memory.use_hierarchy
memorySubsystem := c.memorySubsystemPath

err = c.ioutil.WriteFile(filepath.Join(memorySubsystem, cgroupRoot, memoryUseHierarchy), []byte(strconv.Itoa(1)), os.FileMode(0))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make the 1 and 0 constant and add some comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you feel strongly about it, I don't think this needs to have its own constants since they're only being used in this specific instance, also I think the comment echo 1 > ... conveys context. Let me know though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we do this for each task? Isn't it enough if we did this once at the top level ecs cgroup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaithal you're right, ill move this up a level and make it part of cgroupInit()

@adnxn
Copy link
Contributor Author

adnxn commented Jan 24, 2018

@richardpen righttt the header dates. will do.

@adnxn adnxn changed the title [wip] enable hierarchy memory limit, config cgroup root enable hierarchy memory limit, config cgroup root Jan 24, 2018
@@ -105,7 +105,7 @@ func TestPerformUpdateWithUpdatesDisabled(t *testing.T) {
Reason: ptr("Updates are disabled").(*string),
}})

u.performUpdateHandler(statemanager.NewNoopStateManager(), engine.NewTaskEngine(cfg, nil, nil, nil, nil, nil, nil))(&ecsacs.PerformUpdateMessage{
u.performUpdateHandler(statemanager.NewNoopStateManager(), engine.NewTaskEngine(cfg, nil, nil, nil, nil, nil, nil, nil))(&ecsacs.PerformUpdateMessage{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: this is an opportunity for you to split these long lines into multiple lines :)

@@ -308,13 +311,13 @@ func (agent *ecsAgent) newTaskEngine(containerChangeEventStream *eventstream.Eve
if !agent.cfg.Checkpoint {
seelog.Info("Checkpointing not enabled; a new container instance will be created each time the agent is run")
return engine.NewTaskEngine(agent.cfg, agent.dockerClient,
credentialsManager, containerChangeEventStream, imageManager, state, agent.metadataManager), "", nil
credentialsManager, containerChangeEventStream, imageManager, state, agent.metadataManager, agent.resource), "", nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: again, this is an opportunity for you to split these long lines into multiple lines :)

README.md Outdated
@@ -176,6 +176,7 @@ additional details on each available environment variable.
| `ECS_ENABLE_CONTAINER_METADATA` | `true` | When `true`, the agent will create a file describing the container's metadata and the file can be located and consumed by using the container enviornment variable `$ECS_CONTAINER_METADATA_FILE` | `false` | `false` |
| `ECS_HOST_DATA_DIR` | `/var/lib/ecs` | The source directory on the host from which ECS_DATADIR is mounted. We use this to determine the source mount path for container metadata files in the case the ECS Agent is running as a container. We do not use this value in Windows because the ECS Agent is not running as container in Windows. | `/var/lib/ecs` | `Not used` |
| `ECS_ENABLE_TASK_CPU_MEM_LIMIT` | `true` | Whether to enable task-level cpu and memory limits | `true` | `false` |
| `ECS_CGROUP_MEMORY_SUBSYSTEM_PATH` | `/sys/fs/cgroup/memory/` | The root path of the cgroup memory subsystem | `/sys/fs/cgroup/memory/` | Not applicable |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the path on the host or in the agent mount? We should make that more clear here as well.

@@ -145,6 +145,9 @@ func newAgent(
metadataManager = containermetadata.NewManager(dockerClient, cfg)
}

resource := resources.New()
resource.ApplyConfigDependencies(cfg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be happening only if cgroup feature is enabled?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That logic is within resources, I believe.

The idea was that 'resources' could hold on to more things than just cgroups.

Copy link
Contributor Author

@adnxn adnxn Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can put it after the agent.cfg.TaskCPUMemLimit.Enabled check

// echo 1 > memory.use_hierarchy
memorySubsystem := c.memorySubsystemPath

err = c.ioutil.WriteFile(filepath.Join(memorySubsystem, cgroupRoot, memoryUseHierarchy), []byte(strconv.Itoa(1)), os.FileMode(0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: can you split this into multiple lines?

// echo 1 > memory.use_hierarchy
memorySubsystem := c.memorySubsystemPath

err = c.ioutil.WriteFile(filepath.Join(memorySubsystem, cgroupRoot, memoryUseHierarchy), []byte(strconv.Itoa(1)), os.FileMode(0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we do this for each task? Isn't it enough if we did this once at the top level ecs cgroup?

Copy link
Contributor

@petderek petderek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

README.md Outdated
@@ -176,6 +176,7 @@ additional details on each available environment variable.
| `ECS_ENABLE_CONTAINER_METADATA` | `true` | When `true`, the agent will create a file describing the container's metadata and the file can be located and consumed by using the container enviornment variable `$ECS_CONTAINER_METADATA_FILE` | `false` | `false` |
| `ECS_HOST_DATA_DIR` | `/var/lib/ecs` | The source directory on the host from which ECS_DATADIR is mounted. We use this to determine the source mount path for container metadata files in the case the ECS Agent is running as a container. We do not use this value in Windows because the ECS Agent is not running as container in Windows. | `/var/lib/ecs` | `Not used` |
| `ECS_ENABLE_TASK_CPU_MEM_LIMIT` | `true` | Whether to enable task-level cpu and memory limits | `true` | `false` |
| `ECS_CGROUP_MEMORY_SUBSYSTEM_PATH` | `/sys/fs/cgroup/memory/` | The root path of the cgroup memory subsystem | `/sys/fs/cgroup/memory/` | Not applicable |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer ECS_CGROUP_PATH and not include the memory subsystem at this level.

It makes it more open for reuse, and I think its important given that it is a configuration variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds reasonable.

@@ -145,6 +145,9 @@ func newAgent(
metadataManager = containermetadata.NewManager(dockerClient, cfg)
}

resource := resources.New()
resource.ApplyConfigDependencies(cfg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That logic is within resources, I believe.

The idea was that 'resources' could hold on to more things than just cgroups.

Copy link
Contributor

@aaithal aaithal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code lgtm. I have stylistic and lint comments.

@@ -194,4 +194,6 @@ type Config struct {
// OverrideAWSLogsExecutionRole is config option used to enable awslogs
// driver authentication over the task's execution role
OverrideAWSLogsExecutionRole bool

CgroupPath string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a comment

@@ -26,4 +27,6 @@ type Resource interface {
Setup(task *api.Task) error
// Cleanup removes the resource
Cleanup(task *api.Task) error

ApplyConfigDependencies(cfg *config.Config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a comment

@@ -159,7 +160,7 @@ func TestFullUpdateFlow(t *testing.T) {

require.Equal(t, "update-tar-data", writtenFile.String(), "incorrect data written")

u.performUpdateHandler(statemanager.NewNoopStateManager(), engine.NewTaskEngine(cfg, nil, nil, nil, nil, nil, nil))(&ecsacs.PerformUpdateMessage{
u.performUpdateHandler(statemanager.NewNoopStateManager(), engine.NewTaskEngine(cfg, nil, nil, nil, nil, nil, nil, nil))(&ecsacs.PerformUpdateMessage{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also split this and other long lines as well?

Copy link
Contributor

@petderek petderek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes. FFTI the formatting comment.

@@ -105,7 +105,8 @@ func TestPerformUpdateWithUpdatesDisabled(t *testing.T) {
Reason: ptr("Updates are disabled").(*string),
}})

u.performUpdateHandler(statemanager.NewNoopStateManager(), engine.NewTaskEngine(cfg, nil, nil, nil, nil, nil, nil, nil))(&ecsacs.PerformUpdateMessage{
u.performUpdateHandler(statemanager.NewNoopStateManager(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are trying to do a little too much in one line, even with line breaks. Maybe assign some variables above?

@@ -30,7 +30,7 @@ const (

// Default cgroup memory system root path, this is the default used if the
// path has not been configured through ECS_CGROUP_MEMORY_SUBSYSTEM_PATH
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the comment

@adnxn adnxn force-pushed the config-cgroup-root branch from 9523667 to faa63ba Compare January 25, 2018 00:53
Copy link

@richardpen richardpen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM, some minor comments. Also please make sure the header of the modified file is updated to 2018.

func (c *cgroupWrapper) ApplyConfigDependencies(cfg *config.Config) {
c.cgroupPath = cfg.CgroupPath
}

// cgroupInit is used to create the root '/ecs/ cgroup

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update this comment?

@@ -98,6 +98,17 @@ func TestOOMContainer(t *testing.T) {
assert.NoError(t, err)
}

// TestOOMTask verifies that a task with a memory limit returns an error

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the agent.RequireVersion(">=1.16.0") for this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@@ -194,4 +194,7 @@ type Config struct {
// OverrideAWSLogsExecutionRole is config option used to enable awslogs
// driver authentication over the task's execution role
OverrideAWSLogsExecutionRole bool

// The expected Cgroup root path in the agent

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the format CgroupPath is xxx.

@@ -39,13 +42,18 @@ func TestInitHappyPath(t *testing.T) {
defer ctrl.Finish()

mockControl := mock_cgroup.NewMockControl(ctrl)
mockIO := mock_ioutilwrapper.NewMockIOUtil(ctrl)
cgroupPath := fmt.Sprintf("/sys/fs/cgroup/memory/ecs/memory.use_hierarchy")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need the fmt.Sprintf here.

@adnxn adnxn force-pushed the config-cgroup-root branch 2 times, most recently from 44bdd17 to 4e7b2ad Compare January 25, 2018 21:33
Copy link
Contributor

@aaithal aaithal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, i have some minor comments.

@@ -194,4 +194,7 @@ type Config struct {
// OverrideAWSLogsExecutionRole is config option used to enable awslogs
// driver authentication over the task's execution role
OverrideAWSLogsExecutionRole bool

// The expected Cgroup path in the agent is /sys/fs/cgroup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lint: CgroupPath is ...

// echo 1 > memory.use_hierarchy
cgroupRoot := c.cgroupPath

err = c.ioutil.WriteFile(filepath.Join(cgroupRoot, memorySubsystem, config.DefaultTaskCgroupPrefix, memoryUseHierarchy), []byte(strconv.Itoa(1)), os.FileMode(0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please split this into multiple lines. Also, please add documentation for all those magic numbers:

  • []byte(strconv.Itoa(1)
  • os.FileMode(0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops missed this one for splits, and will doc numbers.

@adnxn adnxn force-pushed the config-cgroup-root branch from 4e7b2ad to cc2a352 Compare January 25, 2018 23:47
@adnxn adnxn force-pushed the config-cgroup-root branch 4 times, most recently from dfe8ba9 to 5ed88ec Compare January 30, 2018 20:51
@@ -170,6 +170,8 @@ func TestDoStartTaskLimitsFail(t *testing.T) {
saveableOptionFactory.EXPECT().AddSaveable(gomock.Any(), gomock.Any()).AnyTimes(),
stateManagerFactory.EXPECT().NewStateManager(gomock.Any(), gomock.Any(), gomock.Any(),
gomock.Any(), gomock.Any()).Return(statemanager.NewNoopStateManager(), nil),

resource.EXPECT().ApplyConfigDependencies(gomock.Any()).AnyTimes(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make this more strict? MinTimes() if the exact number of times is hard to predict?

return NewDockerTaskEngine(cfg, client, credentialsManager, containerChangeEventStream, imageManager, state, metadataManager)
metadataManager containermetadata.Manager,
resource resources.Resource) TaskEngine {
return NewDockerTaskEngine(cfg, client, credentialsManager, containerChangeEventStream, imageManager, state, metadataManager, resource)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can probably split this into multiple lines as well

@@ -93,6 +113,12 @@ func (c *cgroupWrapper) setupCgroup(task *api.Task) error {
return errors.Wrapf(err, "resource: setup cgroup: unable to create cgroup at %s for task: %s", cgroupRoot, task.Arn)
}

// echo 1 > memory.use_hierarchy
err = c.ioutil.WriteFile(filepath.Join(c.cgroupPath, memorySubsystem, cgroupRoot, memoryUseHierarchy), []byte(strconv.Itoa(1)), os.FileMode(0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this too, please split this into multiple lines

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do i keep missing this one. FIXED 😬

Copy link
Contributor

@aaithal aaithal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just have that testing comment. Otherwise, lgtm

@adnxn adnxn force-pushed the config-cgroup-root branch 2 times, most recently from f653300 to 4d84c8f Compare January 30, 2018 22:57
@@ -93,6 +113,13 @@ func (c *cgroupWrapper) setupCgroup(task *api.Task) error {
return errors.Wrapf(err, "resource: setup cgroup: unable to create cgroup at %s for task: %s", cgroupRoot, task.Arn)
}

// echo 1 > memory.use_hierarchy
memoryHierarchyPath := filepath.Join(c.cgroupPath, memorySubsystem, cgroupRoot, memoryUseHierarchy)
err = c.ioutil.WriteFile(memoryHierarchyPath, []byte(strconv.Itoa(1)), os.FileMode(0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are still magic numbers. you should be able to const both of these with descriptive names. sorry for missing this in the last round.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah it's fine. i made this change earlier and i think it got lost while trying to revert back to per task memory.use_hierarchy

@adnxn adnxn force-pushed the config-cgroup-root branch from 4d84c8f to caa861b Compare January 30, 2018 23:11
}

// cgroupInit is used to create the root '/ecs/ cgroup and enable
// memory.use_hierarchy at the '/ecs/' level
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you edit this comment to reflect the new changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch.

@@ -93,6 +114,13 @@ func (c *cgroupWrapper) setupCgroup(task *api.Task) error {
return errors.Wrapf(err, "resource: setup cgroup: unable to create cgroup at %s for task: %s", cgroupRoot, task.Arn)
}

// echo 1 > memory.use_hierarchy
memoryHierarchyPath := filepath.Join(c.cgroupPath, memorySubsystem, cgroupRoot, memoryUseHierarchy)
err = c.ioutil.WriteFile(memoryHierarchyPath, []byte(strconv.Itoa(setMemoryHierarchy)), os.FileMode(rootReadOnly))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for not being more clear about this. I think you can do something like to make it more readable:

const (
	rootReadOnlyPermissions = os.FileMode(400)
)

var (
	enableMemoryHierarchy = []byte(strconv.Itoa(1))
)

@adnxn adnxn force-pushed the config-cgroup-root branch 2 times, most recently from 7f1c65d to cd79407 Compare January 31, 2018 01:06
since we "Emit a couple of events for the task before cleanup finishes."
We should also add corresponding client.EXPECT().StopContainer()
@adnxn adnxn force-pushed the config-cgroup-root branch from cd79407 to 0c883fa Compare January 31, 2018 18:11
@adnxn adnxn merged commit 27666f6 into aws:dev Jan 31, 2018
@adnxn adnxn deleted the config-cgroup-root branch June 4, 2018 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants