Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add file watcher for Appnet agent image update #3435

Merged
merged 1 commit into from
Oct 19, 2022

Conversation

mythri-garaga
Copy link
Contributor

@mythri-garaga mythri-garaga commented Oct 17, 2022

Summary

This PR adds a file watcher for filepath (/var/lib/ecs/deps/serviceconnenct) where ECS Agent loads Appnet agent image from. With these changes, if there any update current Appnet agent image or a new Appnet agent image version becomes available, the file watcher invokes to reload this image and restart the internal instance relay with the updated image.

Update go.mod and vendor files to include fsnotify package.

Implementation details

Testing

make test

Also for manual testing purpose, made changes to Agent to include v2 as supported Appnet version and tested the following scenarios

  1. ECS Agent is running and has ecs-service-connect-agent:interface-v1 appnet image loaded. There are no Service Connect task running on the instance. Placed the Appnet v2 - ecs-service-connect-agent.interface-v2.tar tarball in the service connect deps dir. The ECS Agent agent reloaded the image ecs-service-connect-agent:v2.
  2. Tested scenario 1 with Service Connect task running in bridge mode. Observed that ECS Agent reloaded the new Appnet image and restart the instance relay task with ecs-service-connect-agent:v2
  3. With Scenario 2, that is ECS Agent has reloaded the new Appnet image and restarted the instance relay task with ecs-service-connect-agent:v2 but SC task's Appnet agent is still running with ecs-service-connect-agent:v1 image.
    Stopped this SC task and started a new one with same task def. The new SC task started successfully and it's Appnet agent is still running with ecs-service-connect-agent:v2 image.

New tests cover the changes: yes

Description for the changelog

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@mythri-garaga mythri-garaga force-pushed the restart-relay branch 6 times, most recently from 4e49369 to 768ef2d Compare October 17, 2022 22:09
@mythri-garaga mythri-garaga marked this pull request as ready for review October 17, 2022 22:09
@mythri-garaga mythri-garaga requested a review from a team as a code owner October 17, 2022 22:09
Copy link
Contributor

@yinyic yinyic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we verified (manually) that this works? If so could you include that in the description?

Update - thanks for adding the manual test scenarios!

agent/engine/docker_task_engine_linux.go Outdated Show resolved Hide resolved
agent/engine/docker_task_engine_linux.go Outdated Show resolved Hide resolved
agent/engine/docker_task_engine_linux.go Outdated Show resolved Hide resolved
agent/engine/docker_task_engine_linux.go Outdated Show resolved Hide resolved
@mythri-garaga mythri-garaga force-pushed the restart-relay branch 5 times, most recently from 3624222 to 2287f64 Compare October 18, 2022 04:54
@mythri-garaga mythri-garaga changed the base branch from feature/service-connect to dev October 18, 2022 17:29
agent/engine/docker_task_engine_linux.go Outdated Show resolved Hide resolved
agent/engine/docker_task_engine_linux.go Outdated Show resolved Hide resolved
agent/engine/docker_task_engine_linux.go Show resolved Hide resolved
agent/engine/docker_task_engine_linux_test.go Outdated Show resolved Hide resolved
@mythri-garaga mythri-garaga changed the base branch from dev to feature/service-connect October 18, 2022 17:29
@mythri-garaga mythri-garaga force-pushed the restart-relay branch 2 times, most recently from bfcfd50 to adfbc11 Compare October 18, 2022 19:21
@mythri-garaga mythri-garaga changed the base branch from feature/service-connect to dev October 18, 2022 20:48
@mythri-garaga mythri-garaga changed the base branch from dev to feature/service-connect October 18, 2022 20:51
Comment on lines 51 to 55
seelog.Errorf("error adding %s to fsnotify watcher, error: %v", filepath, err)
}
err = watcher.Add(filepath)
if err != nil {
seelog.Errorf("failed to initialize fsnotify NewWatcher, error: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed that the two error messages should probably be swapped :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that! just updated:)

"time"

"github.com/cihub/seelog"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to use "github.com/aws/amazon-ecs-agent/agent/logger" instead of seelog?

@@ -109,6 +109,8 @@ const (
stopContainerBackoffJitter = 0.2
stopContainerBackoffMultiplier = 1.3
stopContainerMaxRetryCount = 5

serviceConnectAppnetAgenTarballDir = "/var/lib/ecs/deps/serviceconnect/"
Copy link
Contributor

@yinyic yinyic Oct 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just occurred to me that the /var/lib/ecs/deps/ is the host directory, and is mounted into ECS Agent container as /managed/agents/ (see the bind mount configuration here)

I thought the watcher should be created on the container path (because the exact host path does not exist in Agent container technically), but are we observing expected behaviors even when the watcher is created with the host path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, watching container path is appropriate here, made changes in the latest rev accordingly.

return
}
const writeOrCreateMask = fsnotify.Write | fsnotify.Create
if event.Op&writeOrCreateMask != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should there be a space between event.Op and writeOrCreateMask ? Also, can you explain what this means?

Copy link
Contributor Author

@mythri-garaga mythri-garaga Oct 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added space but I think gofmt removes it.

Also add a comment to clarify the condition we check:)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mythri :)

func (engine *DockerTaskEngine) watchAppNetImage(ctx context.Context) {
watcher, err := fsnotify.NewWatcher()
if err != nil {
logger.Error(fmt.Sprintf("failed to initialize fsnotify NewWatcher, error: %v", err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not blocking. we can use logger fields : )

		logger.Error("Failed to initialize fsnotify NewWatcher", logger.Fields{
			field.Error:     err,
		})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops...will fix it in the next PR!! Thank you:)

engine.stopContainer(engine.serviceconnectRelay, container)
}
}
engine.serviceconnectRelay.SetDesiredStatus(apitaskstatus.TaskStopped)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q. Would we show the task is stopped/started due to AppNet Agent image reload somewhere in ECS Agent log? or Control plane will receive a message from ECS Agent, saying that task state is changed due to image is reloading?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this an internal task, we don't publish this task status to backend but just log the restart on ECS Agent.

if loadErr = loader.LoadFromFile(ctx, agentContainerTarballPath, dockerClient); loadErr != nil {
logger.Warn(fmt.Sprintf("Unable to load appnet agent container tarball: %s", agentContainerTarballPath),
logger.Warn(fmt.Sprintf("Unable to load Appnet agent container tarball: %s", agentContainerTarballPath),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@mythri-garaga mythri-garaga merged commit c0c801b into aws:dev Oct 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants