Add file watcher for Appnet agent image update #3435
Have we verified (manually) that this works? If so could you include that in the description?
Update - thanks for adding the manual test scenarios!
	seelog.Errorf("error adding %s to fsnotify watcher, error: %v", filepath, err)
}
err = watcher.Add(filepath)
if err != nil {
	seelog.Errorf("failed to initialize fsnotify NewWatcher, error: %v", err)
Just noticed that the two error messages should probably be swapped :)
Thanks for catching that! just updated:)
"time"

"github.com/cihub/seelog"
Do you want to use "github.com/aws/amazon-ecs-agent/agent/logger" instead of seelog?
agent/engine/docker_task_engine.go
@@ -109,6 +109,8 @@ const (
	stopContainerBackoffJitter = 0.2
	stopContainerBackoffMultiplier = 1.3
	stopContainerMaxRetryCount = 5

	serviceConnectAppnetAgenTarballDir = "/var/lib/ecs/deps/serviceconnect/"
It just occurred to me that /var/lib/ecs/deps/ is the host directory, and is mounted into the ECS Agent container as /managed/agents/ (see the bind mount configuration here).
I thought the watcher should be created on the container path (because the exact host path technically does not exist in the Agent container), but are we observing the expected behavior even when the watcher is created with the host path?
You are right, watching the container path is appropriate here. I made the changes in the latest revision accordingly.
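To make the host-versus-container distinction concrete, here is a small sketch of translating a host path into the path a process inside the container would see. The helper toContainerPath is hypothetical and not part of the Agent; the mount mapping (/var/lib/ecs/deps/ on the host to /managed/agents/ in the container) is taken from the review comment above and is treated as an assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// toContainerPath maps hostPath to the path seen inside a container, given a
// bind mount of hostRoot onto containerRoot. It reports false when hostPath
// does not lie under hostRoot, in which case the file is simply not visible
// through this mount.
func toContainerPath(hostPath, hostRoot, containerRoot string) (string, bool) {
	rel, err := filepath.Rel(hostRoot, hostPath)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", false
	}
	return filepath.Join(containerRoot, rel), true
}

func main() {
	// Mount mapping assumed from the review discussion.
	p, ok := toContainerPath("/var/lib/ecs/deps/serviceconnect/", "/var/lib/ecs/deps/", "/managed/agents/")
	fmt.Println(p, ok)
}
```

A watcher created inside the container would then be registered on the translated path, since the host path is only meaningful outside the container.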
	return
}
const writeOrCreateMask = fsnotify.Write | fsnotify.Create
if event.Op&writeOrCreateMask != 0 {
Should there be a space between event.Op and writeOrCreateMask? Also, can you explain what this means?
Added a space, but I think gofmt removes it.
Also added a comment to clarify the condition we check :)
Thanks Mythri :)
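For readers with the same question, the condition is a bitmask test: the operation value is a set of flag bits, and ANDing it with the mask is non-zero when at least one of the Write or Create bits is set. The sketch below uses a local Op type and constants as stand-ins that mirror how fsnotify declares its operations, so it runs without the vendored package; the helper isWriteOrCreate is hypothetical. gofmt does print this expression as op&writeOrCreateMask != 0, dropping the spaces around the higher-precedence & operator.

```go
package main

import "fmt"

// Op is a local stand-in for a bit set of file operations.
// Each constant occupies its own bit, so several can be combined in one value.
type Op uint32

const (
	Create Op = 1 << iota
	Write
	Remove
	Rename
	Chmod
)

// writeOrCreateMask has both the Write and the Create bits set.
const writeOrCreateMask = Write | Create

// isWriteOrCreate reports whether op includes a Write bit, a Create bit, or both.
// The AND keeps only the bits shared with the mask; a non-zero result means
// at least one of them was present.
func isWriteOrCreate(op Op) bool {
	return op&writeOrCreateMask != 0
}

func main() {
	fmt.Println(isWriteOrCreate(Write))          // Write bit set
	fmt.Println(isWriteOrCreate(Create | Chmod)) // Create bit set alongside Chmod
	fmt.Println(isWriteOrCreate(Remove))         // neither bit set
}
```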
func (engine *DockerTaskEngine) watchAppNetImage(ctx context.Context) {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		logger.Error(fmt.Sprintf("failed to initialize fsnotify NewWatcher, error: %v", err))
Not blocking: we can use logger fields :)
logger.Error("Failed to initialize fsnotify NewWatcher", logger.Fields{
field.Error: err,
})
oops...will fix it in the next PR!! Thank you:)
		engine.stopContainer(engine.serviceconnectRelay, container)
	}
}
engine.serviceconnectRelay.SetDesiredStatus(apitaskstatus.TaskStopped)
Q. Will the ECS Agent log show somewhere that the task was stopped/started because of an AppNet Agent image reload? Or will the control plane receive a message from ECS Agent saying that the task state changed because the image is reloading?
Since this is an internal task, we don't publish its status to the backend; we just log the restart in ECS Agent.
if loadErr = loader.LoadFromFile(ctx, agentContainerTarballPath, dockerClient); loadErr != nil {
	logger.Warn(fmt.Sprintf("Unable to load appnet agent container tarball: %s", agentContainerTarballPath),
	logger.Warn(fmt.Sprintf("Unable to load Appnet agent container tarball: %s", agentContainerTarballPath),
Thank you!
Summary
This PR adds a file watcher on the path (/var/lib/ecs/deps/serviceconnect) from which ECS Agent loads the Appnet agent image. With these changes, if the current Appnet agent image is updated or a new Appnet agent image version becomes available, the file watcher triggers a reload of the image and a restart of the internal instance relay with the updated image.
It also updates go.mod and the vendor files to include the fsnotify package.
Implementation details
Testing
make test
Also, for manual testing purposes, I changed Agent to include v2 as a supported Appnet version and tested the following scenarios:
ecs-service-connect-agent:interface-v1 Appnet image loaded, with no Service Connect tasks running on the instance. Placed the Appnet v2 tarball (ecs-service-connect-agent.interface-v2.tar) in the Service Connect deps directory. ECS Agent reloaded the image ecs-service-connect-agent:v2.
ecs-service-connect-agent:v2 [...] ecs-service-connect-agent:v2 but the SC task's Appnet agent is still running with the ecs-service-connect-agent:v1 image. Stopped this SC task and started a new one with the same task definition. The new SC task started successfully and its Appnet agent is still running with the ecs-service-connect-agent:v2 image.
New tests cover the changes: yes
Description for the changelog
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.