Unable to start Service: failed to pull image amazon/amazon-ecs-pause:0.1.0 #1837

Closed
jessecollier opened this issue Feb 11, 2019 · 5 comments

jessecollier commented Feb 11, 2019

Summary

I'm trying to start a daemon service (the Datadog agent) on an EC2-backed ECS cluster. However, the task is marked as STOPPED with no explanation shown in the UI.

Description

Following the setup guide here: https://docs.datadoghq.com/integrations/amazon_ecs/?tab=python#aws-cli

When the new daemon service launches, its tasks fail to start. Here is the relevant log line, which moves the task to STOPPED:

2019-02-11T17:24:36Z [ERROR] Managed task [arn:aws:ecs:us-east-1:<acctid>:task/ecs-cluster-staging/<taskid>]: error while pulling image amazon/amazon-ecs-pause:0.1.0 for container ~internal~ecs~pause~namespace , moving task to STOPPED: Error response from daemon: pull access denied for amazon/amazon-ecs-pause, repository does not exist or may require 'docker login'

Expected Behavior

The service's task starts.

Observed Behavior

The task is marked as STOPPED, and the agent log reports that the amazon/amazon-ecs-pause image could not be pulled.

Environment Details

Docker Info

Containers: 3
 Running: 3
 Paused: 0
 Stopped: 0
Images: 4
Server Version: 18.06.1-ce
Storage Driver: devicemapper
 Pool Name: docker-docker--pool
 Pool Blocksize: 524.3kB
 Base Device Size: 10.74GB
 Backing Filesystem: ext4
 Udev Sync Supported: true
 Data Space Used: 3.906GB
 Data Space Total: 530.4GB
 Data Space Available: 526.5GB
 Metadata Space Used: 18.22MB
 Metadata Space Total: 536.9MB
 Metadata Space Available: 518.7MB
 Thin Pool Minimum Free Space: 53.04GB
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.14.97-74.72.amzn1.x86_64
Operating System: Amazon Linux AMI 2018.03
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.1GiB
Name: ip-10-20-22-252
ID: AHYG:E5FR:KLEK:5P6C:JDHB:PK5A:RNYP:GWNO:37EV:W2F2:TIX2:RRYI
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
{"Cluster":"ecs-cluster-staging","ContainerInstanceArn":"arn:aws:ecs:us-east-1:<acct_id>:container-instance/<instance_id>","Version":"Amazon ECS Agent - v1.25.2 (0821fbc7)"}
[root@ip-10-20-22-252 ~]# docker images|grep ecs
amazon/amazon-ecs-agent                               latest              83062f8bc4d0        10 days ago         38.8MB
amazon/amazon-ecs-pause                               0.1.0               13d22fa69a05        10 days ago         954kB

Task Definition:

{
  "ipcMode": null,
  "executionRoleArn": null,
  "containerDefinitions": [
    {
      "dnsSearchDomains": null,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/us-east-1/ecs-cluster-staging/staging/datadog-agent",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "datadog"
        }
      },
      "entryPoint": null,
      "portMappings": [
        {
          "hostPort": 0,
          "protocol": "tcp",
          "containerPort": 8125
        },
        {
          "hostPort": 0,
          "protocol": "tcp",
          "containerPort": 8126
        }
      ],
      "command": null,
      "linuxParameters": null,
      "cpu": 10,
      "environment": [
        {
          "name": "cluster_name",
          "value": "ecs-cluster-staging"
        },
        {
          "name": "DD_AC_EXCLUDE",
          "value": "datadog-agent"
        },
        {
          "name": "DD_API_KEY",
          "value": "<api_key>"
        },
        {
          "name": "DD_APM_ENABLED",
          "value": "true"
        },
        {
          "name": "DD_APM_NON_LOCAL_TRAFFIC",
          "value": "true"
        },
        {
          "name": "DD_DOCKER_ENV_AS_TAGS",
          "value": "'{\"vpc_name\": \"vpc_name\",\"vpc_id\": \"vpc_id\",\"cluster_name\": \"cluster_name\",\"service_name\": \"service_name\",\"stage\": \"stage\",\"region\": \"region\" }'"
        },
        {
          "name": "DD_DOGSTATSD_NON_LOCAL_TRAFFIC",
          "value": "true"
        },
        {
          "name": "region",
          "value": "us-east-1"
        },
        {
          "name": "service_name",
          "value": "datadog-agent"
        },
        {
          "name": "stage",
          "value": "staging"
        },
        {
          "name": "vpc_id",
          "value": "vpc-10703e6b"
        },
        {
          "name": "vpc_name",
          "value": "staging"
        }
      ],
      "resourceRequirements": null,
      "ulimits": null,
      "dnsServers": null,
      "mountPoints": [
        {
          "readOnly": true,
          "containerPath": "/var/run/docker.sock",
          "sourceVolume": "docker_sock"
        },
        {
          "readOnly": true,
          "containerPath": "/host/sys/fs/cgroup",
          "sourceVolume": "cgroup"
        },
        {
          "readOnly": true,
          "containerPath": "/host/proc",
          "sourceVolume": "proc"
        },
        {
          "readOnly": true,
          "containerPath": "/etc/passwd",
          "sourceVolume": "passwd"
        }
      ],
      "workingDirectory": null,
      "secrets": null,
      "dockerSecurityOptions": null,
      "memory": 256,
      "memoryReservation": null,
      "volumesFrom": [],
      "image": "datadog/agent:latest",
      "disableNetworking": null,
      "interactive": null,
      "healthCheck": null,
      "essential": true,
      "links": null,
      "hostname": null,
      "extraHosts": null,
      "pseudoTerminal": null,
      "user": null,
      "readonlyRootFilesystem": null,
      "dockerLabels": null,
      "systemControls": null,
      "privileged": null,
      "name": "datadog-agent"
    }
  ],
  "placementConstraints": [],
  "memory": null,
  "taskRoleArn": "arn:aws:iam::<acct_id>:role/datadog-agent-us-east-1-staging-role",
  "compatibilities": [
    "EC2"
  ],
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:<acct_id>:task-definition/datadog-agent:5",
  "family": "datadog-agent",
  "requiresAttributes": [
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.pid-ipc-namespace-sharing"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.task-iam-role"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.24"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
    }
  ],
  "pidMode": "task",
  "requiresCompatibilities": [
    "EC2"
  ],
  "networkMode": "bridge",
  "cpu": null,
  "revision": 5,
  "status": "ACTIVE",
  "volumes": [
    {
      "name": "passwd",
      "host": {
        "sourcePath": "/etc/passwd"
      },
      "dockerVolumeConfiguration": null
    },
    {
      "name": "proc",
      "host": {
        "sourcePath": "/proc/"
      },
      "dockerVolumeConfiguration": null
    },
    {
      "name": "docker_sock",
      "host": {
        "sourcePath": "/var/run/docker.sock"
      },
      "dockerVolumeConfiguration": null
    },
    {
      "name": "cgroup",
      "host": {
        "sourcePath": "/cgroup/"
      },
      "dockerVolumeConfiguration": null
    }
  ]
}

Supporting Log Snippets

2019-02-11T17:24:14Z [INFO] Managed task [arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id>]: unable to create task state change event []: create task state change event api: status not recognized by ECS: NONE
2019-02-11T17:24:14Z [INFO] Managed task [arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id>]: waiting for any previous stops to complete. Sequence number: 7
2019-02-11T17:24:14Z [INFO] Managed task [arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id>]: wait over; ready to move towards status: RUNNING
2019-02-11T17:24:14Z [INFO] Task engine [arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id>]: pulling image amazon/amazon-ecs-pause:0.1.0 for container ~internal~ecs~pause~namespace concurrently
2019-02-11T17:24:14Z [INFO] Task engine [arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id>]: recording timestamp for starting image pulltime: 2019-02-11 17:24:14.805784966 +0000 UTC m=+435.432612309
2019-02-11T17:24:14Z [INFO] Creating cgroup /ecs/ecs-cluster-staging/<task_id>
2019-02-11T17:24:36Z [ERROR] Task engine [arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id>]: failed to pull image amazon/amazon-ecs-pause:0.1.0 for container ~internal~ecs~pause~namespace: Error response from daemon: pull access denied for amazon/amazon-ecs-pause, repository does not exist or may require 'docker login'
2019-02-11T17:24:36Z [INFO] Task engine [arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id>]: error transitioning container [~internal~ecs~pause~namespace] to [PULLED]: Error response from daemon: pull access denied for amazon/amazon-ecs-pause, repository does not exist or may require 'docker login'
2019-02-11T17:24:36Z [ERROR] Managed task [arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id>]: error while pulling image amazon/amazon-ecs-pause:0.1.0 for container ~internal~ecs~pause~namespace , moving task to STOPPED: Error response from daemon: pull access denied for amazon/amazon-ecs-pause, repository does not exist or may require 'docker login'
2019-02-11T17:24:36Z [INFO] Task [arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id>]: recording execution stopped time. Essential container [~internal~ecs~pause~namespace] stopped at: 2019-02-11 17:24:36.193828546 +0000 UTC m=+456.820655905
2019-02-11T17:24:36Z [INFO] Managed task [arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id>]: sending container change event [datadog-agent]: arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id> datadog-agent -> STOPPED, Known Sent: NONE
2019-02-11T17:24:36Z [INFO] Managed task [arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id>]: sent container change event [datadog-agent]: arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id> datadog-agent -> STOPPED, Known Sent: NONE
2019-02-11T17:24:36Z [INFO] Managed task [arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id>]: sending task change event [arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id> -> STOPPED, Known Sent: NONE, PullStartedAt: 2019-02-11 17:24:14.805784966 +0000 UTC m=+435.432612309, PullStoppedAt: 2019-02-11 17:24:36.190041121 +0000 UTC m=+456.816868471, ExecutionStoppedAt: 2019-02-11 17:24:36.193828546 +0000 UTC m=+456.820655905]
2019-02-11T17:24:36Z [INFO] TaskHandler: batching container event: arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id> datadog-agent -> STOPPED, Known Sent: NONE
2019-02-11T17:24:36Z [INFO] TaskHandler: Adding event: TaskChange: [arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id> -> STOPPED, Known Sent: NONE, PullStartedAt: 2019-02-11 17:24:14.805784966 +0000 UTC m=+435.432612309, PullStoppedAt: 2019-02-11 17:24:36.190041121 +0000 UTC m=+456.816868471, ExecutionStoppedAt: 2019-02-11 17:24:36.193828546 +0000 UTC m=+456.820655905, arn:aws:ecs:us-east-1:<acct_id>:task/ecs-cluster-staging/<task_id> datadog-agent -> STOPPED, Known Sent: NONE] sent: false
@FernandoMiguel

ECS Pause is a new image that was recently added to allow ECS clusters to pause/hibernate, AFAIK.

I've never seen it fail to download on my infrastructure.
Do you have any custom auth configuration?
Restricted access to the Internet?

@jessecollier
Author

@FernandoMiguel I identified the issue with AWS Support. We had ECS_IMAGE_PULL_BEHAVIOR set to once and were hitting the error in this check in the agent source:

// If the agent pull behavior is always or once, we receive the error because

I resolved the issue by setting ECS_IMAGE_PULL_BEHAVIOR back to the default.

@samuelkarp
Contributor

@jessecollier amazon/amazon-ecs-pause:0.1.0 should probably be excluded from that conditional, since we never intend to pull the image. Reopening to track this, as it sounds like a bug to me.

> ECS Pause is a new image that was recently added to allow ECS clusters to pause/hibernate, AFAIK.

amazon/amazon-ecs-pause:0.1.0 is used as part of the awsvpc networking mode, and for sharing pid and ipc namespaces; it's a mechanism to create a namespace that can be shared by multiple containers and is very similar to Kubernetes' use of "pause" containers for pod namespaces. The image is bundled inside the main ECS agent image and should never need to be pulled; instead, the ECS agent will load it into Docker when starting.

If setting ECS_IMAGE_PULL_BEHAVIOR=once causes failures when using either awsvpc networking mode or shared pid or ipc namespaces, it sounds like the pause image is being incorrectly included in the pull behavior logic.
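To see which tasks can hit this, here is a hypothetical Go sketch of when a task gets an internal pause container, based only on the cases named above (awsvpc networking, and pid or ipc namespace sharing). The namespace container name matches the agent log; the awsvpc name "~internal~ecs~pause" is an assumption, and the real agent may treat combinations such as awsvpc plus pidMode "task" differently.

```go
package main

import "fmt"

// TaskDef holds the subset of task definition fields that matter here.
type TaskDef struct {
	NetworkMode string
	PidMode     string
	IpcMode     string
}

// pauseContainers returns the internal pause containers this simplified
// model expects for a task. It is an illustration, not agent code.
func pauseContainers(td TaskDef) []string {
	var out []string
	if td.NetworkMode == "awsvpc" {
		// Assumed name for the CNI (awsvpc) pause container.
		out = append(out, "~internal~ecs~pause")
	}
	if td.PidMode == "task" || td.IpcMode == "task" {
		// Name taken from the agent log in this issue.
		out = append(out, "~internal~ecs~pause~namespace")
	}
	return out
}

func main() {
	// The datadog-agent task definition in this issue: bridge networking, pidMode "task".
	dd := TaskDef{NetworkMode: "bridge", PidMode: "task"}
	fmt.Println(pauseContainers(dd))
}
```

Under this model the Datadog task needs a namespace pause container solely because of "pidMode": "task", which is why a bridge-mode task still tried to pull amazon/amazon-ecs-pause.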

samuelkarp reopened this Feb 11, 2019
@sharanyad
Contributor

@samuelkarp You're right. The pause image failing here is the one used for pidMode.
The CNI pause image is already excluded from the image pull behavior check here:

case apicontainer.ContainerCNIPause:

We need to exclude the pause image for container namespaces as well.
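A minimal Go sketch of the proposed fix, assuming the pull behavior check switches on container type as the quoted case line suggests. Only ContainerCNIPause appears in this thread; ContainerType, ContainerNormal, ContainerNamespacePause and subjectToPullBehavior are illustrative names defined locally, not the agent's actual declarations.

```go
package main

import "fmt"

// ContainerType is a local stand-in modeled after the agent's container
// types; only ContainerCNIPause is named in the thread.
type ContainerType int

const (
	ContainerNormal ContainerType = iota
	ContainerCNIPause
	ContainerNamespacePause // hypothetical name for the pid/ipc pause container
)

// subjectToPullBehavior reports whether ECS_IMAGE_PULL_BEHAVIOR should
// apply to a container. Both pause types use an image the agent loads
// locally and that is never published to a registry, so both are skipped.
func subjectToPullBehavior(t ContainerType) bool {
	switch t {
	case ContainerCNIPause, ContainerNamespacePause:
		return false
	default:
		return true
	}
}

func main() {
	for _, t := range []ContainerType{ContainerNormal, ContainerCNIPause, ContainerNamespacePause} {
		fmt.Println(t, subjectToPullBehavior(t))
	}
}
```

Listing both pause types in one case keeps the exclusion in a single place, so a future pause variant only needs to be added to that line.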

@sharanyad
Contributor

Closing this issue. Fixed in https://github.com/aws/amazon-ecs-agent/releases/tag/v1.25.3
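To check whether an instance already runs a release with the fix, a hypothetical Go sketch that reads the agent metadata JSON shown in the environment details above (it appears to come from the agent's introspection metadata output) and compares its Version against v1.25.3. It decodes a literal string rather than calling the endpoint, and assumes the "Amazon ECS Agent - vX.Y.Z (sha)" format seen in this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strconv"
)

// metadata mirrors the fields of the agent metadata JSON used here.
type metadata struct {
	Cluster string `json:"Cluster"`
	Version string `json:"Version"`
}

// versionRE extracts the first vMAJOR.MINOR.PATCH from the Version string.
var versionRE = regexp.MustCompile(`v(\d+)\.(\d+)\.(\d+)`)

// hasPauseFix reports whether an agent Version string is at least v1.25.3,
// the release this thread says contains the fix.
func hasPauseFix(version string) (bool, error) {
	m := versionRE.FindStringSubmatch(version)
	if m == nil {
		return false, fmt.Errorf("no version in %q", version)
	}
	var got [3]int
	for i := range got {
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return false, err
		}
		got[i] = n
	}
	want := [3]int{1, 25, 3}
	// Compare major, minor, patch in order; the first difference decides.
	for i := range got {
		if got[i] != want[i] {
			return got[i] > want[i], nil
		}
	}
	return true, nil
}

func main() {
	raw := `{"Cluster":"ecs-cluster-staging","Version":"Amazon ECS Agent - v1.25.2 (0821fbc7)"}`
	var md metadata
	if err := json.Unmarshal([]byte(raw), &md); err != nil {
		panic(err)
	}
	ok, err := hasPauseFix(md.Version)
	fmt.Println(md.Version, "has fix:", ok, err)
}
```

For the instance in this report (v1.25.2) the check returns false, consistent with the failure seen under ECS_IMAGE_PULL_BEHAVIOR=once.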
