Just encountered a gnarly issue. The symptoms were that we had lots of queued jobs and the scale-up alarm was firing, but our stack wasn't scaling up. There was a single instance sitting waiting for its lifecycle hooks to finish.
When I logged into the instance, it looked like a network partition had made SQS inaccessible, so lifecycled crashed and respawned repeatedly until init gave up restarting it:
Sep 12 23:03:58 ip-10-0-1-244 lifecycled: time="2017-09-12T23:03:58Z" level=info msg="Looking up instance id from metadata service"
Sep 12 23:04:14 ip-10-0-1-244 lifecycled: time="2017-09-12T23:04:14Z" level=info msg="Listening for lifecycle notifications"
Sep 13 00:29:04 ip-10-0-1-244 lifecycled: time="2017-09-13T00:29:04Z" level=info msg="Failed to query metadata service" error="Get http://169.254.169.254/latest/meta-data/spot/termination-time: dial tcp 169.254.169.254:80: socket: too many open files"
Sep 13 00:29:05 ip-10-0-1-244 lifecycled: lifecycled: error: RequestError: send request failed
Sep 13 00:29:05 ip-10-0-1-244 lifecycled: caused by: Post https://sqs.us-east-1.amazonaws.com/: dial tcp: lookup sqs.us-east-1.amazonaws.com on 10.0.0.2:53: no such host, try --help
Sep 13 00:29:05 ip-10-0-1-244 init: lifecycled main process (2898) terminated with status 1
Sep 13 00:29:05 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:29:05 ip-10-0-1-244 lifecycled: time="2017-09-13T00:29:05Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:29:05 ip-10-0-1-244 lifecycled: time="2017-09-13T00:29:05Z" level=info msg="Listening for lifecycle notifications"
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: lifecycled: error: InternalError: We encountered an internal error. Please try again.
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: #011status code: 500, request id: 2a4d207a-c108-5ebd-b980-724840885298, try --help
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process (7872) terminated with status 1
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: 50aa8bc3-98cc-5cfd-8bd0-d9f119545ab9"
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process (8189) terminated with status 1
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: 657ba801-09b6-567f-bd10-fa33882ecb1a"
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process (8200) terminated with status 1
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: e8ab9033-a92f-5b5a-a006-c20128b73523"
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process (8211) terminated with status 1
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: 9a1103c2-558e-52fe-81d1-51f0d416306e"
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process (8222) terminated with status 1
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: 6ec31b6e-20cc-5e66-b234-495ca15424f9"
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process (8233) terminated with status 1
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: 14c60a9e-1f48-580c-9652-58ac9ed0c8cf"
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process (8244) terminated with status 1
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:09 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:09Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:09 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:09Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: 0c3a07ac-28be-53e5-af52-559ea927b5e3"
Sep 13 00:44:09 ip-10-0-1-244 init: lifecycled main process (8255) terminated with status 1
Sep 13 00:44:09 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:09 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:09Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:09 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:09Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: 894cb688-d7f1-5b32-8230-e4ba5668ff61"
Sep 13 00:44:09 ip-10-0-1-244 init: lifecycled main process (8266) terminated with status 1
Sep 13 00:44:09 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:09 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:09Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:09 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:09Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: 9eaaf3c3-df2d-524b-8087-a1f15eb46a57"
Sep 13 00:44:09 ip-10-0-1-244 init: lifecycled main process (8277) terminated with status 1
Sep 13 00:44:09 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:09 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:09Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:09 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:09Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: fdd2e998-a978-5ecf-9ccc-bbaa28e1b573"
Sep 13 00:44:09 ip-10-0-1-244 init: lifecycled main process (8288) terminated with status 1
Sep 13 00:44:09 ip-10-0-1-244 init: lifecycled respawning too fast, stopped
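One way to avoid a respawn storm like the one in the log above might be to retry transient failures (the DNS lookup error, the SQS 500, the `QueueDeletedRecently` wait) with a growing delay instead of exiting on the first error. The following is only a minimal sketch of that idea in Python, not lifecycled's actual code; `TransientError` and `retry_with_backoff` are hypothetical names, and deciding which real errors count as transient is left to the caller.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (DNS errors, HTTP 500s, ...)."""


def retry_with_backoff(operation, max_attempts=8, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The delay doubles after each failure, is capped at max_delay, and is
    scaled by random jitter so many instances don't retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and let the supervisor see the failure
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

With these assumed defaults the nominal delays grow as 1, 2, 4, ... seconds and are capped at 60, so a process that hits a short outage waits it out rather than exiting ten times within a second and tripping init's respawn limit.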