Add AWS IMDSv2 mode support for ignition and afterburn #220

Closed
mdaniel opened this issue Oct 27, 2020 · 15 comments

@mdaniel

mdaniel commented Oct 27, 2020

Description

Booting Flatcar Stable (075585003325/Flatcar-stable-2605.6.0-hvm), or even Alpha (075585003325/Flatcar-alpha-2661.0.0-hvm), in AWS on an instance with IMDSv2 turned on causes coreos-metadata (or its ignition friend) to exit. As a result #cloud-config does not execute, potentially locking the user out of the instance.

Impact

Instances enter "Emergency Mode", which means no SSH access and no SSM; the instance is effectively orphaned.

Environment and steps to reproduce

  1. Set-up: please see below

  2. Task: booting

  3. Action(s):

    1. boot instance
    2. observe one cannot connect to it
  4. Error:

     [  147.653951] ignition[5873]: Ignition v0.34.0-22-g032f620
     ...snip...
     [  147.844754] ignition[5873]: no config at "/usr/lib/ignition/user.ign"
     [  147.851439] systemd[1]: Reached target Host and Network Name Lookups.
     [  147.872665] ignition[5873]: GET http://169.254.169.254/2009-04-04/user-data: attempt #1
     [  147.885479] systemd[1]: Starting Ignition (disks)...
     [  147.932668] ignition[5873]: GET result: Unauthorized
     [  147.937613] systemd[1]: ignition-disks.service: Main process exited, code=exited, status=1/FAILURE
     [  147.949266] ignition[5873]: failed to fetch config: failed to fetch resource
     [  147.958620] systemd[1]: ignition-disks.service: Failed with result 'exit-code'.
     [  148.008043] ignition[5873]: failed to acquire config: failed to fetch resource
     [  148.017145] systemd[1]: Failed to start Ignition (disks).
     [  148.022175] ignition[5873]: Ignition failed: failed to fetch resource
    

Expected behavior

The instance should honor the EC2 KeyName parameter, and/or the ssh_authorized_keys: [], and/or run the units: specified in #cloud-config

Additional information

The docs point to container-linux-config-transpiler, whose repo is marked "archived," but the supported data by provider page cites the coreos-metadata repo (which is the process in journalctl on stable that fails, not ignition). That repo redirects to coreos/afterburn which, helpfully, does seem to support IMDSv2.

As an aside, I actually can't tell if this is my fault for attempting to use practically a standard or if it's because I haven't run my UserData through some kind of yaml-to-json compiler first. It's similarly confusing to have docs that say do not use #cloud-config but the reference CloudFormation stack still says #cloud-config

Reproduction Steps

I took the suggested stack, used AWS's "hello world VPC" as a substack just to boot up an instance with IMDSv2 to demonstrate what's going on; the interesting bits are:

aws cloudformation create-stack --template-body "$(cat flatcar-alpha-hvm.yaml)" --parameters ParameterKey=KeyPair,ParameterValue=$USER --stack-name flatcar-alpha-hvm
aws cloudformation wait stack-create-complete --stack-name flatcar-alpha-hvm
read -r -p 'instance-id? ' I_ID
while true; do
    aws ec2 get-console-output --instance-id "$I_ID" | tee flatcar-alpha-hvm.console.log
    if grep -q Output flatcar-alpha-hvm.console.log; then break; fi
    echo '30s...'
    sleep 30
done

where flatcar-alpha-hvm.yaml is:

AWSTemplateFormatVersion: '2010-09-09'
Description: 'Flatcar Linux on EC2: https://docs.flatcar-linux.org/os/booting-on-ec2/'
Parameters:
  InstanceType:
    Description: EC2 HVM instance type (m3.medium, etc).
    Type: String
    Default: m3.medium
    ConstraintDescription: Must be a valid EC2 HVM instance type.
  KeyPair:
    Description: The name of an EC2 Key Pair to allow SSH access to the instance.
      but it doesn't matter, since this stack is to demonstrate the console error
    # Type: AWS::EC2::KeyPair::KeyName
    Type: String

Resources:
  FlatcarServerLT:
    # one must use an LT to set `MetadataOptions:`
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        MetadataOptions:
          # THIS IS THE MAGIC SAUCE
          HttpTokens: required
          #/THIS IS THE MAGIC SAUCE

# --- everything below this line isn't important ---
        ImageId: !FindInMap
          - RegionMap
          - Ref: 'AWS::Region'
          - AMI
        InstanceType: !Ref InstanceType
        KeyName: !Ref KeyPair
        UserData:
          # this can be anything
          Fn::Base64: |
            #cloud-config
            write_files:
            - path: /root/boot0.sh
              permissions: "0755"
              content: |
                #! /usr/bin/env bash
                echo hello world

  VpcStack:
    DependsOn:
    # put this in flight early to avoid a CFN race condition
    - FlatcarServerLT
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/cloudformation-templates-us-east-1/VPC_With_PublicIPs_And_DNS.template
      Parameters:
        KeyName: !Ref KeyPair
      # Outputs: [ PublicSubnet, VPCId ]

  FlatcarSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Flatcar Linux SecurityGroup
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
      VpcId: !GetAtt [VpcStack, Outputs.VPCId]

  FlatcarServer:
    Type: AWS::EC2::Instance
    Properties:
      LaunchTemplate:
          LaunchTemplateId: !Ref FlatcarServerLT
          Version: !GetAtt [FlatcarServerLT, LatestVersionNumber]
      SecurityGroupIds:
        - Ref: FlatcarSecurityGroup
      SubnetId: !GetAtt [VpcStack, Outputs.PublicSubnet]

Mappings:
  RegionMap:
    eu-central-1:
      AMI: ami-0d748198043c0c255
    ap-northeast-1:
      AMI: ami-0ed3e6682b9c4d495
    ap-northeast-2:
      AMI: ami-066c23653a0c45465
    ca-central-1:
      AMI: ami-0308d875f0b87b531
    ap-south-1:
      AMI: ami-0cd30a2e85af98cca
    sa-east-1:
      AMI: ami-04db15e4f88a7504c
    ap-southeast-2:
      AMI: ami-0d83f60257e42437b
    ap-southeast-1:
      AMI: ami-0174e8bc61f1d1907
    us-east-1:
      AMI: ami-0c7fa97342e18ae64
    us-east-2:
      AMI: ami-0c5acb9322e623e4e
    us-west-2:
      AMI: ami-0544e9163376d21a2
    us-west-1:
      AMI: ami-0d8cea3887521a7e3
    eu-west-1:
      AMI: ami-0e23ec5ac0146a7a9
    eu-west-2:
      AMI: ami-03bb370cfd70f297d
    eu-west-3:
      AMI: ami-049de7754895b3950
    eu-north-1:
      AMI: ami-0c85acc9d84626d05
    ap-east-1:
      AMI: ami-0d03469bdf7d0a3d3
    me-south-1:
      AMI: ami-00301d028f3249639
@t-lo
Member

t-lo commented Oct 27, 2020

Thank you for raising this issue. There seems to be an upstream ignition bug report with accompanying patch which we will investigate.

@t-lo
Member

t-lo commented Oct 27, 2020

Additionally, IMDSv2 requires fetching a session token first and presenting it with every metadata request; see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html.
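
For illustration, the token flow from the AWS guide linked above can be sketched in Python. The endpoint paths and header names are the ones AWS documents; the function names and the 60-second TTL are made up for this sketch, and this is not how Ignition or Afterburn implement it.

```python
import urllib.request

IMDS = "http://169.254.169.254"  # link-local metadata endpoint seen in the logs


def build_token_request(ttl_seconds: int = 60) -> urllib.request.Request:
    """IMDSv2 step 1: a PUT that asks for a session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    """IMDSv2 step 2: a GET that carries the token in a header."""
    return urllib.request.Request(
        f"{IMDS}/{path.lstrip('/')}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch(path: str, timeout: float = 2.0) -> bytes:
    """Run both steps; this only works from inside an EC2 instance."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    request = build_metadata_request(path, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read()
```

On an instance, `fetch("latest/user-data")` would return the raw user data. With `HttpTokens: required`, a GET that skips step 1 is rejected as Unauthorized, which matches the console output in the report.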

@pipo02mix

Hey @t-lo any updates here? We are waiting for this to switch to IMDSv2 🙏

margamanterola pushed a commit to flatcar/ignition that referenced this issue Jan 4, 2021
The current date is so old that it leads to errors in some cases.

See flatcar/Flatcar#220 and
coreos/ignition#989
@margamanterola
Contributor

The metadata version seems to be mostly a red herring. I've filed a PR to update it so that it doesn't distract us from actual problems, but I think it will make zero difference for the issue reported here. The actual issue is the token that needs to be passed, as Thilo mentioned.

This was added upstream here: coreos/ignition#1154 (which probably requires the SDK update from: coreos/ignition#980)

@margamanterola margamanterola changed the title from "coreos-metadata.service or ignition.service does not communicate with AWS Instance Metadata when running in IMDSv2 mode" to "Add AWS IMDSv2 mode support for ignition and afterburn" Jan 5, 2021
@paurosello

Hello, I have tested this with Flatcar 2765.2.6 and am facing the issues described above. Is there a plan to update Ignition to support IMDSv2?

@jepio
Member

jepio commented Jan 28, 2022

@tormath1 could we add a test for IMDSv1/IMDSv2 in kola while we're still missing support for this? This would be covered by migrating to ignition v3 (#387), and there is an interim PR here flatcar/ignition#32.

@jkroepke

jkroepke commented Mar 26, 2022

Hi there.

I'm currently testing

Flatcar Container Linux by Kinvolk alpha 3185.0.0 for Amazon EC2

The alpha version includes Ignition 2, which supports IMDSv2 natively -> flatcar/ignition#32 (comment)

Mar 25 20:44:04 localhost ignition[978]: INFO     : PUT http://169.254.169.254/latest/api/token: attempt #1
Mar 25 20:44:04 localhost ignition[978]: INFO     : PUT result: OK
Mar 25 20:44:04 localhost ignition[978]: DEBUG    : parsed url from cmdline: ""
Mar 25 20:44:04 localhost ignition[978]: INFO     : no config URL provided
Mar 25 20:44:04 localhost ignition[978]: INFO     : reading system config file "/usr/lib/ignition/user.ign"
Mar 25 20:44:04 localhost ignition[978]: INFO     : no config at "/usr/lib/ignition/user.ign"
Mar 25 20:44:04 localhost ignition[978]: INFO     : PUT http://169.254.169.254/latest/api/token: attempt #1
Mar 25 20:44:04 localhost ignition[978]: INFO     : PUT result: OK
Mar 25 20:44:04 localhost ignition[978]: INFO     : GET http://169.254.169.254/2019-10-01/user-data: attempt #1
Mar 25 20:44:04 localhost ignition[978]: INFO     : GET result: Not Found
Mar 25 20:44:04 localhost ignition[978]: DEBUG    : parsing config with SHA512: cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e
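
A side note on the last log line: the SHA512 it prints is the digest of empty input, which suggests Ignition went on to parse an empty config after the user-data request returned Not Found. A minimal Python check (the helper name is made up for this sketch):

```python
import hashlib

# Digest copied from the "parsing config with SHA512" log line.
LOGGED = (
    "cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce"
    "47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e"
)


def is_empty_config_digest(digest: str) -> bool:
    """True if the digest equals SHA-512 of zero bytes."""
    return digest.lower() == hashlib.sha512(b"").hexdigest()
```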

But the coreos-metadata service still has issues if a token is required.

Mar 25 20:44:07 localhost coreos-metadata[1033]: Mar 25 20:44:07.810 INFO Fetching http://169.254.169.254/2009-04-04/meta-data/hostname: Attempt #3
Mar 25 20:44:07 localhost coreos-metadata[1033]: Mar 25 20:44:07.812 INFO Failed to fetch: 401 Unauthorized
Mar 25 20:44:11 localhost coreos-metadata[1033]: Mar 25 20:44:11.812 INFO Fetching http://169.254.169.254/2009-04-04/meta-data/hostname: Attempt #4
Mar 25 20:44:11 localhost coreos-metadata[1033]: Mar 25 20:44:11.814 INFO Failed to fetch: 401 Unauthorized
Mar 25 20:44:16 localhost coreos-metadata[1033]: Mar 25 20:44:16.815 INFO Fetching http://169.254.169.254/2009-04-04/meta-data/hostname: Attempt #5
Mar 25 20:44:16 localhost coreos-metadata[1033]: Mar 25 20:44:16.817 INFO Failed to fetch: 401 Unauthorized
Mar 25 20:44:21 localhost coreos-metadata[1033]: Mar 25 20:44:21.817 INFO Fetching http://169.254.169.254/2009-04-04/meta-data/hostname: Attempt #6
Mar 25 20:44:21 localhost coreos-metadata[1033]: Mar 25 20:44:21.819 INFO Failed to fetch: 401 Unauthorized
Mar 25 20:44:26 localhost coreos-metadata[1033]: Mar 25 20:44:26.819 INFO Fetching http://169.254.169.254/2009-04-04/meta-data/hostname: Attempt #7
Mar 25 20:44:26 localhost coreos-metadata[1033]: Mar 25 20:44:26.821 INFO Failed to fetch: 401 Unauthorized
Mar 25 20:44:31 localhost coreos-metadata[1033]: Mar 25 20:44:31.821 INFO Fetching http://169.254.169.254/2009-04-04/meta-data/hostname: Attempt #8
Mar 25 20:44:31 localhost coreos-metadata[1033]: Mar 25 20:44:31.823 INFO Failed to fetch: 401 Unauthorized
Mar 25 20:44:36 localhost coreos-metadata[1033]: Mar 25 20:44:36.823 INFO Fetching http://169.254.169.254/2009-04-04/meta-data/hostname: Attempt #9
Mar 25 20:44:36 localhost coreos-metadata[1033]: Mar 25 20:44:36.825 INFO Failed to fetch: 401 Unauthorized
Mar 25 20:44:41 localhost coreos-metadata[1033]: Mar 25 20:44:41.826 INFO Fetching http://169.254.169.254/2009-04-04/meta-data/hostname: Attempt #10
Mar 25 20:44:41 localhost coreos-metadata[1033]: Mar 25 20:44:41.828 INFO Failed to fetch: 401 Unauthorized
Mar 25 20:44:41 localhost coreos-metadata[1033]: Error: writing hostname
Mar 25 20:44:41 localhost coreos-metadata[1033]: Caused by: timed out
Mar 25 20:44:41 localhost coreos-metadata[1033]: Caused by: failed to fetch: 401 Unauthorized

This confuses me, since afterburn supports IMDSv2 - coreos/afterburn#305

Luckily, machines no longer stay in emergency mode, but I'm unable to log in via SSH, since coreos-metadata is responsible for grabbing the instance's metadata key and putting it in authorized_keys.

Edit:

Turns out that the Afterburn version is too old.

ip-10-110-0-251 ~ # /usr/bin/coreos-metadata -V
Afterburn 4.0.1-alpha.0

Looking at coreos/afterburn@0ed679e, v4.4.1 is the first version that supports IMDSv2.
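
The same check can be scripted. Below is a minimal Python sketch that parses the `coreos-metadata -V` output shown above and compares it against 4.4.1, the threshold named in this comment. The function names are hypothetical, and pre-release suffixes such as `-alpha.0` are ignored for simplicity.

```python
import re

# Threshold taken from the comment above; not independently verified here.
MIN_IMDSV2 = (4, 4, 1)


def parse_afterburn_version(output: str) -> tuple:
    """Extract (major, minor, patch) from `coreos-metadata -V` output."""
    match = re.search(r"Afterburn (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognised version output: {output!r}")
    return tuple(int(part) for part in match.groups())


def supports_imdsv2(output: str) -> bool:
    """Compare the parsed version against the assumed minimum."""
    return parse_afterburn_version(output) >= MIN_IMDSV2
```

For the output above, `supports_imdsv2("Afterburn 4.0.1-alpha.0")` evaluates to `False`.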

@jepio Any plans to bump Afterburn like ignition on alpha, too?

@jepio
Member

jepio commented Mar 26, 2022

Yes definitely, that is planned. I would expect it to happen before ignition v3 hits stable.

@tormath1
Contributor

> @tormath1 could we add a test for IMDSv1/IMDSv2 in kola while we're still missing support for this? This would be covered by migrating to ignition v3 (#387), and there is an interim PR here flatcar-linux/ignition#32.

@jepio I had a look at the AWS API: by default, the metadata service accepts both versions (1 and 2), and the new afterburn defaults to v2 and falls back to v1 in case of failure.
So I think we can keep our tests as they are.
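
The "v2 first, then v1" behaviour described here can be sketched as follows. This is an illustrative Python sketch, not Afterburn's actual Rust implementation; `fetch_with_fallback` and its two callables are hypothetical names, injected so the logic can be exercised without a metadata service.

```python
from typing import Callable, Optional


def fetch_with_fallback(
    path: str,
    get_token: Callable[[], str],
    get: Callable[[str, Optional[str]], bytes],
) -> bytes:
    """Try IMDSv2 first; fall back to a token-less IMDSv1 request."""
    try:
        token = get_token()
    except OSError:
        # Token endpoint unreachable or refused: behave like an IMDSv1 client.
        return get(path, None)
    return get(path, token)
```

With the default metadata options either branch succeeds, so a single test covers both protocol versions. With `HttpTokens: required`, the token-less branch would still be rejected with 401.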

@pothos
Member

pothos commented Apr 28, 2022

We can close this now, as the afterburn update is done: flatcar-archive/coreos-overlay#1769

@pothos pothos closed this as completed Apr 28, 2022
@QuentinBisson

@pothos Do you have a timeline for when this is going to reach Stable?

@pothos
Member

pothos commented May 12, 2022

The Alpha release 3227.0.0 has all the changes and will need some time until it gets into Beta and then Stable. There was a new major Stable release just now, so it's probably the next Stable in a month or so.

@QuentinBisson

Thanks :)

@hakman

hakman commented Jun 25, 2022

@pothos I tried the 3227.1.1 beta image with kOps. Unfortunately the boot sequence doesn't finish successfully:

Jun 25 04:47:12.309062 localhost bash[1715]: + /usr/bin/coreos-cloudinit --oem=ec2-compat
Jun 25 04:47:12.537609 i-0521929c4b27fd9ee bash[1715]: 2022/06/25 04:47:12 Checking availability of "cloud-drive"
Jun 25 04:47:12.537609 i-0521929c4b27fd9ee bash[1715]: 2022/06/25 04:47:12 Checking availability of "ec2-metadata-service"
...
Jun 25 04:52:03.772421 i-0521929c4b27fd9ee bash[1715]: 2022/06/25 04:52:03 Checking availability of "cloud-drive"
Jun 25 04:52:03.776615 i-0521929c4b27fd9ee bash[1715]: 2022/06/25 04:52:03 Checking availability of "ec2-metadata-service"
Jun 25 04:52:12.532478 i-0521929c4b27fd9ee bash[1715]: 2022/06/25 04:52:12 No datasources available in time

Full journal log can be found here.

Maybe this issue should be reopened.

@pothos
Member

pothos commented Jun 26, 2022

Right, coreos-cloudinit support wasn't done and, while not in the title, it was meant to be part of this GitHub issue. Since we closed this, I think it's worth opening a new issue for coreos-cloudinit.
