Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creds fail to load (from env) when Java11 snap-start enabled #225

Closed
stevebuik opened this issue Dec 14, 2022 · 15 comments
Closed

Creds fail to load (from env) when Java11 snap-start enabled #225

stevebuik opened this issue Dec 14, 2022 · 15 comments

Comments

@stevebuik
Copy link

Thank you for your interest in helping to improve Cognitect's aws-api!

Dependencies

Be sure to list the precise libs and versions you are using ("the
latest" might change by the time we're looking at your issue).

e.g.

{:deps {com.cognitect.aws/api                      {:mvn/version "0.8.630"}
           com.cognitect.aws/endpoints                {:mvn/version "1.1.12.358"}
           com.cognitect.aws/lambda                   {:mvn/version "825.2.1263.0"}
           com.cognitect.aws/apigatewaymanagementapi  {:mvn/version "821.2.1107.0"}
.. others elided
}}

Description with failing test case

(defn send-message!
  [{:keys [domain-name stage message connection-id]}]
  (println (sort (keys (System/getenv))))
  (let [client (-> {:api                  :apigatewaymanagementapi
                    :endpoint-override    {:hostname domain-name
                                           :path     (format "/%s/@connections/" stage)}
                    :credentials-provider (reify CredentialsProvider
                                            (fetch [_]
                                              {:aws/access-key-id     (u/getenv "AWS_ACCESS_KEY_ID")
                                               :aws/secret-access-key (u/getenv "AWS_SECRET_ACCESS_KEY")
                                               :aws/session-token     (u/getenv "AWS_CONTAINER_AUTHORIZATION_TOKEN")}))}
                   (aws/client))]
    ;(aws/validate-requests client)
    (->> {:op      :PostToConnection
          :request {:ConnectionId connection-id
                    :Data         (.getBytes message)}}
         (aws/invoke client))))

Stack traces

{:message "The security token included in the request is invalid.", :cognitect.anomalies/category :cognitect.anomalies/forbidden}

@stevebuik
Copy link
Author

stevebuik commented Dec 14, 2022

works fine when snapstart is disabled (albeit with slow cold start) and AWS_SESSION_TOKEN used instead

@stevebuik
Copy link
Author

stevebuik commented Dec 15, 2022

further tests using container creds led to this variation of the fn

(defn send-message!
  [{:keys [domain-name stage message connection-id]}]

  (println
    "creds response"
    (u/getenv ec2/container-credentials-full-uri-env-var)
    (pr-str
      (a/<!!
        (http/submit
          (cognitect.http-client/create {})
          (#'mu/request-map (URI. (u/getenv ec2/container-credentials-full-uri-env-var)))))))

  (let [client (-> {:api                  :apigatewaymanagementapi
                    :endpoint-override    {:hostname domain-name
                                           :path     (format "/%s/@connections/" stage)}
                    :credentials-provider (credentials/container-credentials-provider (shared/http-client))}
                   (aws/client))]
    ;(aws/validate-requests client)
    (->> {:op      :PostToConnection
          :request {:ConnectionId connection-id
                    :Data         (.getBytes message)}}
         (aws/invoke client))))

which generates this log response...

creds response http://127.0.0.1:9001/2021-04-23/credentials {:status 404, :headers {"x-content-type-options" "nosniff", "content-length" "46", "date" "Thu, 15 Dec 2022 03:51:06 GMT", "content-type" "text/plain; charset=UTF-8"}, :body #object[java.nio.HeapByteBuffer 0x3d526ad9 "java.nio.HeapByteBuffer[pos=0 lim=46 cap=46]"]}

which I presume means that the ec2 credentials request process does not work in the lambda snapstart environment

@stevebuik
Copy link
Author

this works!!

(defn snapstart-credentials-provider
  "return a creds provider that uses the 2 Snapstart environment variables to get current keys/tokens for API requests"
  [http-client]
  (reify CredentialsProvider
    (fetch [_]
      (let [request (-> (u/getenv mu/container-credentials-full-uri-env-var)
                        (URI.)
                        (#'mu/request-map)                  ; TODO has https bug but doesn't affect this provider which uses http/80
                        (assoc-in [:headers "Authorization"] (u/getenv "AWS_CONTAINER_AUTHORIZATION_TOKEN")))
            response (->> request
                          (http/submit http-client)
                          (a/<!!))]
        (when (= 200 (:status response))
          (let [creds (some-> (:body response) (u/bbuf->str) (json/read-str :key-fn keyword))]
            (credentials/valid-credentials
              {:aws/access-key-id     (:AccessKeyId creds)
               :aws/secret-access-key (:SecretAccessKey creds)
               :aws/session-token     (:Token creds)
               ::credentials/ttl      (credentials/calculate-ttl creds)}
              "snapstart container")))))))

(defn send-message!
  [{:keys [domain-name stage message connection-id]}]
  (let [client (-> {:api                  :apigatewaymanagementapi
                    :endpoint-override    {:hostname domain-name
                                           :path     (format "/%s/@connections/" stage)}
                    :credentials-provider (snapstart-credentials-provider (http/create {}))}
                   (aws/client))]
    ;(aws/validate-requests client)
    (->> {:op      :PostToConnection
          :request {:ConnectionId connection-id
                    :Data         (.getBytes message)}}
         (aws/invoke client))))

@stevebuik
Copy link
Author

stevebuik commented Dec 15, 2022

with that impl, the cold start first request is a bit slow...
REPORT RequestId: 53dd2192-1c24-456d-b226-f3e675810ced Duration: 7296.40 ms Billed Duration: 7458 ms Memory Size: 1024 MB Max Memory Used: 174 MB Restore Duration: 372.46 ms

but subsequent requests are as quick as you would expect...
REPORT RequestId: b6cabdbe-d556-4b89-9313-6ebfc609814c Duration: 187.31 ms Billed Duration: 188 ms Memory Size: 1024 MB Max Memory Used: 175 MB

not sure why the slow first request but it does work. I'll move forward with this until an official release (that might improve cold start perf as well?)

@stevebuik
Copy link
Author

ok, that slow first request has nothing to do with creds.

it's due to the :PostToConnection call which takes 5-7 secs on first call and 200ms in subsequent calls. I'd guess this is some kind of keep-alive in the client (can we control this?) or aws network init delay (which we can't control)

Either way, this looks like a good creds solution. just needs the official port of this impl. let me know if you need any other info

@dchelimsky
Copy link
Contributor

Thanks for digging this up. I think we can fix this in the ec2-metadata-utils (add the header there if the env var is present). That's what the java sdk v2 is doing. I'm surprised this hasn't come up until now!

@dchelimsky
Copy link
Contributor

I pushed a speculative fix to the dc/container-credentials-auth-header branch. It might be a few days before I'll have time to test this properly, so please feel free to build and test it locally.

@stevebuik
Copy link
Author

Thanks. I'll try it today. I have a clarification question. With my solution, I had trouble making it work with (shared/http-client) so that's why I used (http/create {})

there are 2 http client protocols available in this project:
cognitect.http-client/IClient (submittable)
cognitect.aws.http/HttpClient (stoppable)

The container creds fn needs the submittable type. When it's part of the chain it appears to be passed the stoppable. Is this a bug or am I thinking about this wrong?

I ask because I've experienced the client memory leak in Ions in the past so I want to make sure I use this correctly (even though Lambdas are less prone to leaks i.e. cattle not pets)

@dchelimsky
Copy link
Contributor

dchelimsky commented Dec 15, 2022

cognitect.http-client/IClient is part of a dependency, not part of aws-api.

cognitect.aws.http/HttpClient has -submit and -stop, so it's both submittable and stoppable. Structurally, the cognitect http-client is always wrapped in a reification of cognitect.aws.http/HttpClient, created by calling cognitect.aws.http.cognitect/create.

So either gives you instances of the same thing, though (shared/http-client) will always return the same object, so it's possible that that's the source of any trouble you saw since the aws-client's http-client is waiting for the credential provider's http-client to return before it can complete its work. That shouldn't be an issue, but bugs happen, so let me know if you still see the same problem.

@stevebuik
Copy link
Author

thanks for clarifying. I just tested your fix. don't fully understand the error but using

:credentials-provider (ec2/container-credentials (shared/http-client))

produces...

:message "No implementation of method: :fetch of protocol: #'cognitect.aws.credentials/CredentialsProvider found for class: clojure.lang.PersistentArrayMap"

when I switch back to my custom provider using the same commit, it works fine.

Since my feedback loop via CI is slow, I'm gonna pause testing your code because I need to focus on my deliverable. So I'll wait till you have time to setup a test harness - that'll be more effiecient for both of us.

Happy to provide any other info in the meantime or when you are setting up.

No hurry as I have a working solution till then.

@dchelimsky
Copy link
Contributor

Actually, with my fix you shouldn't need to provide a provider at all. If you have time, give that a shot. If not, it can wait.

@stevebuik
Copy link
Author

That was going to be my last test :) I just ran it and it works! I'll stick with your commit now and will report any other issues that arise.

@stevebuik
Copy link
Author

stevebuik commented Dec 25, 2022

this is now working for multiple lambdas and multiple AWS apis. I would suggest this can be closed but....

I did uncover another bug that is related to this. It's that the TTL on the container creds doesn't appear to be refreshing after a snapstart resume. I'll attach my ADR doc to this comment. It contains details on the problem and my (verified) workaround.

Since I have a workaround, it's not urgent, but I figure you want to know if a TTL isn't working properly i.e. I suspect this is broken outside of the snapstart use-case as well.
adr-0002-cold-starts.md

@dchelimsky
Copy link
Contributor

dchelimsky commented Dec 28, 2022

Fix released in aws-api-0.8.635

@dchelimsky
Copy link
Contributor

Fix released in aws-api-0.8.641

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants