v3 API and refactorings #109
Conversation
* created additional GetPodCredentials operation
* use Client interface in place of PodFinder and other interfaces
* remove telemetry wrappers for gateway and server; will be replaced with Prometheus soon
* added more tests and refactored to tidy
* added coverage makefile target to track progress

Acked-by: Paul Ingles <[email protected]>
Signed-off-by: Paul Ingles <[email protected]>

* create unit test for credentials handler
* test happy path: role and credentials are found
* added test for server error when requesting credentials
* remove empty role: not possible for this handler to be called without a role in the vars
* add credentials handler test for retry
* role handler install
* remove old metadata server tests, replaced with handler unit tests
* add server test
Makefile (Outdated)

```diff
-test: $(SOURCES)
-	go test test/unit/*_test.go
-	go test test/functional/*_test.go
+test: $(shell find . -name '*.go')
```
`$(SOURCES)`!
@DewaldV @Sambooo @Joseph-Irving @tombooth I added you all so you could take a look. @moofish32 I know you were also keen to see where this branch/change got to, so it'd be great to hear what you think too.
```go
credentialsProvider sts.CredentialsProvider
clientIP            clientIPFunc
policy              server.AssumeRolePolicy
clientIP            clientIPFunc
```
The name `clientIP` is a bit misleading in a few places. We're not passing around an IP address but rather an IP resolution function. Not massively important though.
```go
	client server.Client
}

func (c *credentialsHandler) Install(router *mux.Router) {
```
I like this approach to splitting up the handlers 👍
```diff
 }

-credentials, err := c.credentialsForRole(ctx, requestedRole)
+credentials, err := c.fetchCredentials(ctx, ip, requestedRole)
```
There's no more code that can be deleted from this handler without breaking it. Good stuff.
Codecov Report
```
@@           Coverage Diff            @@
##             master      #109   +/- ##
=========================================
  Coverage          ?    37.79%
=========================================
  Files             ?        21
  Lines             ?       799
  Branches          ?         0
=========================================
  Hits              ?       302
  Misses            ?       469
  Partials          ?        28
```
I'm a fan of the new testing approach here @pingles. I haven't tried to figure out whether we can easily avoid supporting two API versions, but otherwise I'm happy with these changes.
t.Error("expected json result", content) | ||
} | ||
|
||
expected := `{"Code":"","Type":"","AccessKeyId":"A1","SecretAccessKey":"S1","Token":"","Expiration":"","LastUpdated":""}` |
We should probably unmarshal this into a struct with these fields, as `encoding/json` implements RFC 7159 so there's no guarantee the keys will be marshaled in this order.
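For illustration, a minimal sketch of that suggestion, assuming the handler's output is captured in an `httptest.ResponseRecorder` (the helper and package names here are hypothetical, not from this PR):

```go
package server_test

import (
	"encoding/json"
	"net/http/httptest"
	"testing"
)

// credentialsBody mirrors the keys in the expected JSON above; decoding into a
// struct avoids depending on the order in which keys are marshaled.
type credentialsBody struct {
	Code            string
	Type            string
	AccessKeyId     string
	SecretAccessKey string
	Token           string
	Expiration      string
	LastUpdated     string
}

// assertCredentials decodes the recorded response and compares fields.
func assertCredentials(t *testing.T, rr *httptest.ResponseRecorder) {
	var got credentialsBody
	if err := json.Unmarshal(rr.Body.Bytes(), &got); err != nil {
		t.Fatal("body was not valid json:", err)
	}
	if got.AccessKeyId != "A1" || got.SecretAccessKey != "S1" {
		t.Error("unexpected credentials:", got)
	}
}
```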
t.Error("unexpected status", rr.Code) | ||
} | ||
if !strings.Contains(rr.Body.String(), "error fetching credentials: no pod found") { | ||
t.Error("unexpected error", rr.Body.String()) |
Unexpected lack of error, no?
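In other words, a hypothetical rewording of the assertion above (same condition, clearer message):

```go
// The branch fires when the *expected* error text is absent, so the message
// should say so rather than calling it an "unexpected error".
if !strings.Contains(rr.Body.String(), "error fetching credentials: no pod found") {
	t.Error("expected error missing from body:", rr.Body.String())
}
```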
```go
)

// StubClient returns fake server responses
type StubClient struct {
```
👍 to this idea
Major +1 for the testing improvements and for refactoring the auth to be controlled by the server rather than split with the client. I've provided some comments as an interested party that hopefully represent at least one potential outside customer.
```diff
@@ -90,15 +66,14 @@ func (c *credentialsHandler) Handle(ctx context.Context, w http.ResponseWriter,
 	return http.StatusOK, nil
 }

-func (c *credentialsHandler) credentialsForRole(ctx context.Context, role string) (*sts.Credentials, error) {
+func (c *credentialsHandler) fetchCredentials(ctx context.Context, ip, requestedRole string) (*sts.Credentials, error) {
```
This is a great improvement and reduces the ability of a compromised client to arbitrarily request roles.
```go
)

func TestReturnRoleWhenClientResponds(t *testing.T) {
	r, _ := http.NewRequest("GET", "/latest/meta-data/iam/security-credentials/", nil)
```
All of these tests use `/latest/meta-data/iam/security-credentials/`. However, when you curl `/latest/meta-data/iam/security-credentials` (no trailing slash) on an EC2 instance, in at least some situations you don't get a 200:

```
curl -v 169.254.169.254/latest/meta-data/iam/security-credentials
* About to connect() to 169.254.169.254 port 80 (#0)
*   Trying 169.254.169.254...
* Connected to 169.254.169.254 (169.254.169.254) port 80 (#0)
> GET /latest/meta-data/iam/security-credentials HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 169.254.169.254
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 301 Moved Permanently
< Location: http://169.254.169.254/latest/meta-data/iam/security-credentials/
< Content-Length: 0
< Connection: close
< Date: Mon, 09 Jul 2018 21:12:44 GMT
< Server: EC2ws
```

I don't think the SDK actually uses this endpoint, but should we make this match?
I believe you're getting a 301 rather than a 200 because you didn't include the trailing slash. See this Google blog post:

> ...don't both return a 200 response code, but that one version redirects to the other.
Yeah, I think @moofish32 was pointing out that the metadata server they tested against 301'ed rather than returning credentials. Kiam currently supports either form and serves 200s on both.

@moofish32 I seem to remember others mentioning this behaviour varied between instance generations (it changed between m4 and m5, for example). Do you remember the instance type you tested this on?

Either way, I suspect we should add an issue to change it so Kiam behaves the same way: 301'ing rather than succeeding on both?
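A rough sketch of what that 301 behaviour could look like, assuming the existing gorilla/mux routing (`router` and `credentialsHandler` are placeholders here):

```go
// Serve the canonical trailing-slash path, and 301 the slash-less form to it,
// mirroring what the real EC2 metadata service did in the transcript above.
router.Handle("/latest/meta-data/iam/security-credentials/", credentialsHandler)
router.Handle("/latest/meta-data/iam/security-credentials",
	http.RedirectHandler("/latest/meta-data/iam/security-credentials/", http.StatusMovedPermanently))
```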
I've now added #116 to track this issue separately.
```diff
@@ -11,17 +11,16 @@
 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 // See the License for the specific language governing permissions and
 // limitations under the License.
-package kiam
+package future
```
Is there something really unique about this future package? I know Capital One has this one: https://github.com/capitalone/go-future-context, and I'm sure there are others. I just don't know if there's a specific reason the team wants to maintain this one. I agree it's a small amount of code right now.
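For context on how little code is at stake, a context-aware future can be sketched in a few lines (illustrative only; this is not kiam's actual implementation):

```go
package future

import "context"

// Future holds the eventual result of a function run in a background goroutine.
type Future struct {
	done chan struct{}
	val  interface{}
	err  error
}

// New starts fn immediately and returns a handle to its eventual result.
func New(fn func() (interface{}, error)) *Future {
	f := &Future{done: make(chan struct{})}
	go func() {
		f.val, f.err = fn()
		close(f.done)
	}()
	return f
}

// Get blocks until the result is ready or the context is cancelled.
func (f *Future) Get(ctx context.Context) (interface{}, error) {
	select {
	case <-ctx.Done():
		return nil, ctx.Err()
	case <-f.done:
		return f.val, f.err
	}
}
```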
```diff
@@ -107,7 +107,7 @@ func (f *forbidden) Explanation() string {
 }

 func (p *RequestingAnnotatedRolePolicy) IsAllowedAssumeRole(ctx context.Context, role, podIP string) (Decision, error) {
-	pod, err := p.pods.GetPodByIP(ctx, podIP)
+	pod, err := p.pods.GetPodByIP(podIP)
```
This is a point where I'm curious whether we should validate that the client IP (the kiam client making the request) and the `HostIP` in the `PodStatus` match. This would prevent a compromised host from providing IAM credentials for pods that could not have been scheduled on it. Probably a future enhancement, but I wanted to pose the question.
It's only the agent that directly receives requests from clients, after which point the IP address forwarded by the agent is used to identify the pod. Are you suggesting the agent also pass its host IP to the server in this API call?

I suppose it does make sense to check that, as a pod shouldn't hit another host's agent. @pingles, would supporting this be another major bump, or just an extension of the current request struct with graceful fallback?
@moofish32 I like the suggestion as an additional security enhancement (I assume this means checking that the pod's host matches the agent's client IP?).

If so, could you open a new issue to track it please? I'm not sure if we'd commit to adding it for v3, but I don't think it'd change the API as-is (the pod IP is passed to `GetPodCredentials`, which would allow additional policy to check against the `PodStatus`).
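A hypothetical sketch of the check being discussed (names and placement are assumptions, not part of this PR): the server would confirm the resolved pod actually runs on the host whose agent forwarded the request.

```go
package server

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// checkPodHost rejects requests where the pod's recorded host doesn't match
// the agent that forwarded the request, so a compromised host can't obtain
// credentials for pods scheduled elsewhere.
func checkPodHost(pod *v1.Pod, agentIP string) error {
	if pod.Status.HostIP != agentIP {
		return fmt.Errorf("pod %s runs on %s but request came via %s",
			pod.Name, pod.Status.HostIP, agentIP)
	}
	return nil
}
```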
```go
	return nil, ErrPolicyForbidden
}

creds, err := k.credentialsProvider.CredentialsForRole(ctx, req.Role)
```
So another potentially breaking change will come here when we want to specify the session name by pod ID. I don't know if that's important enough to roll in here or whether to wait until later. I think the usefulness of a constant session name is fairly low.
There's a separate issue open for being able to name sessions: #38.

The signature for `CredentialsForRole` will change, but the only published API is the gRPC server's `GetPodCredentials`, which should tolerate changes to reference pods by IP/ID etc.

So for now I'd leave this and pick it up as part of #38.
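For illustration, naming the session per pod might eventually look something like this with aws-sdk-go (here `sts` is the AWS SDK package, and `stsSvc`, `roleARN`, and `podName` are placeholders; the naming scheme is an assumption pending #38):

```go
// Derive the STS session name from the pod instead of a constant, so
// CloudTrail entries can be traced back to the requesting pod.
out, err := stsSvc.AssumeRoleWithContext(ctx, &sts.AssumeRoleInput{
	RoleArn:         aws.String(roleARN),
	RoleSessionName: aws.String(fmt.Sprintf("kiam-%s", podName)),
	DurationSeconds: aws.Int64(900),
})
```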
```diff
@@ -181,7 +224,7 @@ func NewServer(config *Config) (*KiamServer, error) {
 		arnResolver,
 	)
 	server.credentialsProvider = credentialsCache
-	server.manager = prefetch.NewManager(credentialsCache, server.pods, server.pods)
+	server.manager = prefetch.NewManager(credentialsCache, server.pods)
```
One more potentially big change to consider is moving the credential cache into the pod cache. Logically, if you want unique credentials per pod, maintaining two caches is less useful. If this would be a major version change to implement and it's high enough priority, should we consider grouping the change here?
This change moves to a new Agent/Server interaction ahead of v3. As part of doing this I also refactored a few other interfaces to streamline things a little.
API Changes
This adds a new Server API call, `GetPodCredentials`, that allows the server to return credentials given an IP address. Previously the agent would make three separate calls; this collapses them together inside the server. That makes it easier to deliver #14, as well as potentially moving to other mechanisms for agents to request credentials (like the service account token introduced in #104).
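Condensed to its Go shape, the new interaction looks roughly like this (a sketch based on the description above, not the generated gRPC code):

```go
// Client is implemented by the server: given the pod IP forwarded by the
// agent and the requested role, it resolves the pod, applies policy, and
// returns credentials in a single round trip.
type Client interface {
	GetPodCredentials(ctx context.Context, ip, role string) (*sts.Credentials, error)
}
```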
Testing
I've also broken apart the metadata server tests and moved all tests into the package directories. The Drone build also now runs with coverage stats, to make it easier to figure out which parts of the system need tests added.
End to End Testing
Although the tests are in better shape, there are still things that aren't covered, and it would be nice to have some end-to-end tests to prove the whole system. For now I've parked that and created #108 to track it separately.
Review and Request for Comments
I'd like some review/thoughts on the protos in particular. For now I've added new API calls and left the existing ones, so that the same server can support both new (3.x) and old (2.x) agents. However, given Kubernetes makes it easy to run servers side-by-side (only the agents have a hard requirement of running one per node), another option is to remove the old API calls and have agents talk to different server services.