Skip to content
This repository has been archived by the owner on Nov 5, 2021. It is now read-only.

[Bug] Discovery of k8s service when using AWS ELB #418

Closed
networkop opened this issue Jun 23, 2020 · 7 comments
Closed

[Bug] Discovery of k8s service when using AWS ELB #418

networkop opened this issue Jun 23, 2020 · 7 comments
Assignees
Milestone

Comments

@networkop
Copy link
Contributor

What happened: When using k8s service discovery in clusters running on AWS, discovery of public IP of LoadBalancer services fails

What you expected to happen: I am able to use AWS load-balancers as targets by all standard probes

How to reproduce it (as minimally and precisely as possible): create a k8s cluster with AWS as a cloud-provider. Create a LoadBalancer service and an RDS deployment that can discover it. Connect cloudprober to the RDS and attempt to probe it. The following is observed in the logs

I0622 18:43:12.014801 3299756 prober.go:309] Starting probe: pod-to-endpoints
W0622 18:43:14.015205 3299756 ping.go:214] Bad target: cloudprober. Err: no IP address for the resource: cloudprober
I0622 18:43:14.015270 3299756 ping.go:268] Skipping unresolved target: cloudprober
I0622 18:43:14.040403 3299756 ping.go:268] Skipping unresolved target: cloudprober

Anything else we need to know?: this is caused by the fact that AWS ELB returns a hostname (cname/alias) instead of IP, which is what gets populated in a service's status field. Currently, RDS code assumes that svc.Status.LoadBalancer.Ingress can only contain an IP, whereas hostname is also a valid option and is what gets populated by the AWS cloud-provider code.

@networkop
Copy link
Contributor Author

I've implemented a naive fix like this:

@@ -93,7 +93,12 @@ func (lister *servicesLister) listResources(req *pb.ListResourcesRequest) ([]*pb
                        if len(svc.Status.LoadBalancer.Ingress) == 0 {
                                continue
                        }
-                       res.Ip = proto.String(svc.Status.LoadBalancer.Ingress[0].IP)
+                       if svc.Status.LoadBalancer.Ingress[0].IP != "" {
+                               res.Ip = proto.String(svc.Status.LoadBalancer.Ingress[0].IP)
+                       } else if svc.Status.LoadBalancer.Ingress[0].Hostname != "" {
+                               res.Name = proto.String(svc.Status.LoadBalancer.Ingress[0].Hostname)
+                       }
+
                } else {
                        res.Ip = proto.String(svc.Spec.ClusterIP)
                }
@@ -117,7 +122,8 @@ type serviceInfo struct {
        Status struct {
                LoadBalancer struct {
                        Ingress []struct {
-                               IP string
+                               IP       string
+                               Hostname string
                        }
                }
        }

However this feels like the wrong approach as I'm overwriting/hiding the real name of the resource. We also can't use hostname since RDS targets do not have an option to use local/client resolver and will try to resolve hostnames in local cache (if I understood your code correctly).

So it seems like the way forward is to do a resolution on the RDS server? (bearing in mind that AWS's ELB returns 2 public IPs).

@networkop networkop changed the title Discovery of k8s service when using AWS ELB [BUG] Discovery of k8s service when using AWS ELB Jun 23, 2020
@networkop networkop changed the title [BUG] Discovery of k8s service when using AWS ELB [Bug] Discovery of k8s service when using AWS ELB Jun 23, 2020
@manugarg
Copy link
Contributor

@networkop Thanks a lot for reporting this bug.

You're right about this not being straightforward. There are a few possible approaches:

  1. Handle this as a special case and return hostname as resource name.
  2. Add another field "hostname" to RDS resource, and on the client use local resolver if there is a hostname in the resource but no IP.
  3. Since RDS resource's IP field is a string, return hostname in that field itself and handle the distinction on the client side. This doesn't seem that crazy after seeing the AWS's documentation: https://aws.amazon.com/premiumsupport/knowledge-center/eks-kubernetes-services-cluster/; towards the end you'll see that kubectl shows hostname as external IP.
  4. Resolve hostname on the RDS server side itself.

(2), and, to a certain extent (3), seem like the right approach. (2) will require more careful thought as it's changing the protobuf message.
(1) and (3) are more like workarounds for now.

Let me give it a bit more thought and get back on this.

bearing in mind that AWS's ELB returns 2 public IPs.

Are these IPv4 and IPv6?

@manugarg
Copy link
Contributor

  1. Since RDS resource's IP field is a string, return hostname in that field itself and handle the distinction on the client side. This doesn't seem that crazy after seeing the AWS's documentation: https://aws.amazon.com/premiumsupport/knowledge-center/eks-kubernetes-services-cluster/; towards the end you'll see that kubectl shows hostname as external IP.

FYI, I am working on the client side changes to make this option work. So far this seems like a reasonable thing to do.

@networkop
Copy link
Contributor Author

great, thanks. there's just one concern I've got with this approach. Every load-balancer service will also have a unique spec.clusterIP in addition to what's reported in the status field. If I understand it correctly, this IP will always be hidden by the status.loadbalancer.ingress, even now (before this issue gets fixed).
However, if we consider option #2 and extend the RDS resource, this would make it a bit cleaner. what do you think? For example:

message RDSTargets {
  repeated k8s.loadbalancer.ingress ingresses = 5;
}

Of course, this implies changing the schema and much more code compared to option 3 🤷‍♂️

@manugarg
Copy link
Contributor

spec.ClusterIP is in fact the default IP you'll get. To get the ingress IP, you've to specify the ip_type as PUBLIC:

if reqIPType == pb.IPConfig_PUBLIC {

rds_targets {
  ...
  ip_config {
    ip_type: PUBLIC
  }
}

Did I get your question correctly.

@networkop
Copy link
Contributor Author

ah yes, you're right, my apologies.

@manugarg manugarg added this to the v0.10.8 milestone Jun 30, 2020
@manugarg manugarg self-assigned this Jul 4, 2020
manugarg added a commit that referenced this issue Jul 6, 2020
… using system's DNS resolver.

This is to support cases where resources have a hostname but not an IP, for example, AWS ELB:
#418

PiperOrigin-RevId: 319832714
manugarg added a commit that referenced this issue Jul 6, 2020
… using system's DNS resolver.

This is to support cases where resources have a hostname but not an IP, for example, AWS ELB:
#418

PiperOrigin-RevId: 319832714
manugarg added a commit that referenced this issue Jul 8, 2020
…ame is not,

RDS client has already been changed to DNS-resolve resource's IP if it's not a valid IP address.

This is to support Kubernetes cases where ingress is given a hostname instead of an IP address, for example, AWS ELB. See #418 for more details.

PiperOrigin-RevId: 320145826
manugarg added a commit that referenced this issue Jul 8, 2020
…ame is not. (#431)

RDS client has already been changed to DNS-resolve resource's IP if it's not a valid IP address.

This is to support Kubernetes cases where ingress is given a hostname instead of an IP address, for example, AWS ELB. See #418 for more details.

PiperOrigin-RevId: 320145826
@manugarg
Copy link
Contributor

manugarg commented Jul 8, 2020

Closing this. It will get released with v0.10.8:
https://github.com/google/cloudprober/milestone/6

@manugarg manugarg closed this as completed Jul 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants