kOps 1.30.2 breaks cilium hubble #16965

Open
kforsthoevel opened this issue Nov 29, 2024 · 0 comments

Labels
kind/bug Categorizes issue or PR as related to a bug.

@kforsthoevel
/kind bug

1. What kops version are you running? The command kops version will display
this information.

Client version: 1.30.2 (git-v1.30.2)

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Client Version: v1.30.7
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.7

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

Upgraded from kOps 1.30.1 to 1.30.2, or simply created a new cluster with kOps 1.30.2; either reproduces the issue.
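
A minimal sketch of the upgrade path (illustrative commands; the state store and cluster name are placeholders):

# Placeholders: adjust the state store and cluster name for your environment.
export KOPS_STATE_STORE=s3://my-kops-state-store
# Re-render the cluster configuration with the new kOps binary and apply it;
# this also regenerates the kOps-managed addon manifests, including Cilium.
kops update cluster --name my.example.com --yes
# Roll the instance groups so nodes pick up the new configuration.
kops rolling-update cluster --name my.example.com --yes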

5. What happened after the commands executed?

Cilium Hubble broke: the cilium agents report "Hubble: Server not initialized" and hubble-relay cannot reach the hubble-peer service (see the logs in section 8).

6. What did you expect to happen?

Cilium Hubble should be working.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: null
  generation: 1
  name: [REDACTED]
spec:
  api:
    loadBalancer:
      class: Network
      type: Internal
  authorization:
    rbac: {}
  certManager:
    defaultIssuer: letsencrypt-live
    enabled: true
    hostedZoneIDs:
      - [REDACTED]
    managed: true
  channel: stable
  cloudLabels:
    Environment: [REDACTED]
    Owner: [REDACTED]
  cloudProvider: aws
  clusterAutoscaler:
    balanceSimilarNodeGroups: true
    cpuRequest: 100m
    enabled: true
    memoryRequest: 800Mi
    scaleDownUtilizationThreshold: "0.8"
    skipNodesWithLocalStorage: false
  configBase: s3://[REDACTED]
  containerRuntime: containerd
  etcdClusters:
    - cpuRequest: 200m
      etcdMembers:
        - encryptedVolume: true
          instanceGroup: master-eu-west-1a
          name: a
        - encryptedVolume: true
          instanceGroup: master-eu-west-1b
          name: b
        - encryptedVolume: true
          instanceGroup: master-eu-west-1c
          name: c
      memoryRequest: 100Mi
      name: main
    - cpuRequest: 100m
      etcdMembers:
        - encryptedVolume: true
          instanceGroup: master-eu-west-1a
          name: a
        - encryptedVolume: true
          instanceGroup: master-eu-west-1b
          name: b
        - encryptedVolume: true
          instanceGroup: master-eu-west-1c
          name: c
      memoryRequest: 100Mi
      name: events
  externalPolicies:
    node:
      - arn:aws:iam::[REDACTED]
  iam:
    allowContainerRegistry: true
    legacy: false
    serviceAccountExternalPermissions:
      - aws:
          policyARNs:
            - arn:aws:iam::[REDACTED]
        name: [REDACTED]
        namespace: default
    useServiceAccountExternalPermissions: true
  kubeDNS:
    nodeLocalDNS:
      enabled: false
    provider: CoreDNS
  kubeProxy:
    metricsBindAddress: 0.0.0.0
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    cpuCFSQuota: false
  kubernetesApiAccess:
    - [REDACTED]
  kubernetesVersion: 1.30.7
  masterPublicName: api.[REDACTED]
  networkCIDR: [REDACTED]
  networkID: [REDACTED]
  networking:
    cilium:
      hubble:
        enabled: true
  nodeTerminationHandler:
    enableSQSTerminationDraining: true
    enabled: true
    managedASGTag: kubernetes.io/cluster/[REDACTED]
  nonMasqueradeCIDR: 100.64.0.0/10
  podIdentityWebhook:
    enabled: true
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://[REDACTED]
    enableAWSOIDCProvider: true
  sshAccess:
    - [REDACTED]
  subnets:
    - cidr: [REDACTED]
      egress: [REDACTED]
      id: [REDACTED]
      name: eu-west-1a
      type: Private
      zone: eu-west-1a
    - cidr: [REDACTED]
      egress: [REDACTED]
      id: [REDACTED]
      name: eu-west-1b
      type: Private
      zone: eu-west-1b
    - cidr: [REDACTED]
      egress: [REDACTED]
      id: [REDACTED]
      name: eu-west-1c
      type: Private
      zone: eu-west-1c
    - cidr: [REDACTED]
      id: [REDACTED]
      name: utility-eu-west-1a
      type: Utility
      zone: eu-west-1a
    - cidr: [REDACTED]
      id: [REDACTED]
      name: utility-eu-west-1b
      type: Utility
      zone: [REDACTED]
    - cidr: [REDACTED]
      id: [REDACTED]
      name: utility-eu-west-1c
      type: Utility
      zone: eu-west-1c
  topology:
    dns:
      type: Public
  updatePolicy: external

---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-11-27T12:26:41Z"
  labels:
    kops.k8s.io/cluster: [REDACTED]
  name: master-eu-west-1a
spec:
  additionalSecurityGroups:
    - [REDACTED]
  cloudLabels:
    Environment: [REDACTED]
    Owner: [REDACTED]
  image: 075585003325/Flatcar-stable-4081.2.0-arm64-hvm
  machineType: m6g.2xlarge
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1a
    type: control-plane
  role: Master
  subnets:
    - eu-west-1a

---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-11-27T12:26:41Z"
  labels:
    kops.k8s.io/cluster: [REDACTED]
  name: master-eu-west-1b
spec:
  additionalSecurityGroups:
    - [REDACTED]
  cloudLabels:
    Environment: [REDACTED]
    Owner: [REDACTED]
  image: 075585003325/Flatcar-stable-4081.2.0-arm64-hvm
  machineType: m6g.2xlarge
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1b
    type: control-plane
  role: Master
  subnets:
    - eu-west-1b

---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-11-27T12:26:41Z"
  labels:
    kops.k8s.io/cluster: [REDACTED]
  name: master-eu-west-1c
spec:
  additionalSecurityGroups:
    - [REDACTED]
  cloudLabels:
    Environment: [REDACTED]
    Owner: [REDACTED]
  image: 075585003325/Flatcar-stable-4081.2.0-arm64-hvm
  machineType: m6g.2xlarge
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1c
    type: control-plane
  role: Master
  subnets:
    - eu-west-1c

---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-11-27T12:26:41Z"
  labels:
    kops.k8s.io/cluster: [REDACTED]
  name: nodes-eu-west-1a
spec:
  additionalSecurityGroups:
    - [REDACTED]
  cloudLabels:
    Environment: [REDACTED]
    Owner: [REDACTED]
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/staging: ""
  image: 075585003325/Flatcar-stable-4081.2.0-arm64-hvm
  machineType: m6g.xlarge
  maxSize: 18
  minSize: 2
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-eu-west-1a
    type: node
  role: Node
  subnets:
    - eu-west-1a

---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-11-27T12:26:42Z"
  labels:
    kops.k8s.io/cluster: [REDACTED]
  name: nodes-eu-west-1b
spec:
  additionalSecurityGroups:
    - [REDACTED]
  cloudLabels:
    Environment: [REDACTED]
    Owner: [REDACTED]
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/staging: ""
  image: 075585003325/Flatcar-stable-4081.2.0-arm64-hvm
  machineType: m6g.xlarge
  maxSize: 18
  minSize: 2
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-eu-west-1b
    type: node
  role: Node
  subnets:
    - eu-west-1b

---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-11-27T12:26:42Z"
  labels:
    kops.k8s.io/cluster: [REDACTED]
  name: nodes-eu-west-1c
spec:
  additionalSecurityGroups:
    - [REDACTED]
  cloudLabels:
    Environment: [REDACTED]
    Owner: [REDACTED]
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/staging: ""
  image: 075585003325/Flatcar-stable-4081.2.0-arm64-hvm
  machineType: m6g.xlarge
  maxSize: 18
  minSize: 2
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-eu-west-1c
    type: node
  role: Node
  subnets:
    - eu-west-1c

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

hubble-relay time="2024-11-29T11:14:28Z" level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
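
The target in that message is the standard hubble-peer Service in kube-system. To confirm it resolves and has backing endpoints, something like the following can be used (illustrative kubectl commands, not part of the original report):

# Confirm the peer service exists and has endpoints (names taken from the relay log above).
kubectl -n kube-system get service hubble-peer
kubectl -n kube-system get endpoints hubble-peer
# Full relay logs for further context.
kubectl -n kube-system logs deploy/hubble-relay
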
cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:             10 warnings
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    disabled (using embedded mode)
 \__/¯¯\__/    Hubble Relay:       OK
    \__/       ClusterMesh:        disabled

DaemonSet              cilium             Desired: 10, Ready: 10/10, Available: 10/10
Deployment             cilium-operator    Desired: 2, Ready: 2/2, Available: 2/2
Deployment             hubble-relay       Desired: 2, Ready: 2/2, Available: 2/2
Deployment             hubble-ui          Desired: 1, Ready: 1/1, Available: 1/1
Containers:            cilium             Running: 10
                       cilium-operator    Running: 2
                       hubble-relay       Running: 2
                       hubble-ui          Running: 1
Cluster Pods:          83/83 managed by Cilium
Helm chart version:
Image versions         cilium             quay.io/cilium/cilium:v1.15.6@sha256:6aa840986a3a9722cd967ef63248d675a87add7e1704740902d5d3162f0c0def: 10
                       cilium-operator    quay.io/cilium/operator:v1.15.6@sha256:f3ebc5eac9c0b37aabdf120e120a704ccd77d8c34191adec120e9ee021b8a875: 2
                       hubble-relay       quay.io/cilium/hubble-relay:v1.15.6@sha256:a0863dd70d081b273b87b9b7ce7e2d3f99171c2f5e202cd57bc6691e51283e0c: 2
                       hubble-ui          quay.io/cilium/hubble-ui-backend:v0.13.0@sha256:1e7657d997c5a48253bb8dc91ecee75b63018d16ff5e5797e5af367336bc8803: 1
                       hubble-ui          quay.io/cilium/hubble-ui:v0.13.0@sha256:7d663dc16538dd6e29061abd1047013a645e6e69c115e008bee9ea9fef9a6666: 1
Warnings:              cilium             cilium-5ttd6    Hubble: Server not initialized
                       cilium             cilium-5vqz6    Hubble: Server not initialized
                       cilium             cilium-6pgkc    Hubble: Server not initialized
                       cilium             cilium-jq96t    Hubble: Server not initialized
                       cilium             cilium-jrrcb    Hubble: Server not initialized
                       cilium             cilium-lnp7s    Hubble: Server not initialized
                       cilium             cilium-lzvb4    Hubble: Server not initialized
                       cilium             cilium-ps7mq    Hubble: Server not initialized
                       cilium             cilium-q6tt9    Hubble: Server not initialized
                       cilium             cilium-xq8h7    Hubble: Server not initialized

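The repeated "Hubble: Server not initialized" warnings indicate that the embedded Hubble server inside each cilium agent never started, which would also explain the relay's peer errors. A quick diagnostic (a sketch; cilium-config is the standard ConfigMap used by the Cilium addon) is to check whether the agent configuration still enables Hubble:

# Inspect the Hubble-related keys in the agent configuration.
kubectl -n kube-system get configmap cilium-config -o yaml | grep -i hubble
# Comparing this output before and after the kOps upgrade should pinpoint any changed key.
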
9. Anything else we need to know?

After downgrading to kOps 1.30.1, Cilium Hubble works again.

cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:             OK
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    disabled (using embedded mode)
 \__/¯¯\__/    Hubble Relay:       OK
    \__/       ClusterMesh:        disabled

DaemonSet              cilium             Desired: 9, Ready: 9/9, Available: 9/9
Deployment             cilium-operator    Desired: 2, Ready: 2/2, Available: 2/2
Deployment             hubble-relay       Desired: 2, Ready: 2/2, Available: 2/2
Deployment             hubble-ui          Desired: 1, Ready: 1/1, Available: 1/1
Containers:            cilium             Running: 9
                       cilium-operator    Running: 2
                       hubble-relay       Running: 2
                       hubble-ui          Running: 1
Cluster Pods:          82/82 managed by Cilium
Helm chart version:
Image versions         cilium             quay.io/cilium/cilium:v1.15.6@sha256:6aa840986a3a9722cd967ef63248d675a87add7e1704740902d5d3162f0c0def: 9
                       cilium-operator    quay.io/cilium/operator:v1.15.6@sha256:f3ebc5eac9c0b37aabdf120e120a704ccd77d8c34191adec120e9ee021b8a875: 2
                       hubble-relay       quay.io/cilium/hubble-relay:v1.15.6@sha256:a0863dd70d081b273b87b9b7ce7e2d3f99171c2f5e202cd57bc6691e51283e0c: 2
                       hubble-ui          quay.io/cilium/hubble-ui-backend:v0.13.0@sha256:1e7657d997c5a48253bb8dc91ecee75b63018d16ff5e5797e5af367336bc8803: 1
                       hubble-ui          quay.io/cilium/hubble-ui:v0.13.0@sha256:7d663dc16538dd6e29061abd1047013a645e6e69c115e008bee9ea9fef9a6666: 1
k8s-ci-robot added the kind/bug label Nov 29, 2024