
Spark application does not start on new Helm chart versions #1999

Closed

matthewrossi opened this issue Apr 22, 2024 · 2 comments

Comments

@matthewrossi
Contributor

Description

Creating the SparkApplication resource does not trigger the submission of the Spark application (its status is empty and no events are recorded).
The same setup works with Helm chart v1.1.27.

  • ✋ I have searched the open/closed issues and my issue is not listed.

Reproduction Code [Required]

Steps to reproduce the behavior:

  1. Set up a new Kubernetes cluster (I set up a local one with minikube start).
  2. helm repo add spark-operator https://kubeflow.github.io/spark-operator
  3. helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace --version 1.2.7 --set sparkJobNamespace=default
  4. kubectl create -f spark-pi.yaml

spark-pi.yaml is equivalent to https://github.com/kubeflow/spark-operator/blob/master/examples/spark-pi.yaml except for the image and service account (their default values also caused issues with Helm chart v1.1.27):

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "apache/spark:3.5.1"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.5.1.jar"
  sparkVersion: "3.5.1"
  sparkUIOptions:
    serviceLabels:
      test-label/v1: 'true'
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.5.1
    serviceAccount: my-release-spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.5.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

Expected behavior

The example application completes successfully (as with Helm chart v1.1.27).

Actual behavior

The Spark application remains with an empty status and no events.

Terminal Output

$ kubectl get sparkapp -A
NAMESPACE   NAME       STATUS   ATTEMPTS   START   FINISH   AGE
default     spark-pi                                        85s
$ kubectl describe sparkapp spark-pi
Name:         spark-pi
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  sparkoperator.k8s.io/v1beta2
Kind:         SparkApplication
Metadata:
  Creation Timestamp:  2024-04-22T11:00:06Z
  Generation:          1
  Resource Version:    6350
  UID:                 c1d9b5fa-46fa-4bf5-8cde-232e54854bc3
Spec:
  Driver:
    Core Limit:  1200m
    Cores:       1
    Labels:
      Version:        3.1.1
    Memory:           512m
    Service Account:  my-release-spark
    Volume Mounts:
      Mount Path:  /tmp
      Name:        test-volume
  Executor:
    Cores:      1
    Instances:  1
    Labels:
      Version:  3.1.1
    Memory:     512m
    Volume Mounts:
      Mount Path:         /tmp
      Name:               test-volume
  Image:                  apache/spark:3.5.1
  Image Pull Policy:      Always
  Main Application File:  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.1.jar
  Main Class:             org.apache.spark.examples.SparkPi
  Mode:                   cluster
  Restart Policy:
    Type:  Never
  Spark UI Options:
    Service Labels:
      test-label/v1:  true
  Spark Version:      3.5.1
  Type:               Scala
  Volumes:
    Host Path:
      Path:  /tmp
      Type:  Directory
    Name:    test-volume
Events:      <none>
$ kubectl -n spark-operator logs my-release-spark-operator-57665d5bbd-25774
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ echo 0
+ echo 0
+ echo root:x:0:0:root:/root:/bin/bash
0
0
root:x:0:0:root:/root:/bin/bash
+ [[ -z root:x:0:0:root:/root:/bin/bash ]]
+ exec /usr/bin/tini -s -- /usr/bin/spark-operator -v=2 -logtostderr -namespace=spark-operator -enable-ui-service=true -ingress-url-format= -controller-threads=10 -resync-interval=30 -enable-batch-scheduler=false -label-selector-filter= -enable-metrics=true -metrics-labels=app_type -metrics-port=10254 -metrics-endpoint=/metrics -metrics-prefix= -enable-resource-quota-enforcement=false
I0422 10:59:28.245912      10 main.go:152] Starting the Spark Operator
I0422 10:59:28.250664      10 main.go:189] Enabling metrics collecting and exporting to Prometheus
I0422 10:59:28.250712      10 metrics.go:142] Started Metrics server at localhost:10254/metrics
I0422 10:59:28.251048      10 main.go:235] Starting application controller goroutines
I0422 10:59:28.351921      10 controller.go:169] Starting the workers of the SparkApplication controller
I0422 10:59:28.351944      10 controller.go:97] Starting the ScheduledSparkApplication controller
I0422 10:59:28.351951      10 controller.go:103] Starting the workers of the ScheduledSparkApplication controller

Environment & Versions

  • Spark Operator App version: v1beta2-1.4.3-3.5.0
  • Helm Chart Version: 1.2.7
  • Kubernetes Version: v1.28.3
  • Apache Spark version: 3.5.1

Additional context

I am not using the latest version of the Helm chart (version 1.2.12) because I am facing an issue similar to #1991.

@matthewrossi
Contributor Author

After further troubleshooting I discovered that the issue was related to the replacement of sparkJobNamespace with sparkJobNamespaces in #1955.
I managed to get the Spark application working by changing the helm install command to helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace --version 1.2.7 --set "sparkJobNamespaces={default}".
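For reference, the same setting can also be kept in a values file instead of a --set flag. A minimal sketch (the key name sparkJobNamespaces comes from #1955; the list form matches the --set "sparkJobNamespaces={default}" syntax above):

```yaml
# values.yaml — equivalent to --set "sparkJobNamespaces={default}"
sparkJobNamespaces:
  - default
```

and install with helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace --version 1.2.7 -f values.yaml.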

The instructions in the README.md and the Quick Start Guide are not yet up to date with this change.

@matthewrossi matthewrossi changed the title [BUG] Spark application does not start on new Helm chart versions Spark application does not start on new Helm chart versions May 10, 2024
@matthewrossi
Contributor Author

Closing the issue since the documentation has been updated in #2000 to reflect the new behavior.
