
feat: implement CRD with embedded PodSpec #452

Merged
merged 33 commits into dask:main from feat/operator-pod-spec on May 12, 2022

Conversation

@samdyzon (Contributor) commented May 4, 2022

The following is my formal design proposal for embedding referenceable k8s specs within the operator CRD files.

Goals

  • Implement robust dask worker/scheduler pod specification for the operator CRD
  • Prevent rewriting of standard k8s resources by embedding k8s definitions inside the CRD
  • Use existing operator architecture and minimise required changes in operator
  • Use existing operator tests

CRD Templates

Currently there are two main custom resources created for the operator: DaskCluster and DaskWorkerGroup.
This proposal does not change the type of custom resources, but aims to update their structure slightly.

First, we should create "pseudo custom resources" which are templates for Dask resources, but are not fully defined CRDs themselves.

dask_kubernetes/operator/customresources/templates.yaml

definitions:

  dask.k8s.api.v1.DaskWorker:
    type: object
    description: Dask Worker configuration
    required:
    - spec
    properties:
      replicas:
        type: integer
        default: 1
        description: Number of workers to start
      spec:
        $ref: 'python://k8s_crd_resolver/schemata/k8s-1.21.1.json#/definitions/io.k8s.api.core.v1.PodSpec'
 
  dask.k8s.api.v1.DaskScheduler:
    type: object
    description: Dask scheduler configuration
    required:
    - spec
    - service
    properties:
      spec:
        $ref: 'python://k8s_crd_resolver/schemata/k8s-1.21.1.json#/definitions/io.k8s.api.core.v1.PodSpec'
      service:
        $ref: 'python://k8s_crd_resolver/schemata/k8s-1.21.1.json#/definitions/io.k8s.api.core.v1.ServiceSpec'

This YAML file provides the same style of resource definition that k8s uses in its k8s-1.21.1.json specification file. The goal of building these definitions is to allow them to be reused in the real custom resources.

dask_kubernetes/operator/customresources/daskworkergroup.yaml

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: daskworkergroups.kubernetes.dask.org
spec:
  scope: Namespaced
  group: kubernetes.dask.org
  names:
    kind: DaskWorkerGroup
    plural: daskworkergroups
    singular: daskworkergroup
    shortNames:
      - daskworkers
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          required:
          - cluster
          - worker
          properties:
            cluster:
              type: string
              description: Name of the cluster this worker group belongs to
            worker:
              $ref: 'python://dask_kubernetes/operator/customresources/templates.yaml#/definitions/dask.k8s.api.v1.DaskWorker' 
            status:
              type: object
      subresources:
        scale:
          specReplicasPath: .replicas
          statusReplicasPath: .status.replicas

dask_kubernetes/operator/customresources/daskcluster.yaml

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: daskclusters.kubernetes.dask.org
spec:
  scope: Namespaced
  group: kubernetes.dask.org
  names:
    kind: DaskCluster
    plural: daskclusters
    singular: daskcluster
    shortNames:
      - daskcluster
      - dsk
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          required:
          - name
          - worker
          - scheduler
          properties:
            name:
              type: string
              description: Name of the Dask Cluster
            scheduler:
              $ref: 'python://dask_kubernetes/operator/customresources/templates.yaml#/definitions/dask.k8s.api.v1.DaskScheduler'
            worker:
              $ref: 'python://dask_kubernetes/operator/customresources/templates.yaml#/definitions/dask.k8s.api.v1.DaskWorker' 
            status:
              type: object
      subresources:
        scale:
          specReplicasPath: .replicas
          statusReplicasPath: .status.replicas

By abstracting DaskScheduler and DaskWorker we get a small amount of code reuse and a more semantic description of resources in the main CRD templates.

Edit: We also needed to include some "patch" files in the build process, as described in the k8s-crd-resolver documentation. In this case, I created a patch file for each CRD template containing the patches needed for that resource:

dask_kubernetes/operator/customresources/daskcluster.patch.yaml

[
  {"op": "add", "path": "/spec/versions/0/schema/openAPIV3Schema/properties/spec/properties/worker/properties/spec/properties/initContainers/items/properties/ports/items/properties/protocol/default", "value": "TCP"},
  {"op": "add", "path": "/spec/versions/0/schema/openAPIV3Schema/properties/spec/properties/worker/properties/spec/properties/containers/items/properties/ports/items/properties/protocol/default", "value": "TCP"},
  {"op": "add", "path": "/spec/versions/0/schema/openAPIV3Schema/properties/spec/properties/scheduler/properties/service/properties/ports/items/properties/protocol/default", "value": "TCP"},
  {"op": "add", "path": "/spec/versions/0/schema/openAPIV3Schema/properties/spec/properties/scheduler/properties/spec/properties/containers/items/properties/ports/items/properties/protocol/default", "value": "TCP"},
  {"op": "add", "path": "/spec/versions/0/schema/openAPIV3Schema/properties/spec/properties/scheduler/properties/spec/properties/initContainers/items/properties/ports/items/properties/protocol/default", "value": "TCP"}
]

These patch files are necessary to prevent errors when installing resources based on the CRD (not the CRD itself). The k8s-crd-resolver command has to be updated to include these patch files (see the updated build scripts below).

Some notes:

  • For DaskCluster, I split the worker/scheduler definitions into separate properties to be more explicit about what belongs to which resource; this will affect the operator implementation.

Building the CRD

# Assuming these are run from the root directory, and a temp folder called "dist" exists
k8s-crd-resolver -r -j dask_kubernetes/operator/customresources/daskcluster.patch.yaml dask_kubernetes/operator/customresources/daskcluster.yaml dist/daskcluster.yaml
k8s-crd-resolver -r -j dask_kubernetes/operator/customresources/daskworkergroup.patch.yaml dask_kubernetes/operator/customresources/daskworkergroup.yaml dist/daskworkergroup.yaml

Operator Changes

Once the CRDs have been generated, the operator needs to work with the new pod specification. The only changes needed in the operator are to the build_scheduler_pod_spec, build_scheduler_service_spec, and build_worker_pod_spec methods. Since the custom resource now carries all of the detail needed to create a spec, these methods simply pass the spec object through and return the dictionary result. For example:

def build_scheduler_pod_spec(name, spec):
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"{name}-scheduler",
            "labels": {
                "dask.org/cluster-name": name,
                "dask.org/component": "scheduler",
            },
        },
        "spec": spec
    }

The build_scheduler_service_spec case is slightly more complicated, because the selector property depends on pod metadata generated in build_scheduler_pod_spec rather than provided by the developer. We must add documentation warning users that they must provide the correct values for selector, or else the service will not know which pods to route to.

def build_scheduler_service_spec(name, spec):
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "labels": {
                "dask.org/cluster-name": name,
            },
        },
        "spec": spec
    }

Edit: The decision was made to use the above method, as I believe it is the more idiomatic approach and k8s developers will be aware of the purpose of the service selector property.

We must also update all of the places these helper methods are called so that the appropriate values are passed in.
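To illustrate, here is a minimal sketch of how a create handler might pass the embedded specs through to these helpers, assuming a kopf-based operator and the kubernetes_asyncio client (the handler name, registration string, and client setup are illustrative assumptions, not the actual operator code):

import kopf
from kubernetes_asyncio import client, config

# build_scheduler_pod_spec / build_scheduler_service_spec as defined above.


@kopf.on.create("daskcluster")
async def daskcluster_create(spec, name, namespace, logger, **kwargs):
    # Assumes a kubeconfig is available; an in-cluster deployment would use
    # config.load_incluster_config() instead.
    await config.load_kube_config()
    async with client.ApiClient() as api_client:
        corev1 = client.CoreV1Api(api_client)

        # The pod and service specs now come straight from the custom resource
        # and are passed through unchanged.
        scheduler_pod = build_scheduler_pod_spec(name, spec["scheduler"]["spec"])
        kopf.adopt(scheduler_pod)
        await corev1.create_namespaced_pod(namespace=namespace, body=scheduler_pod)

        scheduler_service = build_scheduler_service_spec(name, spec["scheduler"]["service"])
        kopf.adopt(scheduler_service)
        await corev1.create_namespaced_service(namespace=namespace, body=scheduler_service)
        # Worker group creation is omitted for brevity.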

Testing

In order to run the test suite, we have to update the customresources fixture defined in dask_kubernetes/operator/tests/conftest.py to run the build process first. The algorithm will look like this (a rough sketch of the fixture follows the list):

  • create a temp directory => temp_directory
  • run subprocess k8s-crd-resolver -r dask_kubernetes/operator/customresources/daskcluster.yaml {temp_directory}/daskcluster.yaml
    • check the exit code and raise an exception if it is non-zero
  • run subprocess k8s-crd-resolver -r dask_kubernetes/operator/customresources/daskworkergroup.yaml {temp_directory}/daskworkergroup.yaml
    • check the exit code and raise an exception if it is non-zero
  • run the k8s_cluster.kubectl("apply", "-f", temp_directory) method
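
A minimal sketch of what that fixture could look like (illustrative only: the exact fixture name and scope may differ, the patch-file arguments reflect the edit above, and k8s_cluster.kubectl is the helper mentioned in the last step):

import subprocess
import tempfile

import pytest


@pytest.fixture
def customresources(k8s_cluster):
    """Build the CRDs from their templates and apply them to the test cluster."""
    base = "dask_kubernetes/operator/customresources"
    with tempfile.TemporaryDirectory() as temp_directory:
        for crd in ("daskcluster", "daskworkergroup"):
            subprocess.run(
                [
                    "k8s-crd-resolver",
                    "-r",
                    "-j",
                    f"{base}/{crd}.patch.yaml",
                    f"{base}/{crd}.yaml",
                    f"{temp_directory}/{crd}.yaml",
                ],
                check=True,  # raises CalledProcessError on a non-zero exit code
            )
        k8s_cluster.kubectl("apply", "-f", temp_directory)
        yield
        k8s_cluster.kubectl("delete", "-f", temp_directory)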

@jacobtomlinson @Matt711 let me know what you guys think about this - it currently (and deliberately) excludes the build/publish part of the process since we're still working through that in #447.

@jacobtomlinson (Member) commented:

I'm very much in favour of this!

My only thought is I wonder if we can PR k8s-crd-resolver to better factor out the main function. It would be cleaner to import it and call functions directly instead of using subprocess. But right now the main function is a mash up of CLI arg parsing and actual logic.

@samdyzon force-pushed the feat/operator-pod-spec branch from 65ec2f9 to 42c1be3 on May 7, 2022
@samdyzon (Contributor, Author) commented May 8, 2022

Ok, so the progress so far is as follows:

  • Implemented CRD templates
  • Updated operator to use new CRD structure
  • Updated experimental.KubeCluster to use the new CRD structure but keep the API the same
  • Updated conftest.py to build the CRD templates when running tests
  • Updated tests based on the new service name
  • Added Helm chart and manifests under the resources directory (same style as dask-gateway)
  • Implemented a local .pre-commit hook to build the new charts before committing

However, I'm having some difficulty with the tests. The operator integration tests test_scalesimplecluster and test_simplecluster pass on my local machine, but in CI they seem to either time out or fail when trying to communicate with the Dask scheduler, and I can't replicate the issue in my local environment.

The test_kubecluster, test_multiple_clusters, and test_multiple_clusters_simultaneously tests all fail due to a timeout when the Client connects to the deployed k8s service: the creation of the KubeCluster object and associated Kubernetes resources succeeds, but then the Dask client cannot connect and it falls over.

@jacobtomlinson @Matt711 Have you guys encountered these issues before with testing? Any thoughts on how to fix them?

Left TODO:

  • Implement CI script for deploying new versions of the helm chart in resources/helm (looking at the dask-gateway implementation, though I reckon it's pretty much the same script)
  • Update documentation to include details about helm chart installation

> My only thought is I wonder if we can PR k8s-crd-resolver to better factor out the main function. It would be cleaner to import it and call functions directly instead of using subprocess. But right now the main function is a mash up of CLI arg parsing and actual logic.

Yeah I'm onboard with this, but I don't have heaps of time to contribute there - for now I'm relying on subprocess for testing and the pre-commit hooks.

@samdyzon marked this pull request as ready for review on May 8, 2022
@jacobtomlinson (Member) left a comment:

This is looking great. I left a couple of small comments.

Not entirely sure why the tests are failing. We are continuing to develop this and I'm actively trying to make the tests more stable. So perhaps try pulling from main and resolving your conflicts and things may improve.

I also see things are failing the linting step. Could you ensure you have pre-commit set up and that you run it on everything (pre-commit run --all-files might be helpful, rather than doing it at commit time).

It would be good to see docs for the helm chart in this PR. But if you want to defer the deployment CI stuff to a follow up PR that may help to get this merged sooner.

@@ -16,8 +16,8 @@ To install the operator first we need to create the Dask custom resources:

.. code-block:: console

-    $ kubectl apply -f https://raw.githubusercontent.com/dask/dask-kubernetes/main/dask_kubernetes/operator/customresources/daskcluster.yaml
-    $ kubectl apply -f https://raw.githubusercontent.com/dask/dask-kubernetes/main/dask_kubernetes/operator/customresources/daskworkergroup.yaml
+    $ kubectl apply -f https://raw.githubusercontent.com/dask/dask-kubernetes/main/dask_kubernetes/resources/manifests/daskcluster.yaml
@jacobtomlinson (Member):

Given that a few different things live in this repo I would like to keep everything nested under operator instead of moving it up to the top level.

@samdyzon (Contributor, Author):

No worries, I've moved the resources into dask_kubernetes/operator/deployment

Comment on lines 54 to 108
apiVersion: kubernetes.dask.org/v1
kind: DaskCluster
metadata:
  name: simple-cluster
spec:
  worker:
    replicas: 2
    spec:
      containers:
        - name: worker
          image: "daskdev/dask:latest"
          imagePullPolicy: "IfNotPresent"
          args:
            - dask-worker
            # Note the name of the cluster service, which adds "-service" to the end
            - tcp://simple-cluster-service.default.svc.cluster.local:8786
  scheduler:
    spec:
      containers:
        - name: scheduler
          image: "daskdev/dask:latest"
          imagePullPolicy: "IfNotPresent"
          args:
            - dask-scheduler
          ports:
            - name: comm
              containerPort: 8786
              protocol: TCP
            - name: dashboard
              containerPort: 8787
              protocol: TCP
          readinessProbe:
            tcpSocket:
              port: comm
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            tcpSocket:
              port: comm
            initialDelaySeconds: 15
            periodSeconds: 20
    service:
      type: NodePort
      selector:
        dask.org/cluster-name: simple-cluster
        dask.org/component: scheduler
      ports:
        - name: comm
          protocol: TCP
          port: 8786
          targetPort: "comm"
        - name: dashboard
          protocol: TCP
          port: 8787
          targetPort: "dashboard"
@jacobtomlinson (Member) commented May 9, 2022:

This is one of the things that made me nervous. This boilerplate is going to be mostly the same in 99% of deployments, it's a shame it's so verbose. But I take your point about folks being used to specs being verbose.

The cluster name in the service selector has to match the one set in the metadata at the top, right? Could we use a YAML anchor here to avoid the repetition?

...
metadata:
  name: &cluster simple-cluster
...
spec:
  ...
  scheduler:
  ...
    service:
      type: NodePort
      selector:
        dask.org/cluster-name: *cluster
        dask.org/component: scheduler

@samdyzon (Contributor, Author):

Yeah, that's the nature of declarative structures I think. I don't know if YAML anchors work with k8s manifests - will have to experiment and see if the API will accept it.

@jacobtomlinson (Member) commented:

> Yeah I'm onboard with this, but I don't have heaps of time to contribute there - for now I'm relying on subprocess for testing and the pre-commit hooks.

I am happy to do this.

@samdyzon force-pushed the feat/operator-pod-spec branch from db82465 to dd7b23d on May 9, 2022
@samdyzon (Contributor, Author) commented May 9, 2022

> Not entirely sure why the tests are failing. We are continuing to develop this and I'm actively trying to make the tests more stable. So perhaps try pulling from main and resolving your conflicts and things may improve.

I've rebased against the upstream main now, so this PR is up to date.

> I also see things are failing the linting step. Could you ensure you have pre-commit set up and that you run it on everything (pre-commit run --all-files might be helpful, rather than doing it at commit time).

The linting step is failing due to my attempt to create a pre-commit hook which builds the CRD templates; it fails because k8s-crd-resolver needs the dask_kubernetes package to be installed in the local environment. The pre-commit hook works locally but not in CI. I'll keep working through this, but I'm happy for any suggestions.

> It would be good to see docs for the helm chart in this PR. But if you want to defer the deployment CI stuff to a follow up PR that may help to get this merged sooner.

Sure thing. Currently the helm chart is just in the source, but eventually it may be published to the dask/helm-charts repo using the same mechanism as dask-gateway, so the docs should probably reflect installing from the published repo rather than from the source. But I can also add some docs about the values.yaml configuration, which will be relevant.

Sam Dyson added 21 commits May 11, 2022 08:15
…res, and add k8s-crd-resolver to test requirements
…ler name call to operator daskworkergroup_update
…of new versions of the CRD templates on commit
…correct files for custom pre-commit script
….txt and remove it from pre-commit hook (and update env)
@samdyzon force-pushed the feat/operator-pod-spec branch from 11ee196 to cd7fa3f on May 10, 2022
@samdyzon (Contributor, Author) commented:

Hi @jacobtomlinson, any chance you could do a review on this one and consider the changes? I'm having a hard time keeping up with the rebasing of the main branch - there were several conflicts this time and I'm worried that I'll be undoing stuff from your work.

@jacobtomlinson (Member) commented:

> I'm having a hard time keeping up with the rebasing of the main branch - there were several conflicts this time and I'm worried that I'll be undoing stuff from your work.

That's fair. We are moving pretty fast right now so I appreciate it's tricky tracking the upstream changes. I'm mostly focused on getting the tests/CI more stable in the hopes that it resolves what is happening here.

I'll grab a copy locally for review. If you don't mind I may push commits here to try and get things unstuck.

@jacobtomlinson (Member) commented:

I've made a few fixes and pushed up to see how the CI goes. For me the operator tests are passing locally but the experimental cluster manager ones are failing, I'll keep digging.

@jacobtomlinson (Member) commented:

I've managed to get most tests passing locally. The ones that are still failing are dask_kubernetes/experimental/tests/test_kubecluster.py::test_cluster_from_name and dask_kubernetes/experimental/tests/test_kubecluster.py::test_additional_worker_groups

I'll continue digging into it tomorrow.

I'm noticing a lot of this error in the operator logs. It might be the culprit. @samdyzon does anything stand out to you that might be causing it?

DEBUG    kubernetes_asyncio.client.rest:rest.py:184 response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"services \\"abc-cluster\\" not found","reason":"NotFound","details":{"name":"abc-cluster","kind":"services"},"code":404}\n'

@samdyzon (Contributor, Author) commented May 12, 2022

I found a couple of bugs related to the names of services and the worker groups inside KubeCluster that didn't match the operator, which should solve the errors for dask_kubernetes/experimental/tests/test_kubecluster.py::test_cluster_from_name and dask_kubernetes/experimental/tests/test_kubecluster.py::test_additional_worker_groups. Still not sure why the operator tests fail when they get to the client.wait_for_workers() step - there are some warnings about mismatched versions, could that be an issue?

@jacobtomlinson (Member) commented:

Awesome thanks, tests are passing locally for me now. I'll start digging into the CI complaints.

I've also looked a little into the pre-commit hooks you added. It seems that it is trying to install all of dask_kubernetes in the hook venv which we don't really want it to do. I'll look into doing it just as a regular script and see if I can get that working.

@jacobtomlinson (Member) commented:

Nearly there! Looks like the fixes from #461 had been accidentally reverted. Hopefully this will be it!

@jacobtomlinson merged commit 6339d5a into dask:main on May 12, 2022
@jacobtomlinson (Member) commented:

🚀🚀🚀🚀🚀🚀🚀🚀

Thanks for all your efforts here @samdyzon!

@samdyzon (Contributor, Author) commented:

Sweet! Thanks for helping get it over the line! We'll be able to start using it in anger soon - especially with the autoscaling functionality in the works. Cheers!

@jacobtomlinson (Member) commented:

Sure thing, thanks for sticking around and responding to reviews. Yeah looking forward to autoscaling.
