Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability to specify labels to Ray Jobs through configuration #4926

Closed
2 tasks done
arvind-chandra opened this issue Feb 21, 2024 · 15 comments
Closed
2 tasks done
Assignees
Labels
bug Something isn't working enhancement New feature or request

Comments

@arvind-chandra
Copy link

Describe the bug

Add ability to specify labels for Ray Jobs thru the configuration.

Expected behavior

Add the ability to specify labels to Ray Jobs through configuration. This is needed so that we can filter on certain jobs that are run through Flyte.

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@arvind-chandra arvind-chandra added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Feb 21, 2024
Copy link

dosubot bot commented Feb 21, 2024

🤖

Sure, I can guide you on how to add a feature to specify labels for Ray Jobs through the configuration in the Flyte repository.

Firstly, you would need to modify the WorkerGroupSpec class in the Ray.java file. You can add a new field for labels in this class. This field can be a Map that stores the labels as key-value pairs. Here is a sample code snippet on how you can add this:

public static final int LABELS_FIELD_NUMBER = 6;
private com.google.protobuf.MapField<
    java.lang.String, java.lang.String> labels_;
private com.google.protobuf.MapField<java.lang.String, java.lang.String>
internalGetLabels() {
  if (labels_ == null) {
    return com.google.protobuf.MapField.emptyMapField(
        LabelsDefaultEntryHolder.defaultEntry);
  }
  return labels_;
}

/**
 * <pre>
 * Optional. Labels are the key-value pairs that can be used to filter certain jobs.
 * </pre>
 *
 * <code>map&lt;string, string&gt; labels = 6;</code>
 */
public int getLabelsCount() {
  return internalGetLabels().getMap().size();
}
/**
 * <pre>
 * Optional. Labels are the key-value pairs that can be used to filter certain jobs.
 * </pre>
 *
 * <code>map&lt;string, string&gt; labels = 6;</code>
 */
public boolean containsLabels(
    java.lang.String key) {
  if (key == null) { throw new java.lang.NullPointerException(); }
  return internalGetLabels().getMap().containsKey(key);
}
/**
 * Use {@link #getLabelsMap()} instead.
 */
@java.lang.Deprecated
public java.util.Map<java.lang.String, java.lang.String> getLabels() {
  return getLabelsMap();
}
/**
 * <pre>
 * Optional. Labels are the key-value pairs that can be used to filter certain jobs.
 * </pre>
 *
 * <code>map&lt;string, string&gt; labels = 6;</code>
 */
public java.util.Map<java.lang.String, java.lang.String> getLabelsMap() {
  return internalGetLabels().getMap();
}
/**
 * <pre>
 * Optional. Labels are the key-value pairs that can be used to filter certain jobs.
 * </pre>
 *
 * <code>map&lt;string, string&gt; labels = 6;</code>
 */
public java.lang.String getLabelsOrDefault(
    java.lang.String key,
    java.lang.String defaultValue) {
  if (key == null) { throw new java.lang.NullPointerException(); }
  java.util.Map<java.lang.String, java.lang.String> map =
      internalGetLabels().getMap();
  return map.containsKey(key) ? map.get(key) : defaultValue;
}
/**
 * <pre>
 * Optional. Labels are the key-value pairs that can be used to filter certain jobs.
 * </pre>
 *
 * <code>map&lt;string, string&gt; labels = 6;</code>
 */
public java.lang.String getLabelsOrThrow(
    java.lang.String key) {
  if (key == null) { throw new java.lang.NullPointerException(); }
  java.util.Map<java.lang.String, java.lang.String> map =
      internalGetLabels().getMap();
  if (!map.containsKey(key)) {
    throw new java.lang.IllegalArgumentException();
  }
  return map.get(key);
}

This code adds a new field labels to the WorkerGroupSpec class. The labels field is a Map that stores the labels as key-value pairs. The getLabelsCount, containsLabels, getLabels, getLabelsMap, getLabelsOrDefault, and getLabelsOrThrow methods are used to interact with the labels field.

Please note that you also need to add the corresponding methods in the Builder class to set the labels.

Remember to regenerate the protobuf files after modifying the .proto file. You can find the Ray.java file in the flyteidl/gen/pb-java/flyteidl/plugins directory.

Secondly, the configuration for Ray Jobs in the Flyte codebase is defined in the file flyteplugins/go/tasks/plugins/k8s/ray/config.go.

The Config struct in this file contains the configuration for the 'ray' plugin, including parameters such as whether to shut down after the job finishes, the TTL to clean up the RayCluster, the Kubernetes Service Type, whether to include a Ray Dashboard, and the host for the dashboard server.

The DefaultConfig struct contains the default configuration for the head node and worker node of ray jobs, including start parameters and IP address.

The NodeConfig struct defines the start parameters and IP address for a node.

The GetConfig and SetConfig functions are used to get and set the configuration respectively. You can find the config.go file in the flyteplugins/go/tasks/plugins/k8s/ray directory.

I hope this helps! If you have any other questions, feel free to ask.

Sources

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@arvind-chandra arvind-chandra changed the title [BUG] Add the ability to specify labels to Ray Jobs through configuration Feb 21, 2024
@dosubot dosubot bot added the enhancement New feature or request label Feb 21, 2024
@arvind-chandra arvind-chandra mentioned this issue Feb 21, 2024
3 tasks
@eapolinario
Copy link
Contributor

eapolinario commented Feb 24, 2024

@arvind-chandra , adding labels can be achieved via pod templates. Can you give it a try?

@eapolinario eapolinario removed the untriaged This issues has not yet been looked at by the Maintainers label Feb 24, 2024
@arvind-chandra
Copy link
Author

arvind-chandra commented Feb 26, 2024

Hey @eapolinario, the reasoning we had for adding the labels to the Job Spec as opposed to using the pods is because Ray Jobs could have multiple pods. In order to pull from pods, we would have to make extra calls to check for the correct pod before being able to pull from it. On the other hand Ray Jobs only have one RayJob object, so we could directly pull from that without having to go thru the list of pods to get the label.

@peterghaddad
Copy link
Contributor

@eapolinario any updates on this one? Can we move it forward?

@eapolinario
Copy link
Contributor

@arvind-chandra , @peterghaddad , can you talk a bit more about how adding this fixed value to the propeller config is any better than establishing a convention using pod templates? For example, we could make it so that labels added to the primary container are the ones passed down to the Ray Job.

@eapolinario eapolinario self-assigned this Feb 29, 2024
@peterghaddad
Copy link
Contributor

@eapolinario in our particular use case, these labels would be added to every Ray Job so that we can filter Ray Jobs created by Flyte. Technically we could add a label to all Ray Jobs created by Flyte statically i.e created/by=flyte that may also be sufficient, but typically it's better practice to make labels configurable.

In regards to podTemplates, would the user have to define this for each task? If so, then it may be too much overhead.

Let me know your thoughts!

@peterghaddad
Copy link
Contributor

@eapolinario Hope you are well! Wanted to follow up here.

@eapolinario
Copy link
Contributor

@peterghaddad , forgive me for the delay, we've been busy making sure the next release is up to standards.

In regards to my suggestion, I should have been more clear. Pod templates come in 2 flavors:

  1. compile-time pod templates
  2. runtime pod templates

You're probably familiar with 1. as that essentially lets you define a pod template inline as part of defining a task. As you already pointed out, if you had to define that in each and every task then this would be a no-go. However, with runtime pod templates you have a mechanism to set default labels (and annotations) all pods spawned by Flyte. Flytepropeller's config file allows you to set default labels and annotations. In the config file/configmap, under the k8s section you can set these values, e.g.:

# example config
plugins:
   k8s:
    default-labels:
      label1: label-value1
    default-annotations:
      annotation1: annotation-value1

One limitation of this approach is that these labels are going to show up in all pods. You can also use the types of runtime pod templates to be able to target pods to specific namespaces.

@peterghaddad
Copy link
Contributor

peterghaddad commented Mar 11, 2024

@eapolinario I think we need this to live at the RayCRD level not the pod level. Can we please consider this PR? This is specific to the KubeRay custom resource definition.

On a large scale cluster it is quicker to query at the Ray Job level instead of querying all pods.

@eapolinario
Copy link
Contributor

eapolinario commented Mar 12, 2024

@peterghaddad , we use the podspec and other metadata to build the Ray CRD already (if you trace the code you'll notice that we use the pod spec to build the template passed to the Ray CRD). And Flyte pod templates are a way to configure, amongst other fields, labels and annotations to be set for all pods in a namespace.

Just to be clear, if you create a pod template like:

apiVersion: v1
kind: PodTemplate
metadata:
  name: flyte-template
  namespace: flytesnacks-development
template:
  metadata:
    labels:
      label1: value1
    annotations:
      foo: bar
      zoo: zaz
  spec:
    containers:
      - name: bogus-name
        image: bogus:image

And set the default pod template in the flytepropeller config:

    plugins:
      k8s:
        default-pod-template-name: flyte-template

And then restart propeller (so that the config is reloaded), all jobs created on the flytesnacks-development namespace from that time on will have the labels and annotations present in the pod template and those cascade to the Ray CRs created as part of executing the Flyte tasks.

@peterghaddad
Copy link
Contributor

Hey @eapolinario appreciate the fast response!

I was skimming through the code and still not seeing the pod label being used to set the labels at the Ray Job CRD level. For example:

apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: test
  labels:
    created/by: 'flyte'

I think this is what we were hoping for because this will enable us to query all Ray Jobs created by flyte using the k8s golang api client.

@eapolinario
Copy link
Contributor

Let me do a walkthrough of the code to show how labels added to the pod spec end up in the Ray jobs.

Let's use constructV1Alpha1Job for this discussion, the same applies to constructV1Job. If you see, in https://github.com/flyteorg/flyte/blob/master/flyteplugins/go/tasks/plugins/k8s/ray/ray.go#L129C6-L129C26, we build an object of type rayv1alpha1.RayJob. The objectMeta passed as an argument to that function comes directly from the call to flytek8s.ToK8sPodSpec. If you trace how this object is modified along the way, we merge it with the base pod template. That objectMeta is then used to set labels (and other properties) in the construction of the ray job, specifically the head group spec, the same applies to the workers.

So in your case, you could define a pod template that only creates the label created/by: 'flyte' and follow my suggestion to get ray CRDs tagged with the right label without having to expose this in a configuration.

@peterghaddad
Copy link
Contributor

peterghaddad commented Mar 13, 2024

I appreciate the walkthrough! This is informative. @arvind-chandra does this work for you?
If so, then we can probably close this issue. @eapolinario maybe updating the documentation to explain this isn't a bad idea?

@eapolinario
Copy link
Contributor

Yeah, I'll take action to clarify the docs. Pod templates are very powerful, but we have to make sure that Flyte users know how to use them.

Thank you for your patience!

@arvind-chandra
Copy link
Author

@eapolinario Yeah that makes a lot of sense. Thank you for clarifying! I will close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants