[Extended Resources] GPU Accelerators #1843

Merged
merged 71 commits into from
Nov 1, 2023

Conversation

@jeevb (Contributor) commented Sep 20, 2023

TL;DR

Adds support for specifying extended resources to allocate to a task (e.g. the accelerator type: GPU device, partition size, etc.).

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

Adds support for populating ExtendedResources from the @task decorator or from a node's .with_overrides(). For now, the only supported extended resource is the GPU accelerator to use for a specific task node. The following forms are valid when defining a task:

No preference of GPU accelerator to use:

from flytekit import Resources, task


@task(limits=Resources(gpu="1"))
def my_task() -> None:
    ...

Schedule on a specific GPU accelerator:

from flytekit import Resources, task
from flytekit.extras.accelerators import T4


@task(
    limits=Resources(gpu="1"),
    accelerator=T4,
)
def my_task() -> None:
    ...

Schedule on a Multi-instance GPU (MIG) accelerator with no preference of partition size:

from flytekit import Resources, task
from flytekit.extras.accelerators import A100


@task(
    limits=Resources(gpu="1"),
    accelerator=A100,
)
def my_task() -> None:
    ...

Schedule on a Multi-instance GPU (MIG) accelerator with a specific partition size:

from flytekit import Resources, task
from flytekit.extras.accelerators import A100


@task(
    limits=Resources(gpu="1"),
    accelerator=A100.partition_1g_5gb,
)
def my_task() -> None:
    ...

Schedule on an unpartitioned Multi-instance GPU (MIG) accelerator:

from flytekit import Resources, task
from flytekit.extras.accelerators import A100


@task(
    limits=Resources(gpu="1"),
    accelerator=A100.unpartitioned,
)
def my_task() -> None:
    ...

An override can also be specified for the GPU accelerator to use, as follows:

from flytekit import Resources, task, workflow
from flytekit.extras.accelerators import A100, T4


@task(
    limits=Resources(gpu="1"),
    accelerator=T4,
)
def my_task() -> str:
    return "hello"

@workflow
def my_wf() -> str:
    return my_task().with_overrides(
        accelerator=A100.partition_1g_5gb
    )
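
The override behaviour described above can be illustrated with a small, self-contained sketch. It does not use flytekit: `TaskSpec`, `NodeSpec`, and `effective_accelerator` are hypothetical names invented for this illustration, accelerators are represented as plain strings, and the precedence rule (a node-level override wins over the task-level setting) is an assumption drawn from the example above rather than a statement about the implementation.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical stand-in for a task definition (not a flytekit class)."""

    name: str
    gpu_limit: str
    accelerator: Optional[str] = None  # None means "no preference"


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical stand-in for a workflow node that wraps a task."""

    task: TaskSpec
    accelerator_override: Optional[str] = None

    def with_overrides(self, accelerator: Optional[str] = None) -> "NodeSpec":
        # Return a new node; the task definition itself is left untouched.
        return replace(self, accelerator_override=accelerator)


def effective_accelerator(node: NodeSpec) -> Optional[str]:
    # Assumed rule: a node-level override wins, otherwise use the task's setting.
    if node.accelerator_override is not None:
        return node.accelerator_override
    return node.task.accelerator


task_def = TaskSpec(name="my_task", gpu_limit="1", accelerator="T4")
node = NodeSpec(task=task_def)
overridden = node.with_overrides(accelerator="A100/1g.5gb")
```

In this sketch the override only affects the node it is applied to, so other nodes that call the same task would keep the task-level accelerator.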

Tracking Issue

https://github.com/flyteorg/flyte/issues/

Follow-up issue

NA
OR
https://github.com/flyteorg/flyte/issues/

wild-endeavor and others added 30 commits September 20, 2023 14:39
@jeevb marked this pull request as ready for review October 24, 2023 16:57
@eapolinario (Collaborator) left a comment

one minor comment, otherwise LGTM.

flytekit/extras/accelerators.py (outdated, resolved)
tests/flytekit/unit/models/test_workflow_closure.py (outdated, resolved)
@@ -24,6 +24,8 @@
from dataclasses import dataclass
from typing import Any, Coroutine, Dict, Generic, List, Optional, OrderedDict, Tuple, Type, TypeVar, Union, cast

from flyteidl.core import tasks_pb2 as _core_task
Contributor commented:

let's not use the _ anymore? i'm actively trying to remove it from the code base.

Contributor Author replied:

Will clean up. Are you using tasks_pb2 directly, or still aliasing as core_task?

@@ -4,6 +4,8 @@
import typing
from typing import Any, List

from flyteidl.core import tasks_pb2 as _core_task
Contributor commented:

ditto

@jeevb requested a review from wild-endeavor October 26, 2023 22:27
@kumare3 (Contributor) commented Oct 27, 2023

@jeevb can you remind me why can we not do

@task(resources=Resources(gpu=NvidiaTeslaA100("1")))
   ...

Is this because of requests and limits?

Also can we drop NvidiaTesla and simply keep A100? I think this is well understood, let's be less verbose.

Also this is too verbose: NvidiaTeslaA100.with_partition_size(NvidiaTeslaA100.partition_sizes.PARTITION_1G_5GB)
Can we make it
A100.partition(A100.partitions.5GB)? (What is the use of 1G?)

@jeevb (Contributor Author) commented Oct 27, 2023

> @jeevb can you remind me why can we not do
>
> @task(resources=Resources(gpu=NvidiaTeslaA100("1")))
>    ...
>
> Is this because of requests and limits?

In a way, yes. It introduces room for confusion when specifying required GPUs: would a request of T4("1") and a limit of A100("1") be valid? We could work around it by adding validations that GPU types match in requests and limits. In addition, we’ll have to break up the Resources object into count and GPU type at serialization time anyway, since there are fundamental differences in the way we define resources in the Container and K8sPod models.
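
To make the trade-off concrete, here is a rough sketch of the validation and splitting that the typed-GPU syntax would require. Everything in it is hypothetical: `TypedGpu`, `validate_gpu_types`, and `split_for_serialization` are names made up for this illustration and do not exist in flytekit.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TypedGpu:
    """Hypothetical value that pairs a GPU device type with a count."""

    device: str
    count: str


def validate_gpu_types(request: Optional[TypedGpu], limit: Optional[TypedGpu]) -> None:
    # Only compare when both sides are given; a lone request or limit is fine.
    if request is not None and limit is not None and request.device != limit.device:
        raise ValueError(
            f"GPU type mismatch: request uses {request.device}, limit uses {limit.device}"
        )


def split_for_serialization(gpu: TypedGpu) -> Tuple[str, str]:
    # The count and the device type are recorded separately,
    # so the combined value has to be taken apart again.
    return gpu.count, gpu.device
```

Keeping the accelerator as a separate argument avoids both steps, since there is then only one place where the device type can be stated.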

> Also can we drop NvidiaTesla and simply keep A100? I think this is well understood, let's be less verbose.

👍

> Also this is too verbose: NvidiaTeslaA100.with_partition_size(NvidiaTeslaA100.partition_sizes.PARTITION_1G_5GB) Can we make it A100.partition(A100.partitions.5GB)? (What is the use of 1G?)

See: https://cloud.google.com/kubernetes-engine/docs/how-to/gpus-multi

Each partition size definition combines a compute part (1g) and a memory part (5gb); 1g.5gb is the format that Nvidia uses. While GCP doesn’t support it currently, 3g.20gb and 4g.20gb are both valid for the A100, so specifying memory alone will not be sufficient.
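
As a small illustration of why memory alone is ambiguous, the following sketch parses the 1g.5gb style of name into its two parts. The `PartitionSize` type and `parse_partition_size` function are hypothetical helpers written for this comment, not part of flytekit, and the regular expression only covers the simple `<n>g.<m>gb` shape used in the examples here.

```python
import re
from typing import NamedTuple


class PartitionSize(NamedTuple):
    """Hypothetical parsed form of a partition size such as 1g.5gb."""

    compute: int    # the number before "g"
    memory_gb: int  # the number before "gb"


_PARTITION_RE = re.compile(r"^(\d+)g\.(\d+)gb$")


def parse_partition_size(value: str) -> PartitionSize:
    match = _PARTITION_RE.match(value)
    if match is None:
        raise ValueError(f"not a valid partition size: {value!r}")
    return PartitionSize(compute=int(match.group(1)), memory_gb=int(match.group(2)))
```

With this, 3g.20gb and 4g.20gb parse to the same memory value but different compute values, which is the case a memory-only name could not distinguish.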

An argument can be made that it would be better to be consistent with existing semantics instead of creating new ones, even if they may be more verbose. Note GCP’s use of the gke-gpu-partition-size label instead of gke-gpu-partition, for instance. That isn’t necessarily stopping us from doing it differently (both seem plausible), but it might come with the risk of introducing additional cognitive burden to users.

Also, A100.partition and A100.partitions might be too close. I can see if A100(A100.partitions.X) is feasible perhaps. Thoughts?

@jeevb (Contributor Author) commented Oct 27, 2023

@kumare3: Went ahead and made the change, so the MIG definition is now:

  • A100 (unspecified partition size)
  • A100.partitioned(A100.partitions.PARTITION_1G_5GB) (explicit partition size)
  • A100.unpartitioned (explicit unpartitioned)

Wdyt?

wild-endeavor previously approved these changes Oct 28, 2023
@kumare3 (Contributor) commented Oct 28, 2023

One question: what if we don't support partitioning, what happens if a user selects it? Or, thinking about it another way, what is the reason a user chooses a partitioned (specific) GPU? The reason to ask is to see if we can arrive at something so simple that it is obvious.

@jeevb (Contributor Author) commented Oct 28, 2023

> One question: what if we don't support partitioning, what happens if a user selects it? Or, thinking about it another way, what is the reason a user chooses a partitioned (specific) GPU? The reason to ask is to see if we can arrive at something so simple that it is obvious.

In a scenario where multiple A100 node pools are available, a user might choose to explicitly schedule on one with a larger partition size, or one that is unpartitioned. If the cluster only has a single GPU node group, however, the extra partition specification is not needed. For all other cases, users will likely need an escape hatch to schedule on specific GPU configs.
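
For illustration, this is roughly how a device and an optional partition size could translate into a Kubernetes node selector on GKE. It is a sketch under assumptions: the label keys follow GKE's naming as I understand it from the docs linked earlier, other clusters will use different labels, and `build_node_selector` is a hypothetical helper, not how the Flyte backend is confirmed to do it.

```python
from typing import Dict, Optional

# Label keys as used by GKE (assumption; other providers differ).
ACCELERATOR_LABEL = "cloud.google.com/gke-accelerator"
PARTITION_LABEL = "cloud.google.com/gke-gpu-partition-size"


def build_node_selector(device: str, partition_size: Optional[str] = None) -> Dict[str, str]:
    """Build a selector for a device, optionally pinned to a partition size."""
    selector = {ACCELERATOR_LABEL: device}
    if partition_size is not None:
        # Only constrain the partition size when the user asked for one.
        selector[PARTITION_LABEL] = partition_size
    return selector
```

A plain selector map like this covers the "no preference" and "specific partition size" cases. The explicit "unpartitioned" case means "the partition label must be absent", which an equality-based selector cannot express, so it would need something like a node affinity rule instead.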

Signed-off-by: Jeev B <[email protected]>
@jeevb (Contributor Author) commented Oct 31, 2023

Final defs (following offline conversation with @kumare3):

  • A100 (unspecified partition size)
  • A100.partition_1g_5gb (explicit partition size)
  • A100.unpartitioned (explicit unpartitioned)
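
The three forms above can be modelled with a small sketch like the one below. It is not the flytekit implementation: `AcceleratorChoice`, `MigAccelerator`, and `A100_SKETCH` are hypothetical names, and the device string is a placeholder. The point is only to show how one object can expose "no preference", an explicit partition size, and explicit "unpartitioned" as distinct values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AcceleratorChoice:
    """Hypothetical resolved selection passed along with a task."""

    device: str
    partition_size: Optional[str] = None  # None: no partition size requested
    unpartitioned: bool = False           # True: explicitly require no partitioning


class MigAccelerator:
    """Hypothetical MIG-capable accelerator (illustration only)."""

    def __init__(self, device: str) -> None:
        self.device = device

    def resolve(self) -> AcceleratorChoice:
        # Using the bare accelerator means "no partition preference".
        return AcceleratorChoice(device=self.device)

    @property
    def unpartitioned(self) -> AcceleratorChoice:
        return AcceleratorChoice(device=self.device, unpartitioned=True)

    @property
    def partition_1g_5gb(self) -> AcceleratorChoice:
        return AcceleratorChoice(device=self.device, partition_size="1g.5gb")


# Placeholder device name, chosen for the example.
A100_SKETCH = MigAccelerator("nvidia-tesla-a100")
```

Keeping "unspecified" and "unpartitioned" as separate states matters here, because the first leaves scheduling open while the second rules out partitioned nodes.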

@jeevb merged commit 4b1ad23 into master Nov 1, 2023
70 checks passed
@jeevb deleted the gpu-selector branch November 1, 2023 04:17
ringohoffman added a commit to ringohoffman/flytekit that referenced this pull request Nov 24, 2023
* pip through to container

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* move around

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* add asserts

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* delete bad line

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* switch to abc and add support for gpu unpartitioned

Signed-off-by: Jeev B <[email protected]>

* Add Azure-specific headers when uploading to blob storage (flyteorg#1784)

* Add Azure-specific headers when uploading to blob storage

Signed-off-by: Victor Delépine <[email protected]>

* Add comment about HTTP 201 check

Signed-off-by: Victor Delépine <[email protected]>

---------

Signed-off-by: Victor Delépine <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Add async delete function in base_agent (flyteorg#1800)

Signed-off-by: Future Outlier <[email protected]>
Co-authored-by: Future Outlier <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Add support for execution name prefixes (flyteorg#1803)

Signed-off-by: troychiu <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Remove ref in output (flyteorg#1794)

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Inherit directly from DataClassJsonMixin instead of using @dataclass_json for improved static type checking (flyteorg#1801)

* Inherit directly from DataClassJsonMixin instead of @dataclass_json for improved static type checking

As it says in the dataclasses-json README: https://github.com/lidatong/dataclasses-json/blob/89578cb9ebed290e70dba8946bfdb68ff6746755/README.md?plain=1#L111-L129, we can use inheritance for improved static type checking; this one change eliminates something like 467 pyright errors from the flytekit module

Signed-off-by: Matthew Hoffman <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Async file sensor (flyteorg#1790)

---------
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Eager workflows to support async workflows (flyteorg#1579)

* Eager workflows to support async workflows

Signed-off-by: Niels Bantilan <[email protected]>

* move array node maptask to experimental/__init__.py

Signed-off-by: Niels Bantilan <[email protected]>

* clean up docs

Signed-off-by: Niels Bantilan <[email protected]>

* clean up

Signed-off-by: Niels Bantilan <[email protected]>

* more clean up

Signed-off-by: Niels Bantilan <[email protected]>

* docs cleanup

Signed-off-by: Niels Bantilan <[email protected]>

* Update test_eager_workflows.py

* clean up timeout handling

Signed-off-by: Niels Bantilan <[email protected]>

* fix lint

Signed-off-by: Niels Bantilan <[email protected]>

---------

Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Enable SecretsManager.get to load and return bytes (flyteorg#1798)

* fix secretsmanager

Signed-off-by: Yue Shang <[email protected]>

* fix lint issue

Signed-off-by: Yue Shang <[email protected]>

* add doc

Signed-off-by: Yue Shang <[email protected]>

* fix github check

Signed-off-by: Yue Shang <[email protected]>

---------

Signed-off-by: Yue Shang <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Batch upload flyte directory (flyteorg#1806)

* Batch upload flyte directory

Signed-off-by: Kevin Su <[email protected]>

* Update get method

Signed-off-by: Kevin Su <[email protected]>

* Move batch size to type engine

Signed-off-by: Kevin Su <[email protected]>

* comment

Signed-off-by: Kevin Su <[email protected]>

* update comment

Signed-off-by: Kevin Su <[email protected]>

* Update flytekit/core/type_engine.py

Co-authored-by: Eduardo Apolinario <[email protected]>

* Add test

Signed-off-by: Kevin Su <[email protected]>

---------

Signed-off-by: Kevin Su <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Better error messaging for overrides (flyteorg#1807)

- using incorrect type of overrides
 - using incorrect type for resources
 - using promises in overrides

Signed-off-by: Ketan Umare <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Run remote Launchplan from `pyflyte run` (flyteorg#1785)

* Beautified pyflyte run even for every task and workflow

- identify a task or a workflow
- task or workflow help menus show types and use rich to beautify

Signed-off-by: Ketan Umare <[email protected]>

* one more improvement

Signed-off-by: Ketan Umare <[email protected]>

* updated

Signed-off-by: Ketan Umare <[email protected]>

* updated command

Signed-off-by: Ketan Umare <[email protected]>

* Updated

Signed-off-by: Ketan Umare <[email protected]>

* updated formatting

Signed-off-by: Ketan Umare <[email protected]>

* updated

Signed-off-by: Ketan Umare <[email protected]>

* updated

Signed-off-by: Ketan Umare <[email protected]>

* bug fixed in types

Signed-off-by: Ketan Umare <[email protected]>

* Updated

Signed-off-by: Ketan Umare <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

---------

Signed-off-by: Ketan Umare <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Add is none function (flyteorg#1757)

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Dynamic workflow should not throw nested task warning (flyteorg#1812)

Signed-off-by: oliverhu <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Add a manual image building GH action (flyteorg#1816)

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* catch abfs protocol in data_persistence.py/get_filesystem and set anon to False (flyteorg#1813)

Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* None doesnt work

Signed-off-by: Jeev B <[email protected]>

* unpartitioned selector

Signed-off-by: Jeev B <[email protected]>

* Fix list of annotated structured dataset (flyteorg#1817)

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Support the flytectl config.yaml admin.clientSecretEnvVar option in flytekit (flyteorg#1819)

* Support the flytectl config.yaml admin.clientSecretEnvVar option in flytekit

Signed-off-by: Chao-Heng Lee <[email protected]>

* remove helper of getting env var.

Signed-off-by: Chao-Heng Lee <[email protected]>

* refactor variable name.

Signed-off-by: Chao-Heng Lee <[email protected]>

---------

Signed-off-by: Chao-Heng Lee <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Async agent delete function for while loop case (flyteorg#1802)

Signed-off-by: Future Outlier <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Co-authored-by: Future Outlier <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* refactor

Signed-off-by: Jeev B <[email protected]>

* fix docs warnings (flyteorg#1827)

Signed-off-by: Jeev B <[email protected]>

* Fix extract_task_module (flyteorg#1829)

---------

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Feat: Add type support for pydantic BaseModels (flyteorg#1660)

Signed-off-by: Adrian Rumpold <[email protected]>
Signed-off-by: Arthur <[email protected]>
Signed-off-by: wirthual <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: eduardo apolinario <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* add test for unspecified mig

Signed-off-by: Jeev B <[email protected]>

* add support for overriding accelerator

Signed-off-by: Jeev B <[email protected]>

* cleanup

Signed-off-by: Jeev B <[email protected]>

* move from core to extras

Signed-off-by: Jeev B <[email protected]>

* fixes

Signed-off-by: Jeev B <[email protected]>

* fixes

Signed-off-by: Jeev B <[email protected]>

* fixes

Signed-off-by: Jeev B <[email protected]>

* cleanup

Signed-off-by: Jeev B <[email protected]>

* Make FlyteRemote slightly more copy/pastable (flyteorg#1830)

Signed-off-by: Katrina Rogan <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Pyflyte meta inputs (flyteorg#1823)

* Re-orgining pyflyte run

Signed-off-by: Ketan Umare <[email protected]>

* Pyflyte beautified and simplified

Signed-off-by: Ketan Umare <[email protected]>

* fixed unit test

Signed-off-by: Ketan Umare <[email protected]>

* Added Launch options

Signed-off-by: Ketan Umare <[email protected]>

* lint fix

Signed-off-by: Ketan Umare <[email protected]>

* test fix

Signed-off-by: Ketan Umare <[email protected]>

* fixing docs failure

Signed-off-by: Ketan Umare <[email protected]>

---------

Signed-off-by: Ketan Umare <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Use mashumaro to serialize/deserialize dataclass (flyteorg#1735)

Signed-off-by: HH <[email protected]>
Signed-off-by: hhcs9527 <[email protected]>
Signed-off-by: Matthew Hoffman <[email protected]>
Co-authored-by: Matthew Hoffman <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Databricks Agent (flyteorg#1797)

Signed-off-by: Future Outlier <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Co-authored-by: Future Outlier <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Prometheus metrics (flyteorg#1815)

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Pyflyte register optionally activates schedule (flyteorg#1832)

* Pyflyte register auto activates schedule

Signed-off-by: Ketan Umare <[email protected]>

* comment addressed

Signed-off-by: Ketan Umare <[email protected]>

---------

Signed-off-by: Ketan Umare <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Remove versions 3.9 and 3.10 (flyteorg#1831)

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Snowflake agent (flyteorg#1799)

Signed-off-by: hhcs9527 <[email protected]>
Signed-off-by: HH <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Update agent metric name (flyteorg#1835)

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* MemVerge MMCloud Agent (flyteorg#1821)

Signed-off-by: Edwin Yu <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Add download badges in readme (flyteorg#1836)

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Eager local entrypoint and support for offloaded types (flyteorg#1833)

* implement eager workflow local entrypoint, support offloaded types

Signed-off-by: Niels Bantilan <[email protected]>

* wip local entrypoint

Signed-off-by: Niels Bantilan <[email protected]>

* add tests

Signed-off-by: Niels Bantilan <[email protected]>

* add local entrypoint tests

Signed-off-by: Niels Bantilan <[email protected]>

* update eager unit tests, delete test script

Signed-off-by: Niels Bantilan <[email protected]>

* clean up tests

Signed-off-by: Niels Bantilan <[email protected]>

* update ci

Signed-off-by: Niels Bantilan <[email protected]>

* update ci

Signed-off-by: Niels Bantilan <[email protected]>

* update ci

Signed-off-by: Niels Bantilan <[email protected]>

* update ci

Signed-off-by: Niels Bantilan <[email protected]>

* remove push step

Signed-off-by: Niels Bantilan <[email protected]>

---------

Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* update requirements and add snowflake agent to api reference (flyteorg#1838)

* update requirements and add snowflake agent to api reference

Signed-off-by: Samhita Alla <[email protected]>

* update requirements

Signed-off-by: Samhita Alla <[email protected]>

* remove versions

Signed-off-by: Samhita Alla <[email protected]>

* remove tensorflow-macos

Signed-off-by: Samhita Alla <[email protected]>

* lint

Signed-off-by: Samhita Alla <[email protected]>

* downgrade sphinxcontrib-youtube package

Signed-off-by: Samhita Alla <[email protected]>

---------

Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Fix: Make sure decks created in elastic task workers are transferred to parent process (flyteorg#1837)

* Transfer decks created in the worker process to the parent process

Signed-off-by: Fabio Graetz <[email protected]>

* Add test for decks in elastic tasks

Signed-off-by: Fabio Graetz <[email protected]>

* Update plugins/flytekit-kf-pytorch/flytekitplugins/kfpytorch/task.py

Signed-off-by: Fabio Graetz <[email protected]>

* Update plugins/flytekit-kf-pytorch/flytekitplugins/kfpytorch/task.py

Signed-off-by: Fabio Graetz <[email protected]>

---------

Signed-off-by: Fabio Graetz <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* add accept grpc (flyteorg#1841)

* add accept grpc

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* unpin setup.py grpc

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Revert "add accept grpc"

This reverts commit 2294592.

Signed-off-by: Jeev B <[email protected]>

* default headers interceptor

Signed-off-by: Jeev B <[email protected]>

* setup.py

Signed-off-by: Jeev B <[email protected]>

* fixes

Signed-off-by: Jeev B <[email protected]>

* fmt

Signed-off-by: Jeev B <[email protected]>

* move prometheus-client import

Signed-off-by: Jeev B <[email protected]>

---------

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Jeev B <[email protected]>
Co-authored-by: Jeev B <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* Feat: Enable `flytekit` to authenticate with proxy in front of FlyteAdmin (flyteorg#1787)

* Introduce authenticator engine and make proxy auth work

Signed-off-by: Fabio Grätz <[email protected]>

* Use proxy authed session for client credentials flow

Signed-off-by: Fabio Grätz <[email protected]>

* Don't use authenticator engine but do proxy authentication via existing external command authenticator

Signed-off-by: Fabio Grätz <[email protected]>

* Add docstring to AuthenticationHTTPAdapter

Signed-off-by: Fabio Grätz <[email protected]>

* Address todo in docstring

Signed-off-by: Fabio Grätz <[email protected]>

* Create blank session if none provided

Signed-off-by: Fabio Grätz <[email protected]>

* Create blank session if none provided in get_token

Signed-off-by: Fabio Grätz <[email protected]>

* Refresh proxy creds in session when not existing without triggering 401

Signed-off-by: Fabio Grätz <[email protected]>

* Add test for get_session

Signed-off-by: Fabio Grätz <[email protected]>

* Move auth helper test into existing module

Signed-off-by: Fabio Grätz <[email protected]>

* Move auth helper test into existing module

Signed-off-by: Fabio Grätz <[email protected]>

* Add test for upgrade_channel_to_proxy_authenticated

Signed-off-by: Fabio Grätz <[email protected]>

* Auth helper tests without use of responses package

Signed-off-by: Fabio Grätz <[email protected]>

* Feat: Add plugin for generating GCP IAP ID tokens via external command (flyteorg#1795)

* Add external command plugin to generate id tokens for identity aware proxy

Signed-off-by: Fabio Grätz <[email protected]>

* Retrieve desktop app client secret from gcp secret manager

Signed-off-by: Fabio Grätz <[email protected]>

* Remove comments

Signed-off-by: Fabio Grätz <[email protected]>

* Introduce a command group that allows adding a command to generate service account id tokens later

Signed-off-by: Fabio Grätz <[email protected]>

* Document how to use plugin and deploy Flyte with IAP

Signed-off-by: Fabio Grätz <[email protected]>

* Minor corrections README.md

Signed-off-by: Fabio Grätz <[email protected]>

---------

Signed-off-by: Fabio Grätz <[email protected]>
Co-authored-by: Fabio Grätz <[email protected]>
Signed-off-by: Fabio Grätz <[email protected]>

* Use proxy auth'ed session for device code auth flow

Signed-off-by: Fabio Grätz <[email protected]>

* Fix token client tests

Signed-off-by: Fabio Grätz <[email protected]>

* Make poll token endpoint test more specific

Signed-off-by: Fabio Grätz <[email protected]>

* Make test_client_creds_authenticator test work and more specific

Signed-off-by: Fabio Grätz <[email protected]>

* Make test_client_creds_authenticator_with_custom_scopes test work and more specific

Signed-off-by: Fabio Grätz <[email protected]>

* Implement subcommand to generate id tokens for service accounts

Signed-off-by: Fabio Graetz <[email protected]>

* Test id token generation from service accounts

Signed-off-by: Fabio Graetz <[email protected]>

* Fix plugin requirements

Signed-off-by: Fabio Graetz <[email protected]>

* Document usage of generate-service-account-id-token subcommand

Signed-off-by: Fabio Grätz <[email protected]>

* Document alternative ways to obtain service account id tokens

Signed-off-by: Fabio Grätz <[email protected]>

---------

Signed-off-by: Fabio Grätz <[email protected]>
Signed-off-by: Fabio Graetz <[email protected]>
Co-authored-by: Fabio Grätz <[email protected]>
Signed-off-by: Jeev B <[email protected]>

* bump flyteidl

Signed-off-by: Jeev B <[email protected]>

* make requirements

Signed-off-by: Jeev B <[email protected]>

* fix failing tests

Signed-off-by: Jeev B <[email protected]>

* move gpu accelerator to flyteidl.core.Resources

Signed-off-by: Jeev B <[email protected]>

* Use ResourceExtensions for extended resources

Signed-off-by: Jeev B <[email protected]>

* cleanup

Signed-off-by: Jeev B <[email protected]>

* Switch to using ExtendedResources in TaskTemplate

Signed-off-by: Jeev B <[email protected]>

* cleanups

Signed-off-by: Jeev B <[email protected]>

* update flyteidl

Signed-off-by: Jeev B <[email protected]>

* Replace _core_task imports with tasks_pb2

Signed-off-by: Jeev B <[email protected]>

* less verbose definitions

Signed-off-by: Jeev B <[email protected]>

* Attempt at less confusing syntax

Signed-off-by: Jeev B <[email protected]>

* Streamline UX

Signed-off-by: Jeev B <[email protected]>

* Run make fmt

Signed-off-by: Jeev B <[email protected]>

---------

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Jeev B <[email protected]>
Signed-off-by: Victor Delépine <[email protected]>
Signed-off-by: Future Outlier <[email protected]>
Signed-off-by: troychiu <[email protected]>
Signed-off-by: Matthew Hoffman <[email protected]>
Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: Yue Shang <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Ketan Umare <[email protected]>
Signed-off-by: oliverhu <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Chao-Heng Lee <[email protected]>
Signed-off-by: Adrian Rumpold <[email protected]>
Signed-off-by: Arthur <[email protected]>
Signed-off-by: wirthual <[email protected]>
Signed-off-by: eduardo apolinario <[email protected]>
Signed-off-by: Katrina Rogan <[email protected]>
Signed-off-by: HH <[email protected]>
Signed-off-by: hhcs9527 <[email protected]>
Signed-off-by: Edwin Yu <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Fabio Graetz <[email protected]>
Signed-off-by: Fabio Grätz <[email protected]>
Co-authored-by: Yee Hing Tong <[email protected]>
Co-authored-by: Victor Delépine <[email protected]>
Co-authored-by: Future-Outlier <[email protected]>
Co-authored-by: Future Outlier <[email protected]>
Co-authored-by: Yi Chiu <[email protected]>
Co-authored-by: Matthew Hoffman <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Co-authored-by: Niels Bantilan <[email protected]>
Co-authored-by: Yue Shang <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Ketan Umare <[email protected]>
Co-authored-by: Keqiu Hu <[email protected]>
Co-authored-by: Jan Fiedler <[email protected]>
Co-authored-by: Chao-Heng Lee <[email protected]>
Co-authored-by: Samhita Alla <[email protected]>
Co-authored-by: Arthur Böök <[email protected]>
Co-authored-by: Katrina Rogan <[email protected]>
Co-authored-by: Po Han(Hank) Huang <[email protected]>
Co-authored-by: Edwin Yu <[email protected]>
Co-authored-by: Fabio M. Graetz, Ph.D <[email protected]>
Co-authored-by: Fabio Grätz <[email protected]>