Backport 1.4 onto v1.2 release branch (#1542)
* Create non-root user after apt-get (#1519)

* Create non-root user after apt-get

Signed-off-by: Eduardo Apolinario <[email protected]>

* Create user after pip install

Signed-off-by: Kevin Su <[email protected]>

---------

Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Kevin Su <[email protected]>

* Add root pyflyte reference to docs (#1520)

Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>

* DuckDB plugin (#1419)

* DuckDB integration

Signed-off-by: Samhita Alla <[email protected]>

* add sd test and fix import

Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>

* fix lint error

Signed-off-by: Samhita Alla <[email protected]>

* fix lint error

Signed-off-by: Samhita Alla <[email protected]>

* list to List

Signed-off-by: Samhita Alla <[email protected]>

* lint

Signed-off-by: Samhita Alla <[email protected]>

* incorporated suggestions

Signed-off-by: Samhita Alla <[email protected]>

* add duckdb to requirements and add gh action to detect doc warnings and errors

Signed-off-by: Samhita Alla <[email protected]>

* gh action: python 3.9

Signed-off-by: Samhita Alla <[email protected]>

* docs python 3.8 to 3.9

Signed-off-by: Samhita Alla <[email protected]>

---------

Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>
Co-authored-by: Kevin Su <[email protected]>

* add string as a valid input (#1527)

* add string as a valid input

Signed-off-by: Samhita Alla <[email protected]>

* isort

Signed-off-by: Samhita Alla <[email protected]>

* tests

Signed-off-by: Samhita Alla <[email protected]>

* Lint

Signed-off-by: Eduardo Apolinario <[email protected]>

---------

Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>

* Add back attempt to use existing serialization settings when running (#1529)

Signed-off-by: Yee Hing Tong <[email protected]>

* update configuration docs, fix some docstrings (#1530)

* update configuration docs, fix some docstrings

Signed-off-by: Niels Bantilan <[email protected]>

* update copy

Signed-off-by: Niels Bantilan <[email protected]>

* add config init command

Signed-off-by: Niels Bantilan <[email protected]>

---------

Signed-off-by: Niels Bantilan <[email protected]>

* Revert "Make flytekit comply with PEP-561 (#1516)" (#1532)

This reverts commit b3ad158.

Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>

* Failed to initialize FlyteInvalidInputException (#1534)

Signed-off-by: Kevin Su <[email protected]>

* cherry pick pin fsspec commit

Signed-off-by: Yee Hing Tong <[email protected]>

* Set flytekit<1.3.0 in duckdb tests

Signed-off-by: eduardo apolinario <[email protected]>

* Fix flyteidl==1.2.9 in doc-requirements.txt

Signed-off-by: eduardo apolinario <[email protected]>

* No duckdb documentation

Signed-off-by: eduardo apolinario <[email protected]>

* Linting

Signed-off-by: eduardo apolinario <[email protected]>

---------

Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: eduardo apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Co-authored-by: Samhita Alla <[email protected]>
Co-authored-by: Niels Bantilan <[email protected]>
6 people authored Mar 8, 2023
1 parent bacbbf8 commit 074262b
Showing 26 changed files with 737 additions and 65 deletions.
26 changes: 26 additions & 0 deletions .github/workflows/docs_build.yml
@@ -0,0 +1,26 @@
name: Docs Build

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master
jobs:
  docs_warnings:
    name: Docs Warnings
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: "0"
      - uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Report Sphinx Warnings
        id: sphinx-warnings
        run: |
          sudo apt-get install python3-sphinx
          pip install -r doc-requirements.txt
          SPHINXOPTS="-W" cd docs && make html
9 changes: 5 additions & 4 deletions .github/workflows/pythonbuild.yml
@@ -64,6 +64,7 @@ jobs:
- flytekit-dbt
- flytekit-deck-standard
- flytekit-dolt
- flytekit-duckdb
- flytekit-greatexpectations
- flytekit-hive
- flytekit-k8s-pod
@@ -160,11 +161,11 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Fetch the code
uses: actions/checkout@v2
- name: Set up Python 3.8
uses: actions/setup-python@v2
uses: actions/checkout@v3
- name: Set up Python 3.9
uses: actions/setup-python@v4
with:
python-version: 3.8
python-version: 3.9
- name: Install dependencies
run: |
python -m pip install --upgrade pip==21.2.4 setuptools wheel
8 changes: 4 additions & 4 deletions Dockerfile
@@ -4,10 +4,6 @@ FROM python:${PYTHON_VERSION}-slim-buster
MAINTAINER Flyte Team <[email protected]>
LABEL org.opencontainers.image.source https://github.com/flyteorg/flytekit

RUN useradd -u 1000 flytekit
RUN chown flytekit: /root
USER flytekit

WORKDIR /root
ENV PYTHONPATH /root

@@ -24,4 +20,8 @@ RUN pip install -U flytekit==$VERSION \
flytekitplugins-data-fsspec[gcp]==$VERSION \
scikit-learn

RUN useradd -u 1000 flytekit
RUN chown flytekit: /root
USER flytekit

ENV FLYTE_INTERNAL_IMAGE "$DOCKER_IMAGE"
2 changes: 1 addition & 1 deletion doc-requirements.in
@@ -45,4 +45,4 @@ whylabs-client # whylogs
ray # ray
scikit-learn # scikit-learn
vaex # vaex
mlflow # mlflow
mlflow # mlflow
1 change: 0 additions & 1 deletion docs/source/conf.py
@@ -56,7 +56,6 @@
"sphinx.ext.graphviz",
"sphinx-prompt",
"sphinx_copybutton",
"sphinx_fontawesome",
"sphinx_panels",
"sphinxcontrib.yt",
"sphinx_tags",
2 changes: 2 additions & 0 deletions docs/source/design/control_plane.rst
@@ -88,6 +88,8 @@ The ``for_endpoint`` method also accepts:
* ``data_config``: can be used to configure how data is downloaded or uploaded to a specific blob storage like S3, GCS, etc.
* ``config_file``: the path to the configuration file to use.

.. _general_initialization:

Generalized Initialization
==========================

5 changes: 3 additions & 2 deletions docs/source/extras.tensorflow.rst
@@ -1,6 +1,7 @@
############
###############
TensorFlow Type
############
###############

.. automodule:: flytekit.extras.tensorflow
:no-members:
:no-inherited-members:
12 changes: 12 additions & 0 deletions docs/source/plugins/duckdb.rst
@@ -0,0 +1,12 @@
.. _duckdb:

###################################################
DuckDB API reference
###################################################

.. tags:: Integration, Data, Analytics

.. automodule:: flytekitplugins.duckdb
:no-members:
:no-inherited-members:
:no-special-members:
2 changes: 2 additions & 0 deletions docs/source/plugins/index.rst
@@ -30,6 +30,7 @@ Plugin API reference
* :ref:`DBT <dbt>` - DBT API reference
* :ref:`Vaex <vaex>` - Vaex API reference
* :ref:`MLflow <mlflow>` - MLflow API reference
* :ref:`DuckDB <duckdb>` - DuckDB API reference

.. toctree::
:maxdepth: 2
@@ -61,3 +62,4 @@ Plugin API reference
DBT <dbt>
Vaex <vaex>
MLflow <mlflow>
DuckDB <duckdb>
24 changes: 2 additions & 22 deletions docs/source/pyflyte.rst
@@ -2,26 +2,6 @@
Pyflyte CLI
###########

.. click:: flytekit.clis.sdk_in_container.init:init
:prog: pyflyte init
:nested: full

.. click:: flytekit.clis.sdk_in_container.local_cache:local_cache
:prog: pyflyte local-cache
:nested: full

.. click:: flytekit.clis.sdk_in_container.package:package
:prog: pyflyte package
:nested: full

.. click:: flytekit.clis.sdk_in_container.register:register
:prog: pyflyte register
:nested: full

.. click:: flytekit.clis.sdk_in_container.run:run
:prog: pyflyte run
:nested: none

.. click:: flytekit.clis.sdk_in_container.serialize:serialize
:prog: pyflyte serialize
.. click:: flytekit.clis.sdk_in_container.pyflyte:main
:prog: pyflyte
:nested: full
4 changes: 2 additions & 2 deletions flytekit/bin/entrypoint.py
@@ -270,8 +270,8 @@ def setup_execution(
if compressed_serialization_settings:
ss = SerializationSettings.from_transport(compressed_serialization_settings)
ssb = ss.new_builder()
ssb.project = exe_project
ssb.domain = exe_domain
ssb.project = ssb.project or exe_project
ssb.domain = ssb.domain or exe_domain
ssb.version = tk_version
if dynamic_addl_distro:
ssb.fast_serialization_settings = FastSerializationSettings(
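Note on the hunk above (#1529): the new assignments keep a project or domain that is already present in the transported serialization settings and use the execution's values only as a fallback. A minimal sketch of that `or` fallback follows; `BuilderSketch` and `apply_execution_fallbacks` are hypothetical stand-ins written for illustration, not flytekit classes or functions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BuilderSketch:
    """Hypothetical stand-in for the settings builder; not the flytekit class."""

    project: Optional[str] = None
    domain: Optional[str] = None


def apply_execution_fallbacks(ssb: BuilderSketch, exe_project: str, exe_domain: str) -> BuilderSketch:
    # Keep values already carried by the serialized settings; fall back to the
    # execution's project/domain only when the existing value is falsy
    # (None or an empty string).
    ssb.project = ssb.project or exe_project
    ssb.domain = ssb.domain or exe_domain
    return ssb


# The preset project survives; the missing domain is filled from the execution.
preset = apply_execution_fallbacks(BuilderSketch(project="flytesnacks"), "exec-project", "development")
print(preset.project, preset.domain)
```

One consequence of using `or` rather than an explicit `is None` check is that an empty string is also treated as unset.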
103 changes: 78 additions & 25 deletions flytekit/configuration/__init__.py
@@ -5,28 +5,72 @@
.. currentmodule:: flytekit.configuration
Flytekit Configuration Ecosystem
--------------------------------
Flytekit Configuration Sources
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Where can configuration come from?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
There are multiple ways to configure flytekit settings:
- Command line arguments. This is the ideal location for settings to go. (See ``pyflyte package --help`` for example.)
- Environment variables. Users can specify these at compile time, but when your task is run, Flyte Propeller will also set configuration to ensure correct interaction with the platform.
- A config file - an INI style configuration file. By default, flytekit will look for a file in two places
1. First, a file named ``flytekit.config`` in the Python interpreter's starting directory
2. A file in ``~/.flyte/config`` in the home directory as detected by Python.
**Command Line Arguments**: This is the recommended way of setting configuration values for many cases.
For example, see `pyflyte package <pyflyte.html#pyflyte-package>`_ command.
**Python Config Object**: A :py:class:`~flytekit.configuration.Config` object can be used directly, e.g. when
initializing a :py:class:`~flytekit.remote.remote.FlyteRemote` object. See :doc:`here <design/control_plane>` for examples on
how to specify a ``Config`` object.
**Environment Variables**: Users can specify these at compile time, but when your task is run, Flyte Propeller will
also set configuration to ensure correct interaction with the platform. The environment variables must be specified
with the format ``FLYTE_{SECTION}_{OPTION}``, all in upper case. For example, to specify the
:py:class:`PlatformConfig.endpoint <flytekit.configuration.PlatformConfig>` setting, the environment variable would
be ``FLYTE_PLATFORM_URL``.
.. note::
Environment variables won't work for image configuration, which need to be specified with the
`pyflyte package --image ... <pyflyte.html#cmdoption-pyflyte-package-i>`_ option or in a configuration
file.
**YAML Format Configuration File**: A configuration file that contains settings for both
`flytectl <https://docs.flyte.org/projects/flytectl/>`__ and ``flytekit``. This is the recommended configuration
file format. Invoke the :ref:`flytectl config init <flytectl_config_init>` command to create a boilerplate
``~/.flyte/config.yaml`` file, and ``flytectl --help`` to learn about all of the configuration yaml options.
.. dropdown:: See example ``config.yaml`` file
:title: text-muted
:animate: fade-in-slide-down
.. literalinclude:: ../../tests/flytekit/unit/configuration/configs/sample.yaml
:language: yaml
:caption: config.yaml
**INI Format Configuration File**: A configuration file for ``flytekit``. By default, ``flytekit`` will look for a
file in two places:
1. First, a file named ``flytekit.config`` in the Python interpreter's working directory.
2. A file in ``~/.flyte/config`` in the home directory as detected by Python.
.. dropdown:: See example ``flytekit.config`` file
:title: text-muted
:animate: fade-in-slide-down
.. literalinclude:: ../../tests/flytekit/unit/configuration/configs/images.config
:language: ini
:caption: flytekit.config
.. warning::
The INI format configuration is considered a legacy configuration format. We recommend using the yaml format
instead if you're using a configuration file.
How is configuration used?
^^^^^^^^^^^^^^^^^^^^^^^^^^
Configuration usage can roughly be bucketed into the following areas,
- Compile-time settings - things like the default image, where to look for Flyte code, etc.
- Platform settings - Where to find the Flyte backend (Admin DNS, whether to use SSL)
- Run time (registration) settings - these are things like the K8s service account to use, a specific S3/GCS bucket to write off-loaded data (dataframes and files) to, notifications, labels & annotations, etc.
- Data access settings - Is there a custom S3 endpoint in use? Backoff/retry behavior for accessing S3/GCS, key and password, etc.
- Other settings - Statsd configuration, which is a run-time applicable setting but is not necessarily relevant to the Flyte platform.
- **Compile-time settings**: these are settings like the default image and named images, where to look for Flyte code, etc.
- **Platform settings**: Where to find the Flyte backend (Admin DNS, whether to use SSL)
- **Registration Run-time settings**: these are things like the K8s service account to use, a specific S3/GCS bucket to write off-loaded data (dataframes and files) to, notifications, labels & annotations, etc.
- **Data access settings**: Is there a custom S3 endpoint in use? Backoff/retry behavior for accessing S3/GCS, key and password, etc.
- **Other settings** - Statsd configuration, which is a run-time applicable setting but is not necessarily relevant to the Flyte platform.
Configuration Objects
---------------------
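Note on the hunk above: the docstring states that environment variables follow the pattern ``FLYTE_{SECTION}_{OPTION}`` in upper case. The rule can be sketched as below; `flyte_env_var` is a hypothetical helper written for illustration and is not part of flytekit. The `platform`/`url` pair is taken from the `FLYTE_PLATFORM_URL` example in the docstring.

```python
import os


def flyte_env_var(section: str, option: str) -> str:
    """Build an environment variable name using the documented
    FLYTE_{SECTION}_{OPTION} pattern (hypothetical helper)."""
    return f"FLYTE_{section}_{option}".upper()


name = flyte_env_var("platform", "url")
print(name)

# Setting the variable for the current process; the endpoint value here is an
# assumed placeholder, not a real Flyte deployment.
os.environ[name] = "localhost:30080"
```

As the note in the docstring says, image configuration is the exception and has to come from the `--image` option or a configuration file.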
@@ -42,8 +86,15 @@
.. _configuration-compile-time-settings:
Compilation (Serialization) Time Settings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Serialization Time Settings
^^^^^^^^^^^^^^^^^^^^^^^^^^^
These are serialization/compile-time settings that are used when using commands like
`pyflyte package <pyflyte.html#pyflyte-package>`_ or `pyflyte register <pyflyte.html#pyflyte-register>`_. These
configuration settings are typically passed in as flags to the above CLI commands.
The image configurations are typically either passed in via an `--image <pyflyte.html#cmdoption-pyflyte-package-i>`_ flag,
or can be specified in the ``yaml`` or ``ini`` configuration files (see examples above).
.. autosummary::
:template: custom.rst
@@ -60,6 +111,10 @@
Execution Time Settings
^^^^^^^^^^^^^^^^^^^^^^^
Users typically shouldn't be concerned with these configurations, as they are typically set by FlytePropeller or
FlyteAdmin. The configurations below are useful for authenticating to a Flyte backend, configuring data access
credentials, secrets, and statsd metrics.
.. autosummary::
:template: custom.rst
:toctree: generated/
@@ -71,7 +126,6 @@
~S3Config
~GCSConfig
~DataConfig
~Config
"""
from __future__ import annotations
@@ -190,10 +244,9 @@ def find_image(self, name) -> Optional[Image]:
def validate_image(_: typing.Any, param: str, values: tuple) -> ImageConfig:
"""
Validates the image to match the standard format. Also validates that only one default image
is provided. a default image, is one that is specified as
default=img or just img. All other images should be provided with a name, in the format
name=img
This method can be used with the CLI
is provided. a default image, is one that is specified as ``default=<image_uri>`` or just ``<image_uri>``. All
other images should be provided with a name, in the format ``name=<image_uri>`` This method can be used with the
CLI
:param _: click argument, ignored here.
:param param: the click argument, here should be "image"
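Note on the hunk above: the reworded docstring describes three accepted forms, ``default=<image_uri>``, a bare ``<image_uri>`` (also the default), and ``name=<image_uri>``, with at most one default allowed. A rough sketch of that rule follows. `parse_image_specs` is a hypothetical function written for illustration; it is not the body of `validate_image`, and it assumes image URIs contain no `=` character.

```python
from typing import Dict, Iterable, Optional, Tuple


def parse_image_specs(values: Iterable[str]) -> Tuple[Optional[str], Dict[str, str]]:
    """Split image arguments into one default image and a dict of named images."""
    default: Optional[str] = None
    named: Dict[str, str] = {}
    for value in values:
        name, sep, uri = value.partition("=")
        if not sep:
            # A bare "<image_uri>" is treated as the default image.
            name, uri = "default", value
        if name == "default":
            if default is not None:
                raise ValueError("Only one default image can be specified")
            default = uri
        else:
            named[name] = uri
    return default, named


default_image, named_images = parse_image_specs(
    ["ghcr.io/flyteorg/flytekit:latest", "spark=ghcr.io/flyteorg/myspark:latest"]
)
print(default_image, named_images)
```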
@@ -266,7 +319,8 @@ def from_images(cls, default_image: str, m: typing.Optional[typing.Dict[str, str
{
"spark": "ghcr.io/flyteorg/myspark:...",
"other": "...",
})
}
)
:return:
"""
@@ -557,7 +611,7 @@ def auto(cls, config_file: typing.Union[str, ConfigFile, None] = None) -> Config
@classmethod
def for_sandbox(cls) -> Config:
"""
Constructs a new Config object specifically to connect to :std:ref:`deploy-sandbox-local`.
Constructs a new Config object specifically to connect to :std:ref:`deployment-deployment-sandbox`.
If you are using a hosted Sandbox like environment, then you may need to use port-forward or ingress urls
:return: Config
"""
@@ -619,15 +673,14 @@ class FastSerializationSettings(object):
distribution_location: Optional[str] = None


# TODO: ImageConfig, python_interpreter, venv_root, fast_serialization_settings.destination_dir should be combined.
@dataclass_json
@dataclass()
class SerializationSettings(object):
"""
These settings are provided while serializing a workflow and task, before registration. This is required to get
runtime information at serialization time, as well as some defaults.
TODO: ImageConfig, python_interpreter, venv_root, fast_serialization_settings.destination_dir should be combined.
Attributes:
project (str): The project (if any) with which to register entities under.
domain (str): The domain (if any) with which to register entities under.
2 changes: 1 addition & 1 deletion flytekit/exceptions/user.py
@@ -93,4 +93,4 @@ class FlyteInvalidInputException(FlyteUserException):

def __init__(self, request: typing.Any):
self.request = request
super(self).__init__()
super().__init__()
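Note on the hunk above (#1534): the one-line change replaces `super(self).__init__()` with `super().__init__()`. The explicit one-argument form of `super` expects a type, so passing the instance raises `TypeError` before the base initializer runs, which matches the "failed to initialize" symptom in the commit title. A minimal reproduction follows; the class names are hypothetical stand-ins, not the flytekit exception classes.

```python
class BaseErrorSketch(Exception):
    """Hypothetical stand-in for the base user exception."""


class BrokenInvalidInput(BaseErrorSketch):
    def __init__(self, request):
        self.request = request
        # Bug: super() is given an instance where a type is required.
        super(self).__init__()


class FixedInvalidInput(BaseErrorSketch):
    def __init__(self, request):
        self.request = request
        # Zero-argument super() resolves the class and instance automatically.
        super().__init__()


try:
    BrokenInvalidInput("some request")
except TypeError as err:
    print("broken constructor failed:", err)

fixed = FixedInvalidInput("some request")
print("fixed constructor kept request:", fixed.request)
```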
3 changes: 2 additions & 1 deletion flytekit/extras/tasks/shell.py
@@ -120,7 +120,8 @@ def __init__(
task_config: T Configuration for the task, can be either a Pod (or coming soon, BatchJob) config
inputs: A Dictionary of input names to types
output_locs: A list of :py:class:`OutputLocations`
**kwargs: Other arguments that can be passed to :ref:class:`PythonInstanceTask`
**kwargs: Other arguments that can be passed to
:py:class:`~flytekit.core.python_function_task.PythonInstanceTask`
"""
if script and script_file:
raise ValueError("Only either of script or script_file can be provided")
2 changes: 2 additions & 0 deletions plugins/README.md
@@ -22,6 +22,8 @@ All the Flytekit plugins maintained by the core team are added here. It is not n
| Snowflake | ```bash pip install flytekitplugins-snowflake``` | Use Snowflake as a 'data warehouse-as-a-service' within Flyte | [![PyPI version fury.io](https://badge.fury.io/py/flytekitplugins-snowflake.svg)](https://pypi.python.org/pypi/flytekitplugins-snowflake/) | Backend |
| dbt | ```bash pip install flytekitplugins-dbt``` | Run dbt within Flyte | [![PyPI version fury.io](https://badge.fury.io/py/flytekitplugins-dbt.svg)](https://pypi.python.org/pypi/flytekitplugins-dbt/) | Flytekit-only |
| Huggingface | ```bash pip install flytekitplugins-huggingface``` | Read & write Hugginface Datasets as Flyte StructuredDatasets | [![PyPI version fury.io](https://badge.fury.io/py/flytekitplugins-huggingface.svg)](https://pypi.python.org/pypi/flytekitplugins-huggingface/) | Flytekit-only |
| DuckDB | ```bash pip install flytekitplugins-duckdb``` | Run analytical workloads with ease using DuckDB.
| [![PyPI version fury.io](https://badge.fury.io/py/flytekitplugins-duckdb.svg)](https://pypi.python.org/pypi/flytekitplugins-duckdb/) | Flytekit-only |

## Have a Plugin Idea? 💡
Please [file an issue](https://github.com/flyteorg/flyte/issues/new?assignees=&labels=untriaged%2Cplugins&template=backend-plugin-request.md&title=%5BPlugin%5D).
2 changes: 1 addition & 1 deletion plugins/flytekit-data-fsspec/setup.py
@@ -4,7 +4,7 @@

microlib_name = f"flytekitplugins-data-{PLUGIN_NAME}"

plugin_requires = ["flytekit>=1.1.0b0,<1.3.0,<2.0.0", "fsspec>=2021.7.0", "botocore>=1.7.48", "pandas>=1.2.0"]
plugin_requires = ["flytekit>=1.1.0b0,<1.3.0,<2.0.0", "fsspec<=2023.1", "botocore>=1.7.48", "pandas>=1.2.0"]

__version__ = "0.0.0+develop"

9 changes: 9 additions & 0 deletions plugins/flytekit-duckdb/README.md
@@ -0,0 +1,9 @@
# Flytekit DuckDB Plugin

Run analytical workloads with ease using DuckDB.

To install the plugin, run the following command:

```bash
pip install flytekitplugins-duckdb
```
11 changes: 11 additions & 0 deletions plugins/flytekit-duckdb/flytekitplugins/duckdb/__init__.py
@@ -0,0 +1,11 @@
"""
.. currentmodule:: flytekitplugins.duckdb
.. autosummary::
:template: custom.rst
:toctree: generated/
DuckDBQuery
"""

from .task import DuckDBQuery