CONTRIBUTORS_QUICK_START.rst


Contributor's Quick Guide

There are two ways to run the Airflow development environment on your machine:
  1. With a Docker Container
  2. With a local virtual environment

Before deciding which method to choose, consider the trade-offs. Running Airflow in a container is the most reliable option: it provides a more consistent environment and lets you run integration tests against a number of integrations (cassandra, mongo, mysql, etc.). However, it also requires 4GB of RAM, 40GB of disk space and at least 2 cores. If you are working on a basic feature, installing Airflow in a local environment might be sufficient.
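
As a rough preflight check, the minimums quoted above can be compared with what the machine actually offers. The sketch below is not part of Airflow or Breeze: the function names are made up for illustration, the RAM probe relies on os.sysconf and therefore assumes Linux, and the disk probe measures free space on the root filesystem rather than wherever Docker stores its data.

```python
import os
import shutil

# Minimums quoted in this guide for the container-based setup.
MIN_RAM_GB = 4
MIN_DISK_GB = 40
MIN_CORES = 2


def find_shortfalls(ram_gb, disk_gb, cores):
    """Return a list of problems; an empty list means all minimums are met."""
    problems = []
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB < {MIN_RAM_GB} GB")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"disk: {disk_gb:.1f} GB < {MIN_DISK_GB} GB")
    if cores < MIN_CORES:
        problems.append(f"cores: {cores} < {MIN_CORES}")
    return problems


def probe_host(path="/"):
    """Measure total RAM, free disk space under `path`, and CPU count (Linux assumed)."""
    gib = 1024 ** 3
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_free_bytes = shutil.disk_usage(path).free
    return ram_bytes / gib, disk_free_bytes / gib, os.cpu_count() or 1


if __name__ == "__main__":
    shortfalls = find_shortfalls(*probe_host())
    print("\n".join(shortfalls) if shortfalls else "Container setup looks feasible")
```

A machine sold as "4GB" usually reports slightly less than 4 GiB, so a small RAM shortfall reported here is a hint rather than a hard failure.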

This guide relies on the following tools:

  1. Docker Community Edition
  2. Docker Compose
  3. pyenv (you can also use pyenv-virtualenv or virtualenvwrapper)
  4. jq
  1. Installing required packages for Docker and setting up docker repo
$ sudo apt-get update

$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
  1. Install Docker
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
  1. Creating group for docker and adding current user to it.
$ sudo groupadd docker
$ sudo usermod -aG docker $USER

Note: After adding the user to the docker group, log out and log in again so that group membership is re-evaluated.

  1. Test Docker installation
$ docker run hello-world
  1. Installing latest version of Docker Compose
$ COMPOSE_VERSION="$(curl -s https://api.github.com/repos/docker/compose/releases/latest | grep '"tag_name":'\
| cut -d '"' -f 4)"

$ COMPOSE_URL="https://github.com/docker/compose/releases/download/${COMPOSE_VERSION}/\
docker-compose-$(uname -s)-$(uname -m)"

$ sudo curl -L "${COMPOSE_URL}" -o /usr/local/bin/docker-compose

$ sudo chmod +x /usr/local/bin/docker-compose
  1. Verifying installation
$ docker-compose --version
  1. Install pyenv and configure your shell's environment for Pyenv as suggested in Pyenv README
  2. After installing pyenv, you need to install a few more required packages for Airflow
$ sudo apt-get install openssl sqlite default-libmysqlclient-dev libmysqlclient-dev postgresql
  1. Restart your shell so the path changes take effect, then verify the installation
$ exec $SHELL
$ pyenv --version
  1. Checking available versions, installing the required Python version with pyenv and verifying it
$ pyenv install --list
$ pyenv install 3.8.5
$ pyenv versions
  1. Creating a new virtual environment named airflow-env for the installed Python version. The airflow-env virtual environment is used in the following sections to install Airflow.
$ pyenv virtualenv 3.8.5 airflow-env
  1. Entering virtual environment airflow-env
$ pyenv activate airflow-env
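
To confirm that the shell is really using the new environment, a short check can be run with the python that pyenv activate airflow-env puts on the path. This is a sketch with hypothetical helper names. It assumes the environment was created from Python 3.8 as above, and that it is a venv-style environment in which sys.prefix differs from sys.base_prefix.

```python
import sys

# Python minor version installed with pyenv in the steps above.
REQUIRED = (3, 8)


def interpreter_matches(version_info=sys.version_info, required=REQUIRED):
    """True when the major.minor version equals the required one."""
    return tuple(version_info[:2]) == tuple(required)


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """True when running inside a venv-style virtual environment."""
    return prefix != base_prefix


if __name__ == "__main__":
    print("executable:", sys.executable)
    print("version ok:", interpreter_matches())
    print("virtualenv:", in_virtualenv())
```

If the executable path does not point into ~/.pyenv/versions/airflow-env, the shell is probably not configured for pyenv yet.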

jq is a lightweight and flexible command-line JSON processor.

Install jq with the following command:

$ sudo apt install jq
Setup and develop using PyCharm

Note

Only pip installation is currently officially supported.

While there are some successes with using other tools like poetry or pip-tools, they do not share the same workflow as pip, especially when it comes to constraint vs. requirements management. Installing via Poetry or pip-tools is not currently supported.

If you wish to install airflow using those tools, you should use the constraint files and convert them to the format and workflow that your tool requires.

  1. Go to https://github.com/apache/airflow/ and fork the project.

    Forking Apache Airflow project
  2. Go to your GitHub account's fork of airflow, click on Code and copy the clone link.

    Cloning github fork of Apache airflow
  3. Open your IDE or source code editor and select the option to clone the repository

    Cloning github fork to Pycharm
  4. Paste the copied clone link in the URL field and submit.

    Cloning github fork to Pycharm
  1. Open a terminal, enter the virtual environment airflow-env and go to the project directory
$ pyenv activate airflow-env
$ cd ~/Projects/airflow/
  1. Initializing breeze autocomplete
$ ./breeze setup-autocomplete
$ source ~/.bash_completion.d/breeze-complete
  1. Initialize the breeze environment with the required Python version and backend. This may take a while the first time.
$ ./breeze --python 3.8 --backend mysql

Note

If you encounter an error like "docker.credentials.errors.InitializationError: docker-credential-secretservice not installed or not available in PATH", you may execute the following command to fix it:

$ sudo apt install golang-docker-credential-helper

Once the package is installed, execute the breeze command again to resume image building.

  1. Once the breeze environment is initialized, create airflow tables and users from the breeze CLI. airflow db reset must be executed at least once so that Airflow Breeze gets the database and tables created.
root@b76fcb399bb6:/opt/airflow# airflow db reset
root@b76fcb399bb6:/opt/airflow# airflow users create --role Admin --username admin --password admin \
  --email admin@example.com --firstname foo --lastname bar
  1. Closing the Breeze environment. After the above commands finish successfully you are still inside the container. Type exit to leave the container, then stop Breeze
root@b76fcb399bb6:/opt/airflow#
root@b76fcb399bb6:/opt/airflow# exit
$ ./breeze stop
  1. Initialize the local virtual environment with Breeze. This may require some packages to be installed; watch the output of the command to see which ones are missing.
$ sudo apt-get install sqlite libsqlite3-dev default-libmysqlclient-dev postgresql
$ ./breeze initialize-local-virtualenv --python 3.8
  1. Add the following export line to ~/.bashrc so that the breeze command can be called from anywhere, then reload the file.
export PATH=${PATH}:"/home/${USER}/Projects/airflow"
source ~/.bashrc
  1. Start the Breeze environment with breeze start-airflow. It reuses the configuration of the last run (in this case, Python and backend are picked up from the last execution, ./breeze --python 3.8 --backend mysql) and automatically starts the webserver, backend and scheduler. It drops you into tmux with the scheduler in the bottom left pane and the webserver in the bottom right pane. Use [Ctrl + B] and the arrow keys to navigate.
$ breeze start-airflow

    Use CI image.

 Branch name:            main
 Docker image:           apache/airflow:main-python3.8-ci
 Airflow source version: 2.0.0b2
 Python version:         3.8
 Backend:                mysql 5.7


 Port forwarding:

 Ports are forwarded to the running docker containers for webserver and database
   * 28080 -> forwarded to Airflow webserver -> airflow:8080
   * 25555 -> forwarded to Flower dashboard -> airflow:5555
   * 25433 -> forwarded to Postgres database -> postgres:5432
   * 23306 -> forwarded to MySQL database  -> mysql:3306
   * 26379 -> forwarded to Redis broker -> redis:6379

 Here are links to those services that you can use on host:
   * Webserver: http://127.0.0.1:28080
   * Flower:    http://127.0.0.1:25555
   * Postgres:  jdbc:postgresql://127.0.0.1:25433/airflow?user=postgres&password=airflow
   * Mysql:     jdbc:mysql://127.0.0.1:23306/airflow?user=root
   * Redis:     redis://127.0.0.1:26379/0
Accessing local airflow
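
To see which of the forwarded ports are actually reachable from the host, a plain TCP connection attempt is enough. The sketch below uses only the Python standard library; the port numbers are copied from the output above, while the dictionary and the function name are illustrative and not part of Breeze.

```python
import socket

# Host ports printed by `breeze start-airflow` in the output above.
FORWARDED_PORTS = {
    "webserver": 28080,
    "flower": 25555,
    "postgres": 25433,
    "mysql": 23306,
    "redis": 26379,
}


def port_is_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in FORWARDED_PORTS.items():
        state = "open" if port_is_open(port) else "closed"
        print(f"{name:<10} {port:>5}  {state}")
```

Some ports will probably show as closed in a normal session, since a service that was not started for the selected backend or integrations has nothing listening on its port.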
  • Alternatively, you can start the same services manually with the following commands

    1. Start Breeze
    $ breeze --python 3.8 --backend mysql
    1. Open tmux
    root@0c6e4ff0ab3d:/opt/airflow# tmux
    1. Press Ctrl + B and " to split the pane, then start the scheduler
    root@0c6e4ff0ab3d:/opt/airflow# airflow scheduler
    1. Press Ctrl + B and % to split the pane again, then start the webserver
    root@0c6e4ff0ab3d:/opt/airflow# airflow webserver
  1. You can now access the Airflow web interface on your local machine at http://127.0.0.1:28080 with user name admin and password admin.

    Accessing local airflow
  2. Set up the MySQL database connection in MySQL Workbench with host 127.0.0.1, port 23306, user root, an empty password and default schema airflow.

    Connecting to mysql
  3. Stopping breeze

root@f3619b74c59a:/opt/airflow# stop_airflow
root@f3619b74c59a:/opt/airflow# exit
$ breeze stop
  1. Knowing more about Breeze
$ breeze --help

For more information, visit: Breeze documentation

The following are some of the important topics of the Breeze documentation:

  1. Configuring Airflow database connection
  • Airflow is configured to use a SQLite database by default. The configuration can be seen on the local machine in ~/airflow/airflow.cfg under sql_alchemy_conn.

  • Install the dependency required for the MySQL connection in airflow-env on the local machine.

    $ pyenv activate airflow-env
    $ pip install PyMySQL
  • Now set sql_alchemy_conn = mysql+pymysql://root:@127.0.0.1:23306/airflow?charset=utf8mb4 in the file ~/airflow/airflow.cfg on the local machine.
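
The connection string packs several settings into one line. As a reading aid, the sketch below splits it into its parts with the standard library's urllib.parse; the helper name and the field labels are illustrative only. The resulting values match the ones this guide enters into MySQL Workbench (host 127.0.0.1, port 23306, user root, empty password, schema airflow).

```python
from urllib.parse import urlsplit

# Connection string from the step above.
SQL_ALCHEMY_CONN = "mysql+pymysql://root:@127.0.0.1:23306/airflow?charset=utf8mb4"


def describe_conn(url):
    """Split a SQLAlchemy-style URL into the fields a database client asks for."""
    parts = urlsplit(url)
    # The scheme has the form "<dialect>+<driver>".
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "schema": parts.path.lstrip("/"),
        "options": parts.query,
    }


if __name__ == "__main__":
    for field, value in describe_conn(SQL_ALCHEMY_CONN).items():
        print(f"{field:<9} {value!r}")
```

The driver part (pymysql) is why the PyMySQL package is installed in the previous step.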

  1. Debugging an example DAG
  • Add an interpreter to PyCharm, pointing the interpreter path to ~/.pyenv/versions/airflow-env/bin/python, which is the virtual environment airflow-env created with pyenv earlier. To add an interpreter, go to File -> Settings -> Project: airflow -> Python Interpreter.

    Adding existing interpreter
  • Open the airflow project in the PyCharm IDE. The /files/dags directory of the local machine is mounted into the Docker container by default when Breeze is started, so any DAG file in this directory is picked up automatically by the scheduler running in the container and can be seen at http://127.0.0.1:28080.

  • Copy any example DAG present in the /airflow/example_dags directory to /files/dags/.

  • Add a __main__ block at the end of your DAG file to make it runnable. It will run a backfill job:

    from airflow.utils.state import State
    
    ...
    
    if __name__ == "__main__":
        dag.clear(dag_run_state=State.NONE)
        dag.run()
  • Add AIRFLOW__CORE__EXECUTOR=DebugExecutor to the environment variables of the Run Configuration.

    • Click on Add configuration

      Add Configuration pycharm
    • Add Script Path and Environment Variable to new Python configuration

      Add environment variable pycharm
  • Now debug an example DAG and view the entries in tables such as dag_run, xcom, etc. in MySQL Workbench.

  1. Click on the branch symbol in the status bar

    Creating a new branch
  2. Give the branch a name and check it out

    Giving a name to a branch

All tests are inside the ./tests directory.

  • Running unit tests inside the Breeze environment.

    Just run pytest followed by the path to the test file.

root@63528318c8b1:/opt/airflow# pytest tests/utils/test_decorators.py
======================================= test session starts =======================================
platform linux -- Python 3.8.6, pytest-6.0.1, py-1.9.0, pluggy-0.13.1 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /opt/airflow, configfile: pytest.ini
plugins: celery-4.4.7, requests-mock-1.8.0, xdist-1.34.0, flaky-3.7.0, rerunfailures-9.0, instafail
-0.4.2, forked-1.3.0, timeouts-1.2.1, cov-2.10.0
setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
collected 3 items

tests/utils/test_decorators.py::TestApplyDefault::test_apply PASSED                         [ 33%]
tests/utils/test_decorators.py::TestApplyDefault::test_default_args PASSED                  [ 66%]
tests/utils/test_decorators.py::TestApplyDefault::test_incorrect_default_args PASSED        [100%]

======================================== 3 passed in 1.49s ========================================
  • Running all the tests with Breeze by specifying the required Python version, backend and backend version
$ breeze --backend mysql --mysql-version 5.7 --python 3.8 --db-reset --test-type All  tests
  • Running specific tests in the container using shell scripts. The in-container test scripts are located in the ./scripts/in_container directory.
root@df8927308887:/opt/airflow# ./scripts/in_container/
   bin/                                        run_flake8.sh*
   check_environment.sh*                       run_generate_constraints.sh*
   entrypoint_ci.sh*                           run_init_script.sh*
   entrypoint_exec.sh*                         run_install_and_test_provider_packages.sh*
   _in_container_script_init.sh*               run_mypy.sh*
   prod/                                       run_prepare_provider_packages.sh*
   run_ci_tests.sh*                            run_prepare_provider_documentation.sh*
   run_clear_tmp.sh*                           run_system_tests.sh*
   run_docs_build.sh*                          run_tmux_welcome.sh*
   run_extract_tests.sh*                       stop_tmux_airflow.sh*
   run_fix_ownership.sh*                       update_quarantined_test_status.py*

root@df8927308887:/opt/airflow# ./scripts/in_container/run_docs_build.sh
  • Running specific type of test

    • Types of tests
    • Running specific type of test
    $ breeze --backend mysql --mysql-version 5.7 --python 3.8 --db-reset --test-type Core
  • Running Integration test for specific test type

    • Running an Integration Test
    $ breeze --backend mysql --mysql-version 5.7 --python 3.8 --db-reset --test-type All --integration mongo
  • For more information on testing, visit: TESTING.rst

Before committing changes to GitHub or raising a pull request, the code needs to be checked against certain quality standards such as spell check, code syntax, code formatting, compatibility with Apache License requirements, etc. This set of tests is applied when you commit your code.

CI tests GitHub

To avoid burdening the CI infrastructure and to save time, pre-commit hooks can be run locally before committing changes.

  1. Installing required packages
$ sudo apt install libxml2-utils
  1. Installing required Python packages
$ pyenv activate airflow-env
$ pip install pre-commit
  1. Go to your project directory
$ cd ~/Projects/airflow
  1. Running pre-commit hooks
$ pre-commit run --all-files
  No-tabs checker......................................................Passed
  Add license for all SQL files........................................Passed
  Add license for all other files......................................Passed
  Add license for all rst files........................................Passed
  Add license for all JS/CSS/PUML files................................Passed
  Add license for all JINJA template files.............................Passed
  Add license for all shell files......................................Passed
  Add license for all python files.....................................Passed
  Add license for all XML files........................................Passed
  Add license for all yaml files.......................................Passed
  Add license for all md files.........................................Passed
  Add license for all mermaid files....................................Passed
  Add TOC for md files.................................................Passed
  Add TOC for upgrade documentation....................................Passed
  Check hooks apply to the repository..................................Passed
  black................................................................Passed
  Check for merge conflicts............................................Passed
  Debug Statements (Python)............................................Passed
  Check builtin type constructor use...................................Passed
  Detect Private Key...................................................Passed
  Fix End of Files.....................................................Passed
  ...........................................................................
  1. Running pre-commit for selected files
$ pre-commit run  --files airflow/decorators.py tests/utils/test_task_group.py
  1. Running specific hook for selected files
$ pre-commit run black --files airflow/decorators.py tests/utils/test_task_group.py
  black...............................................................Passed
$ pre-commit run flake8 --files airflow/decorators.py tests/utils/test_task_group.py
  Run flake8..........................................................Passed
  1. Running specific checks in container using shell scripts. Scripts are located in ./scripts/in_container directory.
root@df8927308887:/opt/airflow# ./scripts/in_container/
   bin/                                        run_flake8.sh*
   check_environment.sh*                       run_generate_constraints.sh*
   entrypoint_ci.sh*                           run_init_script.sh*
   entrypoint_exec.sh*                         run_install_and_test_provider_packages.sh*
   _in_container_script_init.sh*               run_mypy.sh*
   prod/                                       run_prepare_provider_packages.sh*
   run_ci_tests.sh*                            run_prepare_provider_documentation.sh*
   run_clear_tmp.sh*                           run_system_tests.sh*
   run_docs_build.sh*                          run_tmux_welcome.sh*
   run_extract_tests.sh*                       stop_tmux_airflow.sh*
   run_fix_ownership.sh*                       update_quarantined_test_status.py*


root@df8927308887:/opt/airflow# ./scripts/in_container/run_docs_build.sh
  1. Enabling pre-commit checks before each commit. Once installed, pre-commit runs automatically when you commit and stops the commit if a check fails
$ cd ~/Projects/airflow
$ pre-commit install
$ git commit -m "Added xyz"
  1. To disable Pre-commit
$ cd ~/Projects/airflow
$ pre-commit uninstall
  1. Go to your GitHub account, open your fork of the project and click on Branches

    Goto fork and select branches
  2. Click on the New pull request button on the branch from which you want to raise a pull request.

    Accessing local airflow
  3. Add a title and description as per the Contributing guidelines and click on Create pull request.

    Accessing local airflow

Often it takes several days or weeks to discuss and iterate on the PR until it is ready to merge. In the meantime new commits are merged and you might run into conflicts, so you should periodically synchronize main in your fork with the apache/airflow main and rebase your PR on top of it.
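
One common command-line way to synchronize and rebase is sketched below. It is not necessarily the procedure the Airflow project documents. The remote names apache and origin, the branch name and the helper functions are assumptions for illustration; the apache remote is expected to have been added once with git remote add apache https://github.com/apache/airflow.git. The script only prints the commands unless dry_run is turned off.

```python
import subprocess


def sync_commands(branch, upstream="apache", fork="origin"):
    """Build the git commands that update main and rebase `branch` on top of it.

    The remote names are assumptions: `upstream` points at apache/airflow,
    `fork` points at your own GitHub fork.
    """
    return [
        ["git", "fetch", upstream],
        ["git", "checkout", "main"],
        # Fast-forward only, so local main never diverges from upstream main.
        ["git", "merge", "--ff-only", f"{upstream}/main"],
        ["git", "push", fork, "main"],
        ["git", "checkout", branch],
        ["git", "rebase", "main"],
        # Rebasing rewrites history, so the PR branch has to be force-pushed.
        ["git", "push", "--force-with-lease", fork, branch],
    ]


def run(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "my-feature-branch" is a placeholder for the branch behind your PR.
    run(sync_commands("my-feature-branch"))
```

If the rebase stops on a conflict, git leaves the repository mid-rebase; resolve the conflict and continue with git rebase --continue before pushing.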

Setup and develop using Visual Studio Code

Note

Only pip installation is currently officially supported.

While there are some successes with using other tools like poetry or pip-tools, they do not share the same workflow as pip, especially when it comes to constraint vs. requirements management. Installing via Poetry or pip-tools is not currently supported.

If you wish to install airflow using those tools, you should use the constraint files and convert them to the format and workflow that your tool requires.

  1. Go to https://github.com/apache/airflow/ and fork the project.

    Forking Apache Airflow project
  2. Go to your GitHub account's fork of airflow, click on Code and copy the clone link.

    Cloning github fork of Apache airflow
  3. Open your IDE or source code editor and select the option to clone the repository

    Cloning github fork to Visual Studio Code
  4. Paste the copied clone link in the URL field and submit.

    Cloning github fork to Visual Studio Code
  1. Open a terminal, enter the virtual environment airflow-env and go to the project directory
$ pyenv activate airflow-env
$ cd ~/Projects/airflow/
  1. Initializing breeze autocomplete
$ ./breeze setup-autocomplete
$ source ~/.bash_completion.d/breeze-complete
  1. Initialize the breeze environment with the required Python version and backend. This may take a while the first time.
$ ./breeze --python 3.8 --backend mysql

Note

If you encounter an error like "docker.credentials.errors.InitializationError: docker-credential-secretservice not installed or not available in PATH", you may execute the following command to fix it:

$ sudo apt install golang-docker-credential-helper

Once the package is installed, execute the breeze command again to resume image building.

  1. Once the breeze environment is initialized, create airflow tables and users from the breeze CLI. airflow db reset must be executed at least once so that Airflow Breeze gets the database and tables created.
root@b76fcb399bb6:/opt/airflow# airflow db reset
root@b76fcb399bb6:/opt/airflow# airflow users create --role Admin --username admin --password admin \
  --email admin@example.com --firstname foo --lastname bar
  1. Closing the Breeze environment. After the above commands finish successfully you are still inside the container. Type exit to leave the container, then stop Breeze
root@b76fcb399bb6:/opt/airflow#
root@b76fcb399bb6:/opt/airflow# exit
$ ./breeze stop
  1. Initialize the local virtual environment with Breeze. This may require some packages to be installed; watch the output of the command to see which ones are missing.
$ sudo apt-get install sqlite libsqlite3-dev default-libmysqlclient-dev postgresql
$ ./breeze initialize-local-virtualenv --python 3.8
  1. Add the following export line to ~/.bashrc so that the breeze command can be called from anywhere, then reload the file.
export PATH=${PATH}:"/home/${USER}/Projects/airflow"
source ~/.bashrc
  1. Start the Breeze environment with breeze start-airflow. It reuses the configuration of the last run (in this case, Python and backend are picked up from the last execution, ./breeze --python 3.8 --backend mysql) and automatically starts the webserver, backend and scheduler. It drops you into tmux with the scheduler in the bottom left pane and the webserver in the bottom right pane. Use [Ctrl + B] and the arrow keys to navigate.
$ breeze start-airflow

    Use CI image.

 Branch name:            main
 Docker image:           apache/airflow:main-python3.8-ci
 Airflow source version: 2.0.0b2
 Python version:         3.8
 Backend:                mysql 5.7


 Port forwarding:

 Ports are forwarded to the running docker containers for webserver and database
   * 28080 -> forwarded to Airflow webserver -> airflow:8080
   * 25555 -> forwarded to Flower dashboard -> airflow:5555
   * 25433 -> forwarded to Postgres database -> postgres:5432
   * 23306 -> forwarded to MySQL database  -> mysql:3306
   * 26379 -> forwarded to Redis broker -> redis:6379

 Here are links to those services that you can use on host:
   * Webserver: http://127.0.0.1:28080
   * Flower:    http://127.0.0.1:25555
   * Postgres:  jdbc:postgresql://127.0.0.1:25433/airflow?user=postgres&password=airflow
   * Mysql:     jdbc:mysql://127.0.0.1:23306/airflow?user=root
   * Redis:     redis://127.0.0.1:26379/0
Accessing local airflow
  • Alternatively, you can start the same services manually with the following commands

    1. Start Breeze
    $ breeze --python 3.8 --backend mysql
    1. Open tmux
    root@0c6e4ff0ab3d:/opt/airflow# tmux
    1. Press Ctrl + B and " to split the pane, then start the scheduler
    root@0c6e4ff0ab3d:/opt/airflow# airflow scheduler
    1. Press Ctrl + B and % to split the pane again, then start the webserver
    root@0c6e4ff0ab3d:/opt/airflow# airflow webserver
  1. You can now access the Airflow web interface on your local machine at http://127.0.0.1:28080 with user name admin and password admin.

    Accessing local airflow
  2. Set up the MySQL database connection in MySQL Workbench with host 127.0.0.1, port 23306, user root, an empty password and default schema airflow.

    Connecting to mysql
  3. Stopping breeze

root@f3619b74c59a:/opt/airflow# stop_airflow
root@f3619b74c59a:/opt/airflow# exit
$ breeze stop
  1. Knowing more about Breeze
$ breeze --help

For more information, visit: Breeze documentation

The following are some of the important topics of the Breeze documentation:

  1. Configuring Airflow database connection
  • Airflow is configured to use a SQLite database by default. The configuration can be seen on the local machine in ~/airflow/airflow.cfg under sql_alchemy_conn.

  • Install the dependency required for the MySQL connection in airflow-env on the local machine.

    $ pyenv activate airflow-env
    $ pip install PyMySQL
  • Now set sql_alchemy_conn = mysql+pymysql://root:@127.0.0.1:23306/airflow?charset=utf8mb4 in the file ~/airflow/airflow.cfg on the local machine.

  1. Debugging an example DAG
  • Open the airflow project in Visual Studio Code. The /files/dags directory of the local machine is mounted into the Docker container by default when Breeze is started, so any DAG file in this directory is picked up automatically by the scheduler running in the container and can be seen at http://127.0.0.1:28080.

  • Copy any example DAG present in the /airflow/example_dags directory to /files/dags/.

  • Add a __main__ block at the end of your DAG file to make it runnable. It will run a backfill job:

    from airflow.utils.state import State
    
    ...
    
    if __name__ == "__main__":
        dag.clear(dag_run_state=State.NONE)
        dag.run()
  • Add "AIRFLOW__CORE__EXECUTOR": "DebugExecutor" to the "env" field of the Debug configuration.

    • Using the Run view, click on Create a launch.json file

      Add Debug Configuration to Visual Studio Code
    • Change "program" to point to an example dag and add "env" and "python" fields to the new Python configuration

      Add environment variable to Visual Studio Code Debug configuration
  • Now debug an example DAG and view the entries in tables such as dag_run, xcom, etc. in MySQL Workbench.

  1. Click on the branch symbol in the status bar

    Creating a new branch
  2. Give the branch a name and check it out

    Giving a name to a branch

All tests are inside the ./tests directory.

  • Running unit tests inside the Breeze environment.

    Just run pytest followed by the path to the test file.

root@63528318c8b1:/opt/airflow# pytest tests/utils/test_decorators.py
======================================= test session starts =======================================
platform linux -- Python 3.8.6, pytest-6.0.1, py-1.9.0, pluggy-0.13.1 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /opt/airflow, configfile: pytest.ini
plugins: celery-4.4.7, requests-mock-1.8.0, xdist-1.34.0, flaky-3.7.0, rerunfailures-9.0, instafail
-0.4.2, forked-1.3.0, timeouts-1.2.1, cov-2.10.0
setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
collected 3 items

tests/utils/test_decorators.py::TestApplyDefault::test_apply PASSED                         [ 33%]
tests/utils/test_decorators.py::TestApplyDefault::test_default_args PASSED                  [ 66%]
tests/utils/test_decorators.py::TestApplyDefault::test_incorrect_default_args PASSED        [100%]

======================================== 3 passed in 1.49s ========================================
  • Running all the tests with Breeze by specifying the required Python version, backend and backend version
$ breeze --backend mysql --mysql-version 5.7 --python 3.8 --db-reset --test-type All  tests
  • Running specific tests in the container using shell scripts. The in-container test scripts are located in the ./scripts/in_container directory.
root@df8927308887:/opt/airflow# ./scripts/in_container/
   bin/                                        run_flake8.sh*
   check_environment.sh*                       run_generate_constraints.sh*
   entrypoint_ci.sh*                           run_init_script.sh*
   entrypoint_exec.sh*                         run_install_and_test_provider_packages.sh*
   _in_container_script_init.sh*               run_mypy.sh*
   prod/                                       run_prepare_provider_packages.sh*
   run_ci_tests.sh*                            run_prepare_provider_documentation.sh*
   run_clear_tmp.sh*                           run_system_tests.sh*
   run_docs_build.sh*                          run_tmux_welcome.sh*
   run_extract_tests.sh*                       stop_tmux_airflow.sh*
   run_fix_ownership.sh*                       update_quarantined_test_status.py*

root@df8927308887:/opt/airflow# ./scripts/in_container/run_docs_build.sh
  • Running specific type of test

    • Types of tests
    • Running specific type of test
    $ breeze --backend mysql --mysql-version 5.7 --python 3.8 --db-reset --test-type Core
  • Running Integration test for specific test type

    • Running an Integration Test
    $ breeze --backend mysql --mysql-version 5.7 --python 3.8 --db-reset --test-type All --integration mongo
  • For more information on testing, visit: TESTING.rst

Before committing changes to GitHub or raising a pull request, the code needs to be checked against certain quality standards such as spell check, code syntax, code formatting, compatibility with Apache License requirements, etc. This set of tests is applied when you commit your code.

CI tests GitHub

To avoid burdening the CI infrastructure and to save time, pre-commit hooks can be run locally before committing changes.

  1. Installing required packages
$ sudo apt install libxml2-utils
  1. Installing required Python packages
$ pyenv activate airflow-env
$ pip install pre-commit
  1. Go to your project directory
$ cd ~/Projects/airflow
  1. Running pre-commit hooks
$ pre-commit run --all-files
  No-tabs checker......................................................Passed
  Add license for all SQL files........................................Passed
  Add license for all other files......................................Passed
  Add license for all rst files........................................Passed
  Add license for all JS/CSS/PUML files................................Passed
  Add license for all JINJA template files.............................Passed
  Add license for all shell files......................................Passed
  Add license for all python files.....................................Passed
  Add license for all XML files........................................Passed
  Add license for all yaml files.......................................Passed
  Add license for all md files.........................................Passed
  Add license for all mermaid files....................................Passed
  Add TOC for md files.................................................Passed
  Add TOC for upgrade documentation....................................Passed
  Check hooks apply to the repository..................................Passed
  black................................................................Passed
  Check for merge conflicts............................................Passed
  Debug Statements (Python)............................................Passed
  Check builtin type constructor use...................................Passed
  Detect Private Key...................................................Passed
  Fix End of Files.....................................................Passed
  ...........................................................................
  1. Running pre-commit for selected files
$ pre-commit run  --files airflow/decorators.py tests/utils/test_task_group.py
  1. Running specific hook for selected files
$ pre-commit run black --files airflow/decorators.py tests/utils/test_task_group.py
  black...............................................................Passed
$ pre-commit run flake8 --files airflow/decorators.py tests/utils/test_task_group.py
  Run flake8..........................................................Passed
  1. Running specific checks in container using shell scripts. Scripts are located in ./scripts/in_container directory.
root@df8927308887:/opt/airflow# ./scripts/in_container/
   bin/                                        run_flake8.sh*
   check_environment.sh*                       run_generate_constraints.sh*
   entrypoint_ci.sh*                           run_init_script.sh*
   entrypoint_exec.sh*                         run_install_and_test_provider_packages.sh*
   _in_container_script_init.sh*               run_mypy.sh*
   prod/                                       run_prepare_provider_packages.sh*
   run_ci_tests.sh*                            run_prepare_provider_documentation.sh*
   run_clear_tmp.sh*                           run_system_tests.sh*
   run_docs_build.sh*                          run_tmux_welcome.sh*
   run_extract_tests.sh*                       stop_tmux_airflow.sh*
   run_fix_ownership.sh*                       update_quarantined_test_status.py*


root@df8927308887:/opt/airflow# ./scripts/in_container/run_docs_build.sh
  1. Enabling pre-commit checks before each commit. Once installed, pre-commit runs automatically when you commit and stops the commit if a check fails
$ cd ~/Projects/airflow
$ pre-commit install
$ git commit -m "Added xyz"
  1. To disable Pre-commit
$ cd ~/Projects/airflow
$ pre-commit uninstall
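
When a branch touches many files, listing them by hand with --files becomes tedious. A minimal sketch, assuming your branch was created from a ref named main (adjust this to upstream/main or whatever applies to your clone): it wraps pre-commit's --from-ref and --to-ref options in a helper called precommit_changed. The helper name and the default ref are illustrative and are not part of Airflow's tooling.

```shell
# Hypothetical helper (not part of Airflow or pre-commit): run hooks only on
# the files that differ between a base ref and the current HEAD.
# Usage: precommit_changed [base-ref] [hook-id]
precommit_changed() {
    base_ref="${1:-main}"   # assumption: the branch was created from "main"
    hook_id="${2:-}"        # optional hook id, for example black or flake8

    if [ -n "$hook_id" ]; then
        # Run a single hook on the changed files
        pre-commit run "$hook_id" --from-ref "$base_ref" --to-ref HEAD
    else
        # Run every configured hook on the changed files
        pre-commit run --from-ref "$base_ref" --to-ref HEAD
    fi
}
```

For example, precommit_changed main flake8 would run only the flake8 hook on the files changed since main, while precommit_changed with no arguments runs all hooks on them.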
  1. Go to your GitHub account, open your fork of the project, and click Branches.

    Go to the fork and select Branches
  2. Click the New pull request button on the branch from which you want to raise a pull request.

    Accessing local airflow
  3. Add a title and description as per the Contributing guidelines and click Create pull request.

    Accessing local airflow

It often takes several days or weeks of discussion and iteration before a PR is ready to merge. In the meantime, new commits are merged into main and you might run into conflicts. You should therefore periodically synchronize main in your fork with the apache/airflow main and rebase your PR on top of it.
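
The synchronize-and-rebase cycle can be sketched with plain git commands. This is one possible workflow, not the only one. It assumes the remote named origin points to your fork and a remote named upstream points to apache/airflow; both remote names and the helper name sync_and_rebase are assumptions made for illustration.

```shell
# Hypothetical helper: bring the fork's main and a PR branch up to date with
# apache/airflow. Assumes remotes "origin" (your fork) and "upstream"
# (apache/airflow). The upstream remote can be added once with:
#   git remote add upstream https://github.com/apache/airflow.git
# Usage: sync_and_rebase <pr-branch>
sync_and_rebase() {
    pr_branch="${1:-}"
    if [ -z "$pr_branch" ]; then
        echo "usage: sync_and_rebase <pr-branch>" >&2
        return 1
    fi

    # Download the latest commits from apache/airflow
    git fetch upstream || return 1

    # Fast-forward the local main to upstream main and update the fork
    git checkout main || return 1
    git merge --ff-only upstream/main || return 1
    git push origin main || return 1

    # Replay the PR commits on top of the refreshed main.
    # If there are conflicts, the rebase stops here so you can resolve them.
    git checkout "$pr_branch" || return 1
    git rebase main || return 1

    # A rebase rewrites history, so the PR branch has to be force-pushed;
    # --force-with-lease refuses to overwrite commits you have not fetched
    git push --force-with-lease origin "$pr_branch"
}
```

If the rebase stops on a conflict, resolve it, run git rebase --continue, and then push the branch with --force-with-lease yourself.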

Setup and develop using Gitpod online workspaces
  1. Go to https://github.com/apache/airflow/ and fork the project.

    Forking Apache Airflow project
  2. Go to your GitHub account's fork of Airflow, click Code, and copy the clone link.

    Cloning github fork of Apache airflow
  3. Open https://gitpod.io/#<copied-url> in your browser, as shown.

    Open personal airflow clone with Gitpod
  1. Breeze is already initialized in one of the terminals in Gitpod.
  2. Once the Breeze environment is initialized, create the Airflow tables and users from the Breeze CLI. Run airflow db reset at least once so that the database and its tables are created.

Note

This step is needed if you want to run or use the webserver.

root@b76fcb399bb6:/opt/airflow# airflow db reset
root@b76fcb399bb6:/opt/airflow# airflow users create --role Admin --username admin --password admin \
  --email admin@example.com --firstname foo --lastname bar
  1. Closing the Breeze environment. After the above command finishes successfully, you are still inside the container. Type exit to leave the container, then stop Breeze.
root@b76fcb399bb6:/opt/airflow#
root@b76fcb399bb6:/opt/airflow# exit
$ ./breeze stop

The default Gitpod image has all the required packages installed.

  1. Add the following export line to ~/.bashrc so that you can call the breeze command from anywhere, then run source ~/.bashrc to apply it to the current shell.
export PATH=${PATH}:"/workspace/airflow"
source ~/.bashrc
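
If you recreate workspaces often, you may prefer to add the export line only when it is not already present, so that ~/.bashrc does not accumulate duplicates. A minimal sketch, assuming a POSIX shell with grep available; the helper name append_line_once is hypothetical.

```shell
# Hypothetical helper: append a line to a file only if that exact line is
# not already there.
# Usage: append_line_once <file> <line>
append_line_once() {
    file="$1"
    line="$2"
    # -q: quiet, -x: match the whole line, -F: treat the text as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (uncomment to modify your own ~/.bashrc):
# append_line_once ~/.bashrc 'export PATH=${PATH}:"/workspace/airflow"'
```

Calling the helper a second time with the same line leaves the file unchanged.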
  1. Start the Breeze environment with breeze start-airflow. This starts Breeze with the configuration from the last run and automatically starts the webserver, backend and scheduler. It drops you into tmux with the scheduler in the bottom-left pane and the webserver in the bottom-right pane. Use [Ctrl + B] followed by the arrow keys to navigate between panes.
$ breeze start-airflow

    Use CI image.

 Branch name:            main
 Docker image:           ghcr.io/apache/airflow/main/ci/python3.8:latest
 Airflow source version: 2.3.0.dev0
 Python version:         3.8
 Backend:                mysql 5.7


 Port forwarding:

 Ports are forwarded to the running docker containers for webserver and database
   * 12322 -> forwarded to Airflow ssh server -> airflow:22
   * 28080 -> forwarded to Airflow webserver -> airflow:8080
   * 25555 -> forwarded to Flower dashboard -> airflow:5555
   * 25433 -> forwarded to Postgres database -> postgres:5432
   * 23306 -> forwarded to MySQL database  -> mysql:3306
   * 21433 -> forwarded to MSSQL database  -> mssql:1433
   * 26379 -> forwarded to Redis broker -> redis:6379

 Here are links to those services that you can use on host:
   * ssh connection for remote debugging: ssh -p 12322 airflow@127.0.0.1 pw: airflow
   * Webserver: http://127.0.0.1:28080
   * Flower:    http://127.0.0.1:25555
   * Postgres:  jdbc:postgresql://127.0.0.1:25433/airflow?user=postgres&password=airflow
   * Mysql:     jdbc:mysql://127.0.0.1:23306/airflow?user=root
   * Redis:     redis://127.0.0.1:26379/0
Accessing local airflow
  1. You can access the ports as shown
Accessing ports via VSCode UI
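
Before opening the UI, it can be handy to confirm from a terminal that the forwarded HTTP ports respond. The sketch below assumes curl is installed and uses the webserver and Flower URLs printed in the Breeze banner above. The helper name check_http is hypothetical, and Flower may legitimately be unreachable if it is not running in your configuration.

```shell
# Hypothetical helper: report whether an HTTP service answers on a URL.
# Usage: check_http <name> <url>
check_http() {
    name="$1"
    url="$2"
    # -f: treat HTTP errors (status >= 400) as failures, -s: silent,
    # -o /dev/null: discard the body, --max-time 5: give up after 5 seconds
    if curl -fs -o /dev/null --max-time 5 "$url"; then
        echo "$name: reachable at $url"
    else
        echo "$name: not reachable at $url"
        return 1
    fi
}

# URLs taken from the Breeze banner; run these while Breeze is up:
# check_http Webserver http://127.0.0.1:28080
# check_http Flower    http://127.0.0.1:25555
```

The helper returns a non-zero status when the service does not respond, so it can also be used in a retry loop while the webserver is still starting.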
  1. Click on the branch symbol in the status bar

    Creating a new branch
  2. Give a name to a branch and checkout

    Giving a name to a branch

All tests are inside the ./tests directory.

  • Running unit tests inside the Breeze environment.

    Run pytest followed by the path to the test file.

root@4a2143c17426:/opt/airflow# pytest tests/utils/test_session.py
======================================= test session starts =======================================
platform linux -- Python 3.7.12, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /opt/airflow, configfile: pytest.ini
plugins: anyio-3.3.4, flaky-3.7.0, asyncio-0.16.0, cov-3.0.0, forked-1.3.0, httpx-0.15.0, instafail-0.4.2, rerunfailures-9.1.1, timeouts-1.2.1, xdist-2.4.0, requests-mock-1.9.3
setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
collected 4 items

tests/utils/test_session.py::TestSession::test_raised_provide_session PASSED                          [ 25%]
tests/utils/test_session.py::TestSession::test_provide_session_without_args_and_kwargs PASSED         [ 50%]
tests/utils/test_session.py::TestSession::test_provide_session_with_args PASSED                       [ 75%]
tests/utils/test_session.py::TestSession::test_provide_session_with_kwargs PASSED                     [100%]

====================================== 4 passed, 11 warnings in 33.14s ======================================
  • Running all the tests with Breeze by specifying the required Python version, backend, and backend version
$ breeze --backend mysql --mysql-version 5.7 --python 3.8 --db-reset --test-type All tests
  • Running specific tests in the container using shell scripts. The in-container scripts are located in the ./scripts/in_container directory.
root@4a2143c17426:/opt/airflow# ls ./scripts/in_container/
_in_container_script_init.sh  quarantine_issue_header.md                 run_mypy.sh
_in_container_utils.sh        run_anything.sh                            run_prepare_airflow_packages.sh
airflow_ci.cfg                run_ci_tests.sh                            run_prepare_provider_documentation.sh
bin                           run_docs_build.sh                          run_prepare_provider_packages.sh
check_environment.sh          run_extract_tests.sh                       run_resource_check.sh
check_junitxml_result.py      run_fix_ownership.sh                       run_system_tests.sh
configure_environment.sh      run_flake8.sh                              run_tmux_welcome.sh
entrypoint_ci.sh              run_generate_constraints.sh                stop_tmux_airflow.sh
entrypoint_exec.sh            run_init_script.sh                         update_quarantined_test_status.py
prod                          run_install_and_test_provider_packages.sh

root@df8927308887:/opt/airflow# ./scripts/in_container/run_docs_build.sh
  • Running specific type of test

    • Types of tests
    • Running specific type of test

    Note

    Before starting a new instance, clear the volumes and databases so that you start "fresh as a daisy". You can do this with:

    $ breeze --backend mysql --mysql-version 5.7 --python 3.8 --db-reset --test-type Core
  • Running integration tests for a specific test type

    • Running an Integration Test
    $ breeze --backend mysql --mysql-version 5.7 --python 3.8 --db-reset --test-type All --integration mongo
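
When iterating on a single failing test, you usually do not need the whole file. pytest accepts a node ID of the form file::Class::test. The sketch below reuses one of the test names from the tests/utils/test_session.py run shown earlier. The variable names are illustrative, and the guard is there only so the snippet is harmless outside the Breeze container.

```shell
# Node ID format: <file>::<class>::<test function>
TEST_FILE="tests/utils/test_session.py"
TEST_ID="${TEST_FILE}::TestSession::test_provide_session_with_args"

# Run the test only when pytest and the test file are available (that is,
# inside Breeze, from /opt/airflow); otherwise just print the command.
if command -v pytest >/dev/null 2>&1 && [ -f "$TEST_FILE" ]; then
    pytest "$TEST_ID"
else
    echo "pytest $TEST_ID"
fi
```

As an alternative, pytest's -k option selects tests whose names match an expression, for example pytest tests/utils/test_session.py -k kwargs.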