Using Nova Reusable Build Workflows

Omkar Salpekar edited this page Dec 7, 2022 · 13 revisions

Creating your own pipelines for building and distributing binaries from scratch is hard. Monitoring, alerting, and verification are even harder. Let us help you! Nova Reusable Workflows give you a comprehensive set of shared components that you can call to get binary builds and distribution working across your entire build matrix out of the box. For more involved build processes, we provide a number of hooks for domain-specific customization. No need to copy dozens of bash scripts between repos any longer!

Quick Start

Let's say you want to distribute Python wheels for Linux systems for your package. Create a file called .github/workflows/build-wheels-linux.yml in your repo. Add the following to the file, and change the fields marked TODO for your use case:

name: Build Linux Wheels

on:
  pull_request:
  push:
    branches:
      - nightly
  workflow_dispatch:

jobs:
  generate-matrix:
    uses: pytorch/test-infra/.github/workflows/generate_binary_build_matrix.yml@main
    with:
      package-type: wheel
      os: linux
      test-infra-repository: pytorch/test-infra
      test-infra-ref: main
      with-cuda: disable
  build:
    needs: generate-matrix
    name: ${{ matrix.repository }}
    uses: pytorch/test-infra/.github/workflows/build_wheels_linux.yml@main
    with:
      # TODO #1: Enter the name of your repo
      repository: <Your_Organization_Name/Your_Repo_Name>
      ref: ""
      # TODO #2: Have custom build steps before building the wheel? Pass the path to the shell script with those steps below.
      pre-script: <path/to/your/Custom_Pre_Build_Script.sh>
      # TODO #3: Same as above, but post-build steps. If you have none, just use "".
      post-script: <path/to/your/Custom_Post_Build_Script.sh>
      # TODO #4: Want to verify the correctness of your binaries before distributing them? Add custom python smoke tests below.
      smoke-test-script: <path/to/your/Smoke_Tests.py>
      # TODO #5: Enter the name of your python package below. (for example, torchvision)
      package-name: <Your_Python_Package_Name>
      test-infra-repository: pytorch/test-infra
      test-infra-ref: main
      build-matrix: ${{ needs.generate-matrix.outputs.matrix }}
      trigger-event: ${{ github.event_name }}
    secrets:
      AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }}
      AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }}

And that's it! Every push to your repo's nightly branch will build, verify, and upload nightlies to download.pytorch.org. The same workflow runs on pushes to your release candidate branches once you list them under push:, and official releases go through this workflow as well.

How It Works

We have 6 "base" workflows in pytorch/test-infra that take care of most of the logic for creating and uploading binaries. The workflows cover the set {Linux, macOS, Windows} x {Wheels, Conda}.

To run these workflows, you need to write a "caller" workflow in the .github/workflows/ directory of your repo. It is best to have a separate caller workflow for each base workflow you want to call. Each caller workflow does 3 things:

  1. Configure Triggers (when should the build workflow run?)
  2. Generate the Build Matrix
  3. Call the base workflow

Configure Triggers

The fields you define in the on: section of the YAML config determine which events trigger your workflow. This part is vanilla GitHub Actions, so the GitHub documentation on Triggering a Workflow is a good reference. Generally you want your workflows to run on pushes to nightly and release candidate branches, so list those branches under the push: field. You may also want these jobs triggered on PRs, so add the pull_request: field. Lastly, it's nice to be able to trigger your workflow manually from the GitHub UI. To enable this, add the workflow_dispatch: trigger. Your on: section may end up looking something like this:

on:
  pull_request:
  push:
    branches:
      - nightly
  workflow_dispatch:
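
If your project also cuts release candidates, the push: trigger can list those branches or tags alongside nightly. The sketch below assumes release branches are named release/* and that release candidate tags look like v1.11.0-rc1. Both naming schemes are assumptions, so adjust the patterns to your repo's conventions:

```yaml
on:
  pull_request:
  push:
    branches:
      - nightly
      # Assumed naming scheme for release branches; adjust to your repo.
      - release/*
    tags:
      # Assumed release candidate tag format, for example v1.11.0-rc1.
      - v[0-9]+.[0-9]+.[0-9]+-rc[0-9]+
  workflow_dispatch:
```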

Generate the Build Matrix

The pytorch/test-infra repo contains a workflow called generate_binary_build_matrix.yml that generates a build matrix based on provided inputs. For example, you'll typically want to support various Python versions, GPU architectures, etc. The build matrix enumerates all of the configurations that you want your binaries to be built with. The workflow file lists every input to the build matrix generation job, but these are the main inputs you'll likely need to set:

  • os: the Operating System you want these binaries to support
  • package-type: either wheel or conda (the package you are building)
  • with-cuda: toggles whether you need to build separate binaries with GPU support. The default is enable; set it to disable if dedicated CUDA builds are not required.

The call to generate the build matrix should be its own job in the jobs: section of your caller. It may look something like this:

jobs:
  generate-matrix:
    uses: pytorch/test-infra/.github/workflows/generate_binary_build_matrix.yml@main
    with:
      package-type: wheel
      os: linux
      test-infra-repository: pytorch/test-infra
      test-infra-ref: main
      with-cuda: disable
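
For a package that does need dedicated GPU builds, the same job can keep CUDA enabled so that the matrix includes the CUDA variants. A minimal sketch using only the inputs described above; with-cuda: enable is written out purely to make the choice explicit:

```yaml
jobs:
  generate-matrix:
    uses: pytorch/test-infra/.github/workflows/generate_binary_build_matrix.yml@main
    with:
      package-type: wheel
      os: linux
      test-infra-repository: pytorch/test-infra
      test-infra-ref: main
      # enable is the documented default; spelled out here for clarity.
      with-cuda: enable
```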

Calling the base workflow

In this step, we call one of the 6 base workflows with some inputs. The inputs you need to provide vary by base workflow. Each workflow file in pytorch/test-infra documents its expected inputs, default values, and explanations. Here are a few of the key inputs to keep in mind:

  • repository: the name of your repository
  • pre-script: a bash script that defines steps that you want to run pre-build. For example, installing a set of dependencies could go here
  • post-script: a bash script that defines steps that you want to run post-build
  • smoke-test-script: a Python script that tests the health of your built binary. If no script is provided, we will just check if the import works
  • package-name: the name of your base Python module. For example, torchvision or torchtext
  • conda-package-directory: (Conda-only) the directory where your meta.yaml file lives
  • runner-type: (Mac-only) the type of Mac runner to use. Use macos-12 for x86 macOS or macos-m1-12 for Arm64 macOS

To upload your binaries to the appropriate distribution channels, you must provide secrets as well. Conda builds need CONDA_PYTORCHBOT_TOKEN, while wheel builds need AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID and AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY to push binaries to download.pytorch.org.

You must add the call to the base workflow as another job after the generate-matrix job in the jobs: section. It may look as follows:

  build:
    needs: generate-matrix
    name: ${{ matrix.repository }}
    uses: pytorch/test-infra/.github/workflows/build_wheels_linux.yml@main
    with:
      repository: pytorch/vision
      ref: ""
      pre-script: packaging/pre_build_script.sh
      post-script: packaging/post_build_script.sh
      smoke-test-script: tests/smoke_test.py
      package-name: torchvision
      test-infra-repository: pytorch/test-infra
      test-infra-ref: main
      build-matrix: ${{ needs.generate-matrix.outputs.matrix }}
      trigger-event: ${{ github.event_name }}
    secrets:
      AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }}
      AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }}
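
A Conda build on macOS would differ mainly in the platform-specific inputs and the secret it passes. The following is a rough sketch, not a confirmed configuration: the base workflow file name build_conda_macos.yml is assumed by analogy with build_wheels_linux.yml, os: macos is an assumed matrix value, and the paths in angle brackets are placeholders. Check the workflow files in pytorch/test-infra for the actual names and inputs before relying on it:

```yaml
jobs:
  generate-matrix:
    uses: pytorch/test-infra/.github/workflows/generate_binary_build_matrix.yml@main
    with:
      package-type: conda
      # Assumed value for generating the macOS build matrix.
      os: macos
      test-infra-repository: pytorch/test-infra
      test-infra-ref: main
  build:
    needs: generate-matrix
    # Assumed file name, by analogy with build_wheels_linux.yml.
    uses: pytorch/test-infra/.github/workflows/build_conda_macos.yml@main
    with:
      repository: <Your_Organization_Name/Your_Repo_Name>
      ref: ""
      pre-script: <path/to/your/Custom_Pre_Build_Script.sh>
      post-script: ""
      smoke-test-script: <path/to/your/Smoke_Tests.py>
      package-name: <Your_Python_Package_Name>
      # Conda-only: placeholder for the directory that contains meta.yaml.
      conda-package-directory: <path/to/your/Conda_Recipe_Directory>
      # Mac-only: macos-12 for x86, macos-m1-12 for Arm64.
      runner-type: macos-12
      test-infra-repository: pytorch/test-infra
      test-infra-ref: main
      build-matrix: ${{ needs.generate-matrix.outputs.matrix }}
      trigger-event: ${{ github.event_name }}
    secrets:
      # Conda uploads use this token instead of the AWS uploader keys.
      CONDA_PYTORCHBOT_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }}
```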

Examples

Nova Workflows are now used for binary builds by torchvision, torchtext, torchrec, and (work in progress) torchaudio. For reference, all of torchvision's caller workflows can be found here: https://github.com/pytorch/vision/tree/main/.github/workflows. The Nova binary build workflows are named build-*.yml.