
Using Validation Framework

Andrey Talman edited this page Dec 28, 2022 · 25 revisions

Context

The validation framework smoke tests and validates PyTorch and the domain libraries on both CPU and GPU machines. Linux, Windows, and macOS (x86 and Apple Silicon) are supported. The high-level requirements for the validation framework are:

  • Support Linux, Windows, and macOS using ephemeral runners with only minimal dependencies installed
  • Support CPU and GPU runners, with an older Nvidia GPU driver on the GPU runners to enable backward compatibility tests (our CI runs on the latest Nvidia drivers)
  • Execute on a nightly basis
  • Surface the results on HUD
  • Cover all released domain libraries
  • Follow the same installation instructions as the Get Started page
  • Cover the nightly, test, and release channels
  • Use the same matrix as PyTorch Core and the Nova project, so that a new build introduced in PyTorch Core becomes available for validation
  • Provide a smoke test that covers PyTorch standalone or PyTorch with all domain libraries

How we use Validation Framework

The validation framework is used in two ways:

  • Nightly validation of PyTorch, TorchAudio, and TorchVision as one ecosystem, using the same instructions as the Get Started page. These workflows are implemented in validate-binaries.yml and are used for nightly and release validation by the PyTorch Dev Infra team.

  • Standalone domain library validation, currently implemented for the TorchText and TorchRec domain libraries. This is a fully customized way of using the validation framework, and in theory the same approach can validate any project within the PyTorch organization. See the onboarding documentation if you are interested in using the validation framework.

How to Onboard to Validation Framework

Onboarding to the validation framework is straightforward. You will need to create the following:

  1. A new GitHub Actions workflow that runs the validation. This workflow should call the validate-domain-library.yml workflow.
  2. A new script that installs your package and runs the smoke tests.
  3. Optionally, a new GitHub Actions workflow that runs the validation on a nightly basis.

The following is the GitHub Actions workflow from the TorchText repo:

name: Validate binaries

on:
  workflow_call:
    inputs:
      channel:
        description: "Channel to use (nightly, test, release, all)"
        required: false
        type: string
        default: release
      os:
        description: "Operating system to generate for (linux, windows, macos, macos-arm64)"
        required: true
        type: string
      ref:
        description: 'Reference to checkout, defaults to empty'
        default: ""
        required: false
        type: string
  workflow_dispatch:
    inputs:
      channel:
        description: "Channel to use (nightly, test, release, all)"
        required: true
        type: choice
        options:
          - release
          - nightly
          - test
          - all
      os:
        description: "Operating system to generate for (linux, windows, macos, all)"
        required: true
        type: choice
        default: all
        options:
          - windows
          - linux
          - macos
          - all
      ref:
        description: 'Reference to checkout, defaults to empty'
        default: ""
        required: false
        type: string

jobs:
  validate-binaries:
    uses: pytorch/builder/.github/workflows/validate-domain-library.yml@main
    with:
      package_type: "conda,wheel"
      os: ${{ inputs.os }}
      channel: ${{ inputs.channel }}
      repository: "pytorch/text"
      smoke_test: "./.github/scripts/validate_binaries.sh"

The following inputs are currently supported:

  • package_type: The package type you intend to test. Supported values: conda, wheel, libtorch, all
  • os: The operating system to run the tests on. Supported values: linux, windows, macos, macos-arm64
  • channel: The channel to use: nightly, test, release
  • repository: The repository you are calling the validation workflow from
  • smoke_test: The script that installs the binary and performs the validation. This is a bash shell script in your own repository.
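
The supported values listed above can be checked before a workflow is dispatched. The following is a minimal Python sketch and not part of the framework: the helper name `check_inputs` and the three sets are hypothetical and only mirror the values named on this page. The value `all` is included for os and channel because the input descriptions in the example workflow accept it.

```python
# Hypothetical helper; the values mirror the inputs documented on this page.
PACKAGE_TYPES = {"conda", "wheel", "libtorch", "all"}
OPERATING_SYSTEMS = {"linux", "windows", "macos", "macos-arm64", "all"}
CHANNELS = {"nightly", "test", "release", "all"}


def check_inputs(package_type: str, os_name: str, channel: str) -> list:
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    # package_type may be a comma-separated list such as "conda,wheel".
    for item in package_type.split(","):
        if item.strip() not in PACKAGE_TYPES:
            problems.append(f"unsupported package_type: {item.strip()!r}")
    if os_name not in OPERATING_SYSTEMS:
        problems.append(f"unsupported os: {os_name!r}")
    if channel not in CHANNELS:
        problems.append(f"unsupported channel: {channel!r}")
    return problems


if __name__ == "__main__":
    print(check_inputs("conda,wheel", "linux", "nightly"))
    print(check_inputs("rpm", "linux", "stable"))
```

Returning a list of problems instead of raising on the first one lets a caller report every invalid input in a single pass.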

The following is validate_binaries.sh from the TorchText repo:

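#!/usr/bin/env bash
set -ex

# Make conda available in this shell (on Windows the script sources a fixed Miniconda path)
if [[ ${TARGET_OS} == 'windows' ]]; then
    source /c/Jenkins/Miniconda3/etc/profile.d/conda.sh
else
    eval "$(conda shell.bash hook)"
fi

# Create and activate a clean environment for the requested Python version
conda create -y -n "${ENV_NAME}" python="${DESIRED_PYTHON}" numpy
conda activate "${ENV_NAME}"

# Defaults, used for the release channel
export CONDA_CHANNEL="pytorch"
export PIP_DOWNLOAD_URL="https://download.pytorch.org/whl/cpu"
export TEXT_PIP_PREFIX=""

# Override the defaults for the nightly and test channels
if [[ ${CHANNEL} == 'nightly' ]]; then
    export TEXT_PIP_PREFIX="--pre"
    export PIP_DOWNLOAD_URL="https://download.pytorch.org/whl/nightly/cpu"
    export CONDA_CHANNEL="pytorch-nightly"
elif [[ ${CHANNEL} == 'test' ]]; then
    export PIP_DOWNLOAD_URL="https://download.pytorch.org/whl/test/cpu"
    export CONDA_CHANNEL="pytorch-test"
fi

# Install torchtext and PyTorch with the requested package manager
if [[ ${PACKAGE_TYPE} == 'conda' ]]; then
    conda install -y torchtext pytorch -c "${CONDA_CHANNEL}"
else
    # TEXT_PIP_PREFIX stays unquoted so that an empty value expands to no argument
    pip install ${TEXT_PIP_PREFIX} torchtext torch --extra-index-url "${PIP_DOWNLOAD_URL}"
fi

# Run the smoke tests against the installed binaries
python ./test/smoke_tests/smoke_tests.py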
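
The channel and package-type branching in the script can be summarized as a small function. This is an illustrative Python sketch of the same decision logic and not code from the TorchText repo. The function name `install_command` is hypothetical, and the URLs and channel names are copied from the script above.

```python
# Sketch of the channel selection in validate_binaries.sh (CPU index URLs only).
PIP_BASE_URL = "https://download.pytorch.org/whl"


def install_command(channel: str, package_type: str) -> str:
    """Build the install command the script would run for the given inputs."""
    if channel == "nightly":
        pip_prefix = "--pre"
        pip_url = f"{PIP_BASE_URL}/nightly/cpu"
        conda_channel = "pytorch-nightly"
    elif channel == "test":
        pip_prefix = ""
        pip_url = f"{PIP_BASE_URL}/test/cpu"
        conda_channel = "pytorch-test"
    else:
        # Any other value falls through to the release defaults, as in the script.
        pip_prefix = ""
        pip_url = f"{PIP_BASE_URL}/cpu"
        conda_channel = "pytorch"

    if package_type == "conda":
        return f"conda install -y torchtext pytorch -c {conda_channel}"
    # Every non-conda package type takes the pip path, as in the script.
    parts = ["pip", "install", pip_prefix, "torchtext", "torch",
             "--extra-index-url", pip_url]
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    print(install_command("nightly", "wheel"))
    print(install_command("release", "conda"))
```

Note that only the nightly channel adds `--pre`, and that an unrecognized channel silently behaves like release. A stricter script could reject unknown values instead.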

Finally, if you want to run the workflow on a nightly basis, add a validate-nightly-binaries.yml workflow:

# Scheduled validation of the nightly binaries
name: cron

on:
  schedule:
    # At 5:30 pm UTC (10:30 am PDT)
    - cron: "30 17 * * *"
  # Have the ability to trigger this job manually through the API
  workflow_dispatch:
  push:
    branches:
      - main
    paths:
      - .github/workflows/validate-nightly-binaries.yml
      - .github/workflows/validate-binaries.yml
      - .github/scripts/validate_binaries.sh
  pull_request:
    paths:
      - .github/workflows/validate-nightly-binaries.yml
      - .github/workflows/validate-binaries.yml
      - .github/scripts/validate_binaries.sh
jobs:
  nightly:
    uses: ./.github/workflows/validate-binaries.yml
    with:
      channel: nightly
      os: all

You can also refer to the TorchRec repo for additional onboarding examples: validate-binaries.yml, validate_binaries.sh, and validate-nightly-binaries.yml.

Instance type and Operating System details used in validation

| Operating System | Type | GPUs | GPU Memory (GB) | vCPU | Memory (GiB) | Details |
|---|---|---|---|---|---|---|
| Linux CPU | c5.2xlarge | NA | NA | 8 | 16 | Docker and CentOS 7 |
| Linux GPU | g3.4xlarge | 1 | 8 | 16 | 122 | Docker and CentOS 7 |
| Windows CPU | c5d.4xlarge | NA | NA | 16 | 32 | Windows 2019 |
| Windows GPU | p3.2xlarge | 1 Tesla V100 | 16 | 8 | 61 | Windows 2019 |

Binary build matrix

The binary build matrix contains the current configurations supported by PyTorch Core and the domain libraries. It is generated by the following workflow: generate_binary_build_matrix.yml. For additional details, refer to that workflow's documentation.

The following CUDA and Python configurations are currently supported:

| CUDA | CUDNN | Additional details |
|---|---|---|
| 11.6 | 8.5.0.96 | Stable CUDA Release |
| 11.7 | 8.5.0.96 | Latest CUDA Release |
| 11.8 | 8.5.0.96 | CUDA Release Supported on nightly |

| Python versions | Package details |
|---|---|
| 3.7-3.10 | Supported on Conda and Pip |
| 3.11 | Supported on Pip only |
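
To make the combinations concrete, the two tables can be expanded into a list of CUDA build configurations. The Python sketch below is illustrative only and is not the logic of generate_binary_build_matrix.yml. The function name `expand_matrix` is hypothetical, the build name pattern is inferred from the sample entry on this page, "Pip" is assumed to correspond to the wheel package type, and CUDA 11.8 is assumed to be excluded from every channel other than nightly.

```python
from itertools import product

PYTHON_VERSIONS = ["3.7", "3.8", "3.9", "3.10", "3.11"]
CUDA_VERSIONS = ["11.6", "11.7", "11.8"]
NIGHTLY_ONLY_CUDA = {"11.8"}   # "Supported on nightly" in the table
PIP_ONLY_PYTHON = {"3.11"}     # "Supported on Pip only" in the table


def expand_matrix(channel: str) -> list:
    """List the conda/wheel CUDA configurations implied by the tables."""
    entries = []
    for package_type, python, cuda in product(
        ["conda", "wheel"], PYTHON_VERSIONS, CUDA_VERSIONS
    ):
        if package_type == "conda" and python in PIP_ONLY_PYTHON:
            continue  # no conda packages for pip-only Python versions
        if cuda in NIGHTLY_ONLY_CUDA and channel != "nightly":
            continue  # assumption: nightly-only CUDA is skipped elsewhere
        py_tag = python.replace(".", "_")
        cuda_tag = cuda.replace(".", "_")
        entries.append({
            "python_version": python,
            "gpu_arch_type": "cuda",
            "gpu_arch_version": cuda,
            "package_type": package_type,
            "channel": channel,
            # Naming pattern inferred from the sample entry, not confirmed.
            "build_name": f"{package_type}-py{py_tag}-cuda{cuda_tag}",
        })
    return entries


if __name__ == "__main__":
    for entry in expand_matrix("release"):
        print(entry["build_name"])
```

CPU-only builds and libtorch are left out to keep the sketch short.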

The output of the generate workflow is a JSON array of entries, each containing the basic information needed to install and test the package. The following fields are supported:

{
  "python_version": "3.7",
  "gpu_arch_type": "cuda",
  "gpu_arch_version": "11.7",
  "desired_cuda": "cu117",
  "container_image": "pytorch/manylinux-builder:cuda11.7",
  "package_type": "wheel",
  "build_name": "wheel-py3_7-cuda11_7",
  "validation_runner": "windows.8xlarge.nvidia.gpu",
  "installation": "pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu117",
  "channel": "nightly",
  "upload_to_base_bucket": "no",
  "stable_version": "1.13.1"
}
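
A consumer of this output mostly needs to select entries and read their `installation` command. The following Python sketch shows one way to do that. It is not part of the framework, the function name `installation_commands` is hypothetical, and the embedded one-entry array is abbreviated from the sample entry above.

```python
import json

# One-entry matrix, abbreviated from the sample entry on this page.
MATRIX_JSON = """
[
  {
    "python_version": "3.7",
    "gpu_arch_type": "cuda",
    "gpu_arch_version": "11.7",
    "package_type": "wheel",
    "build_name": "wheel-py3_7-cuda11_7",
    "installation": "pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu117",
    "channel": "nightly"
  }
]
"""


def installation_commands(matrix_json: str, package_type: str) -> dict:
    """Map build_name to installation command for one package type."""
    entries = json.loads(matrix_json)
    return {
        entry["build_name"]: entry["installation"]
        for entry in entries
        if entry["package_type"] == package_type
    }


if __name__ == "__main__":
    for name, command in installation_commands(MATRIX_JSON, "wheel").items():
        print(f"{name}: {command}")
```

Keying the result by `build_name` keeps each command traceable to the matrix entry it came from.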