Feat: Automatic GPU Switch #845

Draft: wants to merge 4 commits into base: master

@Steel-skull Steel-skull commented Oct 30, 2024

Docker Windows GPU Passthrough

[This is not fully tested, as I'm waiting for a GPU to arrive.]

Automated GPU management solution for Windows in Docker containers with NVIDIA GPU passthrough support. This project provides scripts and configurations to dynamically manage GPU binding between host and Docker containers, with support for multiple GPUs and audio devices.

Prerequisites

  • Unraid server (or Linux system with Docker)
  • NVIDIA GPU(s)
  • Docker and Docker Compose
  • VFIO-PCI support in kernel
  • NVIDIA drivers installed on host

Quick Start

  1. Clone the repository:
git clone https://github.com/yourusername/docker-windows-gpu.git
cd docker-windows-gpu
  2. Configure your environment:
# Add this variable, set to your GPU ID(s), PCI address(es), or 'none'
NVIDIA_VISIBLE_DEVICES=0
  3. Start the container:
docker-compose up -d

Configuration

Environment Variables

  • NVIDIA_VISIBLE_DEVICES: Specify GPU(s) to use
    • Single GPU: NVIDIA_VISIBLE_DEVICES=0
    • Multiple GPUs: NVIDIA_VISIBLE_DEVICES=0,1
    • PCI addresses: NVIDIA_VISIBLE_DEVICES=0000:03:00.0,0000:04:00.0
    • No GPU: NVIDIA_VISIBLE_DEVICES=none
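
The accepted value formats above can be checked before any driver work starts. The following is a minimal sketch, not the code from gpu-switch.sh: the function name `classify_devices` and its output format are assumptions made for illustration. It treats an empty value the same as `none`, matching the "Without GPU" example in the script details.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of gpu-switch.sh): classify each entry of
# NVIDIA_VISIBLE_DEVICES as a GPU index or a PCI address.
classify_devices() {
    local spec="$1" entry
    local -a entries
    # PCI address with an optional 4-digit domain, e.g. 0000:03:00.0 or 03:00.0
    local pci_re='^([0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'

    # An empty value or the literal 'none' both mean "no GPU".
    if [[ -z "$spec" || "$spec" == "none" ]]; then
        echo "none"
        return 0
    fi

    # Split the comma-separated list and label every entry.
    IFS=',' read -r -a entries <<< "$spec"
    for entry in "${entries[@]}"; do
        if [[ "$entry" =~ ^[0-9]+$ ]]; then
            echo "index $entry"
        elif [[ "$entry" =~ $pci_re ]]; then
            echo "pci $entry"
        else
            echo "invalid $entry" >&2
            return 1
        fi
    done
}
```

Calling `classify_devices "$NVIDIA_VISIBLE_DEVICES"` prints one line per device and returns non-zero on the first entry it cannot recognize, so a caller can stop early instead of touching drivers with a malformed value.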

Docker Compose

The provided docker-compose.yml includes all necessary configurations for:

  • GPU passthrough
  • RDP access
  • KVM support
  • Network management
  • Persistent storage

Usage

Manual GPU Management (until I find a way to run pre-start and post-stop hooks, run it through Unraid User Scripts)

Bind GPU to container:

NVIDIA_VISIBLE_DEVICES=0 /boot/config/plugins/user.scripts/gpu-switch.sh start windows

Release GPU:

NVIDIA_VISIBLE_DEVICES=0 /boot/config/plugins/user.scripts/gpu-switch.sh stop windows
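
Until pre-start and post-stop hooks are available, the two commands above could be chained around the container's lifetime from a single user script. This is a rough sketch under stated assumptions: the function name `run_with_gpu` and the `GPU_SWITCH` variable are hypothetical, and the container is assumed to already exist. It relies only on `docker start` and `docker wait`, the latter of which blocks until the container exits.

```bash
#!/usr/bin/env bash
# Sketch of a wrapper that emulates pre-start and post-stop hooks.
# GPU_SWITCH and run_with_gpu are illustrative names, not part of the PR.
GPU_SWITCH="${GPU_SWITCH:-/boot/config/plugins/user.scripts/gpu-switch.sh}"

run_with_gpu() {
    local container="$1"

    # "Pre-start": bind the GPU before the container starts.
    "$GPU_SWITCH" start "$container" || return 1

    # Start the container; if that fails, release the GPU again.
    if ! docker start "$container" > /dev/null; then
        "$GPU_SWITCH" stop "$container"
        return 1
    fi

    # Block until the container exits, then run the "post-stop" step.
    docker wait "$container" > /dev/null
    "$GPU_SWITCH" stop "$container"
}
```

A possible invocation is `export NVIDIA_VISIBLE_DEVICES=0` followed by `run_with_gpu windows`. Because the wrapper waits for the container to exit, it would need to run in the background when launched from a scheduler.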

Script Details

The gpu-switch.sh script handles:

  1. GPU detection and validation
  2. Driver management (NVIDIA ⟷ VFIO-PCI)
  3. Audio device pairing
  4. Docker container configuration
  5. Error handling and logging
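
Steps 2 and 3 above can be illustrated with the kernel's standard sysfs interface for PCI devices. This is a sketch of the general technique, not the code in gpu-switch.sh: the names `SYSFS_ROOT`, `audio_function_for`, and `bind_to_driver` are hypothetical, and the assumption that the GPU's audio controller is function 1 of the same slot holds for typical NVIDIA cards but should be verified with `lspci`.

```bash
#!/usr/bin/env bash
# Sketch only: SYSFS_ROOT, audio_function_for, and bind_to_driver are
# hypothetical names used for illustration.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Assumption: the GPU's HDMI audio controller is function 1 of the same slot,
# so 0000:03:00.0 pairs with 0000:03:00.1.
audio_function_for() {
    echo "${1%.*}.1"
}

# Hand one PCI device (full address, e.g. 0000:03:00.0) to the named driver.
bind_to_driver() {
    local addr="$1" driver="$2"
    local dev="$SYSFS_ROOT/bus/pci/devices/$addr"

    [ -d "$dev" ] || { echo "No such PCI device: $addr" >&2; return 1; }

    # Tell the kernel which driver may claim this device on the next probe.
    echo "$driver" > "$dev/driver_override"

    # Detach the current driver, if one is bound.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$addr" > "$dev/driver/unbind"
    fi

    # Ask the kernel to re-probe the device so the override takes effect.
    echo "$addr" > "$SYSFS_ROOT/bus/pci/drivers_probe"
}
```

Under these assumptions, passing a GPU to the container would mean calling `bind_to_driver` with `vfio-pci` for both the GPU address and the result of `audio_function_for`, and releasing it would mean calling it again with the host drivers. Writing to these files requires root, and the `vfio-pci` module must already be loaded.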

gpu switch version: 0.1

# Without GPU:
NVIDIA_VISIBLE_DEVICES="" ./gpu-switch.sh start container_name

# With single GPU:
NVIDIA_VISIBLE_DEVICES="0" ./gpu-switch.sh start container_name

# With multiple GPUs:
NVIDIA_VISIBLE_DEVICES="0,1" ./gpu-switch.sh start container_name

# With PCI addresses:
NVIDIA_VISIBLE_DEVICES="0000:03:00.0,0000:04:00.0" ./gpu-switch.sh start container_name

# Explicitly disable GPU:
NVIDIA_VISIBLE_DEVICES="none" ./gpu-switch.sh start container_name
@Steel-skull
Author

I have to modify the Docker Compose side. I was under the impression that it supported pre-start and post-stop scripts, but I misread: it supports post-start and pre-stop. I'll need to find a new way to make this work. The script itself still works and can be run through User Scripts in Unraid.

[Again, though, I'm waiting on a GPU, so I haven't been able to fully test it.]

@Steel-skull Steel-skull mentioned this pull request Oct 30, 2024
@kroese
Contributor

kroese commented Nov 9, 2024

Very interesting work!! Did you already receive your GPU to test it?

@JosueIsrael-prog

Very good

@maksymdor

Hmm! Interesting

if ! check_gpu_needed; then
    log "Continuing without GPU management"
    exit 0
fi
@vinkay215 vinkay215 Nov 11, 2024

Instead of listing all containers, you can directly check the existence of the container using docker container inspect, which is more efficient since it only checks the specified container without scanning the entire list. Here’s how to replace that line:

if ! docker container inspect "$CONTAINER_NAME" > /dev/null 2>&1; then
    error_exit "Container $CONTAINER_NAME does not exist"
fi

The docker container inspect command returns an error if the container does not exist, so you can use it to directly verify the container’s existence without listing all containers.

Author

@Steel-skull Steel-skull Nov 14, 2024

I'll take a look at implementing this. Thanks for the ideas.

}

# Convert any GPU identifier to PCI address
convert_to_pci_address() {

Incorporating these improvements, here’s the final optimized convert_to_pci_address function:


convert_to_pci_address() {
    local device="$1"
    local gpu_address=""

    if [[ "$device" =~ ^[0-9]+$ || "$device" =~ ^GPU-.*$ ]]; then
        # Convert GPU index or UUID to PCI address
        gpu_address=$(nvidia-smi --id="$device" --query-gpu=gpu_bus_id --format=csv,noheader 2>/dev/null | tr -d '[:space:]')
    else
        # Direct PCI address provided
        gpu_address="$device"
    fi

    # Check for valid output
    if [ -z "$gpu_address" ]; then
        error_exit "Failed to get PCI address for device: $device"
    fi

    # Standardize format
    echo "$gpu_address" | sed -e 's/0000://' -e 's/\./:/g'
}

Author

I'll take a look at implementing this. Thanks for the ideas on this as well.

Author

merged with main

@tl123987

tl123987 commented Nov 12, 2024

share failed? is there something wrong?

@Steel-skull
Author

Very interesting work!! Did you already receive your GPU to test it?

Sadly, no. The one I ordered from eBay was extremely unstable (it kept crashing my server when I used it with Ollama), so I'm waiting for my money back.

@Steel-skull
Author

share failed? is there something wrong?

You will have to expand on this; I don't understand.

@tl123987

Looking forward to seeing this completed, thank you. I hope there will be a complete tutorial in the future.

8 participants