Use multiple GPUs to process queue #1126

theodufort · 2024-11-10T20:48:30Z

I am trying to use both of my GPUs, which are passed through to my Docker container.

    services:
      faster-whisper-server-cuda:
        image: fedirz/faster-whisper-server:latest-cuda
        build:
          dockerfile: Dockerfile.cuda
          context: .
          platforms:
            - linux/amd64
            - linux/arm64
        restart: unless-stopped
        ports:
          - 8162:8000
        environment:
          - WHISPER__MODEL=deepdml/faster-whisper-large-v3-turbo-ct2
          - WHISPER__INFERENCE_DEVICE=cuda
          - WHISPER__COMPUTE_TYPE=int8
          - WHISPER__NUM_WORKERS=4
          - WHISPER__CPU_THREADS=4
          - WHISPER_DEVICE=cuda
          - DEFAULT_LANGUAGE=en
          - PRELOAD_MODELS=["deepdml/faster-whisper-large-v3-turbo-ct2"]
        volumes:
          - hugging_face_cache:/root/.cache/huggingface
        privileged: true
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities: [gpu]
    volumes:
      hugging_face_cache:

I tried everything but it won't use more than 1 GPU even if:


MahmoudAshraf97 · 2024-11-11T08:45:02Z

You need to explicitly assign the model to multiple GPUs using `device_index`, and even this will not enable data parallelism. I think the correct place to raise this issue is the CTranslate2 repo.
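For anyone landing here, this is a rough sketch of what passing `device_index` looks like when calling the faster-whisper Python library directly. It assumes two GPUs with IDs 0 and 1. The helper names (`load_model`, `transcribe_file`, `transcribe_queue`) are made up for illustration, and whether faster-whisper-server exposes an equivalent setting through its environment variables is not confirmed here. As noted above, this does not split a single file across GPUs; it only lets separate `transcribe()` calls issued from different threads run on different devices.

```python
from concurrent.futures import ThreadPoolExecutor


def load_model(gpu_ids=(0, 1)):
    """Load the model on several GPUs (GPU IDs 0 and 1 are an assumption)."""
    # Imported lazily so the helpers below can be tried without a GPU.
    from faster_whisper import WhisperModel

    return WhisperModel(
        "deepdml/faster-whisper-large-v3-turbo-ct2",
        device="cuda",
        device_index=list(gpu_ids),  # list of GPU IDs instead of a single int
        compute_type="int8",
    )


def transcribe_file(model, path):
    """Transcribe one file and return its text."""
    segments, _info = model.transcribe(path)
    # `segments` is a lazy generator; iterating over it does the decoding.
    return " ".join(segment.text.strip() for segment in segments)


def transcribe_queue(model, paths, max_workers=2):
    """Run one transcription per thread; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: transcribe_file(model, p), paths))
```

Usage would be along the lines of `transcribe_queue(load_model(), ["a.wav", "b.wav"])`, where the file names are placeholders. Using one worker thread per GPU is a guess at a sensible starting point, not a measured recommendation.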
