Hi,
Issue:
I am trying to track down an issue with accessing Tesla GPUs from WSL2 and from Docker containers.
Environment:
- Operating System: Windows Server 2022 (Up-to-date)
- WSL Version: WSL2 (2.4.13.0)
- WSL Distribution: Ubuntu 24.04.2 LTS
- Kernel Version: 5.15.167.4-1
- GPU: NVIDIA A40
- NVIDIA Driver: 576.02
- CUDA Version: 12.9
- Docker: Docker Desktop 4.31.1 with WSL2 backend
Detailed Issue description:
My system contains two Tesla A40s, one in TCC and one in WDDM mode. I need CUDA access inside a Docker container and am struggling to figure out why the GPUs are not detected. I am aware that the GPU in TCC mode cannot be accessed in WSL, and therefore not in Docker either. But should the GPU in WDDM mode not be accessible in WSL? I tested the same setup on another system (Windows 11, NVIDIA RTX) without any issues. Are datacenter GPUs blocked from being passed into Docker? Is this a Windows Server issue, a hardware issue, or just a driver issue? Or is this only possible via vGPU with Tesla-class GPUs?
Maybe someone has an idea why this is not working, or what I could try next.
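For reference, I start the test containers with a standard GPU request, roughly like this (the exact flags varied slightly between attempts):

# nbody CUDA sample (requests one GPU, matches the error further below)
docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

# CUDA devel image, just to check whether the driver is visible inside the container
docker run --rm --gpus all nvidia/cuda:12.9.0-cudnn-devel-ubuntu24.04 nvidia-smi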
Further system info and debug output:
CUDA version on the host system: 12.9 (also tried 12.4):
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 576.02                 Driver Version: 576.02         CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40                     TCC | 00000000:21:00.0   Off |                    0 |
|  0%   26C    P8             12W /  300W |      10MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A40                    WDDM | 00000000:41:00.0   Off |                    0 |
|  0%   33C    P8             20W /  300W |     818MiB /  46068MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Apr__9_19:29:17_Pacific_Daylight_Time_2025
Cuda compilation tools, release 12.9, V12.9.41
Build cuda_12.9.r12.9/compiler.35813241_0
docker version:
Client:
Version: 27.0.3
API version: 1.45 (downgraded from 1.46)
Go version: go1.21.11
Git commit: 7d4bcd8
Built: Sat Jun 29 00:03:32 2024
OS/Arch: windows/amd64
Context: desktop-linux
Server: Docker Desktop 4.31.1 (153621)
Engine:
Version: 26.1.4
API version: 1.45 (minimum version 1.24)
Go version: go1.21.11
Git commit: de5c9cf
Built: Wed Jun 5 11:29:22 2024
OS/Arch: linux/amd64
Experimental: true
containerd:
Version: 1.6.33
GitCommit: d2d58213f83a351ca8f528a95fbd145f5654e957
runc:
Version: 1.1.12
GitCommit: v1.1.12-0-g51d5e94
docker-init:
Version: 0.19.0
GitCommit: de40ad0
DOCKER ERRORS:
The Docker CUDA test container nvcr.io/nvidia/k8s/cuda-sample:nbody fails with:
Error: only 0 Devices available, 1 requested. Exiting.
The CUDA devel container nvidia/cuda:12.9.0-cudnn-devel-ubuntu24.04 fails with:
CUDA Version 12.9.0
…
2025-05-21 15:23:45 WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
2025-05-21 15:23:45 Use the NVIDIA Container Toolkit to start this container with GPU support; see
2025-05-21 15:23:45 NVIDIA Cloud Native Technologies - NVIDIA Docs .
Error when starting the container from the shell:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: detection error: nvml error: unknown error: unknown.
WSL ERRORS:
The WSL2 Ubuntu distribution, with the NVIDIA Container Toolkit installed, does not get any GPU info:
nvidia-smi
Unable to determine the device handle for GPU0: 0000:21:00.0: Unknown Error
Unable to determine the device handle for GPU1: 0000:41:00.0: Unknown Error
No devices were found
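For completeness, the basic checks inside the WSL distribution for whether the GPU paravirtualization pieces are present at all look roughly like this (standard WSL paths):

# The GPU is exposed to WSL through the /dev/dxg paravirtualization device
ls -l /dev/dxg

# The Windows driver maps its user-mode libraries (libcuda.so.1, libnvidia-ml.so.1,
# nvidia-smi, ...) into this directory
ls -l /usr/lib/wsl/lib/

# nvidia-smi inside WSL should resolve to the copy under /usr/lib/wsl/lib,
# not to a driver installed from the distribution's packages
which nvidia-smi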
nvidia-container-cli info
nvidia-container-cli: detection error: nvml error: unknown error
nvidia-debugdump -z -D
nvmlInit succeeded
Using ALL devices
Dumping all components.
nvdZip_Open(dump.zip) for writing succeeded
System: Dumping component: system_info.
ERROR: GetCaptureBufferSize failed, GPU access blocked by the operating system, bufSize: 0x0
ERROR: internal_getDumpBuffer failed, return code: 0x11
ERROR: internal_dumpSystemComponent() failed, return code: 0x11
System: Dumping component: error_data.
ERROR: GetCaptureBufferSize failed, GPU access blocked by the operating system, bufSize: 0x0
ERROR: internal_getDumpBuffer failed, return code: 0x11
ERROR: internal_dumpSystemComponent() failed, return code: 0x11
ERROR: internal_dumpNvLogComponent() failed, return code: 0x11
ERROR: internal_dumpGpuComponent() failed, return code: 0x3e7
ERROR: internal_dumpGpuComponent() failed, return code: 0x3e7
ERROR: internal_dumpNvLogComponent() failed, return code: 0x11
ERROR: internal_dumpGpuComponent() failed, return code: 0x3e7
ERROR: internal_dumpGpuComponent() failed, return code: 0x3e7
ERROR: internal_dumpNvLogComponent() failed, return code: 0x11
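Since nvidia-smi and nvidia-container-cli both fail inside NVML, a direct CUDA runtime check might help separate an NVML-specific problem from a general GPU access problem. A minimal sketch, assuming the CUDA toolkit (nvcc) is installed inside the WSL distribution:

# Tiny device-count check compiled and run inside WSL
cat > /tmp/devcount.cu <<'EOF'
#include <cstdio>
#include <cuda_runtime.h>
int main() {
    int n = 0;
    cudaError_t err = cudaGetDeviceCount(&n);
    // Print the runtime status and how many devices the CUDA runtime can see
    printf("cudaGetDeviceCount: %s, devices: %d\n", cudaGetErrorString(err), n);
    return err == cudaSuccess ? 0 : 1;
}
EOF
nvcc /tmp/devcount.cu -o /tmp/devcount && /tmp/devcount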