Running docker-compose fails to detect the GPU

Hi,

I followed a recent guide to install NVIDIA Docker under WSL2 (Guide to run CUDA + WSL + Docker with latest versions (21382 Windows build + 470.14 Nvidia)), and it works perfectly when I use docker run. Unfortunately, when I try to do the same via docker-compose, the GPU is not detected. The problem persists with the newest OS and drivers (I am now on Windows build 21390 and the 470.76 driver).

Working example with docker run:

➜ docker run --gpus all -it --rm  nvcr.io/nvidia/tensorflow:20.10-tf1-py3 python -c "import tensorflow as tf; tf.test.is_gpu_available()"

================
== TensorFlow ==
================

NVIDIA Release 20.10-tf1 (build 16775850)
TensorFlow Version 1.15.4

Container image Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
Copyright 2017-2020 The TensorFlow Authors.  All rights reserved.

NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use 'nvidia-docker run' to start this container; see
   https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for TensorFlow.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

2021-06-07 11:49:57.381428: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2021-06-07 11:49:58.673081: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3293805000 Hz
2021-06-07 11:49:58.674672: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x41d4a90 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-07 11:49:58.674712: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-06-07 11:49:58.677302: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-06-07 11:49:59.154081: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1072] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-06-07 11:49:59.197264: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4274910 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-06-07 11:49:59.197326: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 3080 Laptop GPU, Compute Capability 8.6
2021-06-07 11:49:59.205416: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1072] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-06-07 11:49:59.206496: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties:
name: NVIDIA GeForce RTX 3080 Laptop GPU major: 8 minor: 6 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
2021-06-07 11:49:59.207679: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-07 11:49:59.209889: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06-07 11:49:59.210801: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-07 11:49:59.225675: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-07 11:49:59.241940: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-06-07 11:49:59.244261: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-06-07 11:49:59.244466: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8

Example docker-compose:
version: "3.8"

services:
  tf:
    image: nvcr.io/nvidia/tensorflow:20.10-tf1-py3
    command: python -c "import tensorflow as tf; tf.test.is_gpu_available()"

Output:

➜ docker-compose -f docker-compose-tf.yml up
Starting test_tf_1 ... done
Attaching to test_tf_1
tf_1  |
tf_1  | ================
tf_1  | == TensorFlow ==
tf_1  | ================
tf_1  |
tf_1  | NVIDIA Release 20.10-tf1 (build 16775850)
tf_1  | TensorFlow Version 1.15.4
tf_1  |
tf_1  | Container image Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
tf_1  | Copyright 2017-2020 The TensorFlow Authors.  All rights reserved.
tf_1  |
tf_1  | NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
tf_1  |
tf_1  | Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
tf_1  | NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
tf_1  |
tf_1  | WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
tf_1  |    Use 'nvidia-docker run' to start this container; see
tf_1  |    https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .
tf_1  |
tf_1  | NOTE: MOFED driver for multi-node communication was not detected.
tf_1  |       Multi-node communication performance may be reduced.
tf_1  |
tf_1  | NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
tf_1  |    insufficient for TensorFlow.  NVIDIA recommends the use of the following flags:
tf_1  |    nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
tf_1  |
tf_1  | 2021-06-07 11:50:09.934550: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
tf_1  | WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
tf_1  | 2021-06-07 11:50:11.119166: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3293805000 Hz
tf_1  | 2021-06-07 11:50:11.120712: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5c17250 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
tf_1  | 2021-06-07 11:50:11.120741: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
tf_1  | 2021-06-07 11:50:11.121963: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
tf_1  | 2021-06-07 11:50:11.121995: E tensorflow/stream_executor/cuda/cuda_driver.cc:339] failed call to cuInit: UNKNOWN ERROR (303)
tf_1  | 2021-06-07 11:50:11.122007: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (8d40e9c8c6eb): /proc/driver/nvidia/version does not exist
test_tf_1 exited with code 0

Is it possible somehow to use docker-compose with GPU support under WSL2?

You are missing the device capabilities reservation. Try with this docker-compose.yml file:

services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    command: python -c "import tensorflow as tf;tf.test.gpu_device_name()"
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: [gpu]
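For reference, a more explicit variant of the same reservation. The `driver` and `count` fields shown here are optional long-form syntax from the Compose specification, included only as an illustration; `count: all` corresponds to what `--gpus all` does on the docker run command line:

```yaml
services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    command: python -c "import tensorflow as tf; print(tf.test.gpu_device_name())"
    deploy:
      resources:
        reservations:
          devices:
            # request all available NVIDIA GPUs for this service
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Note that the `deploy` section is honored by plain `docker-compose up` only in recent versions of docker-compose (1.28+); older releases ignore it outside of swarm mode.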

Thanks! Now it works like a charm! I am a little surprised, because under bare Ubuntu this is not required.
