Running docker-compose fails to detect the GPU

Hi,

I followed a recent guide to install NVIDIA Docker under WSL2 (Guide to run CUDA + WSL + Docker with latest versions (21382 Windows build + 470.14 Nvidia)), and it works perfectly when I use docker run. Unfortunately, when I try to do the same via docker-compose, the GPU is not detected. The problem persists with the newest OS and drivers (I am now on Windows build 21390 and the 470.76 driver).

Working example with docker run:

➜ docker run --gpus all -it --rm  nvcr.io/nvidia/tensorflow:20.10-tf1-py3 python -c "import tensorflow as tf; tf.test.is_gpu_available()"

================
== TensorFlow ==
================

NVIDIA Release 20.10-tf1 (build 16775850)
TensorFlow Version 1.15.4

Container image Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
Copyright 2017-2020 The TensorFlow Authors.  All rights reserved.

NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use 'nvidia-docker run' to start this container; see
   https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for TensorFlow.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

2021-06-07 11:49:57.381428: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2021-06-07 11:49:58.673081: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3293805000 Hz
2021-06-07 11:49:58.674672: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x41d4a90 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-07 11:49:58.674712: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-06-07 11:49:58.677302: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-06-07 11:49:59.154081: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1072] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-06-07 11:49:59.197264: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4274910 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-06-07 11:49:59.197326: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 3080 Laptop GPU, Compute Capability 8.6
2021-06-07 11:49:59.205416: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1072] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-06-07 11:49:59.206496: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties:
name: NVIDIA GeForce RTX 3080 Laptop GPU major: 8 minor: 6 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
2021-06-07 11:49:59.207679: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-07 11:49:59.209889: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06-07 11:49:59.210801: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-07 11:49:59.225675: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-07 11:49:59.241940: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-06-07 11:49:59.244261: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-06-07 11:49:59.244466: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8

Example docker-compose:
version: "3.8"

services:
  tf:
    image: nvcr.io/nvidia/tensorflow:20.10-tf1-py3
    command: python -c "import tensorflow as tf; tf.test.is_gpu_available()"

Output:

➜ docker-compose -f docker-compose-tf.yml up
Starting test_tf_1 ... done
Attaching to test_tf_1
tf_1  |
tf_1  | ================
tf_1  | == TensorFlow ==
tf_1  | ================
tf_1  |
tf_1  | NVIDIA Release 20.10-tf1 (build 16775850)
tf_1  | TensorFlow Version 1.15.4
tf_1  |
tf_1  | Container image Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
tf_1  | Copyright 2017-2020 The TensorFlow Authors.  All rights reserved.
tf_1  |
tf_1  | NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
tf_1  |
tf_1  | Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
tf_1  | NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
tf_1  |
tf_1  | WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
tf_1  |    Use 'nvidia-docker run' to start this container; see
tf_1  |    https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .
tf_1  |
tf_1  | NOTE: MOFED driver for multi-node communication was not detected.
tf_1  |       Multi-node communication performance may be reduced.
tf_1  |
tf_1  | NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
tf_1  |    insufficient for TensorFlow.  NVIDIA recommends the use of the following flags:
tf_1  |    nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
tf_1  |
tf_1  | 2021-06-07 11:50:09.934550: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
tf_1  | WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
tf_1  | 2021-06-07 11:50:11.119166: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3293805000 Hz
tf_1  | 2021-06-07 11:50:11.120712: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5c17250 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
tf_1  | 2021-06-07 11:50:11.120741: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
tf_1  | 2021-06-07 11:50:11.121963: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
tf_1  | 2021-06-07 11:50:11.121995: E tensorflow/stream_executor/cuda/cuda_driver.cc:339] failed call to cuInit: UNKNOWN ERROR (303)
tf_1  | 2021-06-07 11:50:11.122007: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (8d40e9c8c6eb): /proc/driver/nvidia/version does not exist
test_tf_1 exited with code 0

Is it possible somehow to use docker-compose with GPU support under WSL2?

You are missing the device capabilities reservation. Try with this docker-compose.yml file:

services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    command: python -c "import tensorflow as tf;tf.test.gpu_device_name()"
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: [gpu]
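For reference, a more explicit variant of the same reservation. The `driver` and `count` fields shown here are optional long-form syntax from the Compose specification, included only as an illustration; `count: all` corresponds to what `--gpus all` does on the docker run command line:

```yaml
services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    command: python -c "import tensorflow as tf; print(tf.test.gpu_device_name())"
    deploy:
      resources:
        reservations:
          devices:
            # request all available NVIDIA GPUs for this service
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Note that the `deploy` section is honored by plain `docker-compose up` only in recent versions of docker-compose (1.28+); older releases ignore it outside of swarm mode.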

Thanks! Now it works like a charm! I am a little surprised, because under bare Ubuntu this is not required.
