When I start the container on Ubuntu 18.10 with the NVIDIA 415 driver (using nvidia-docker) with the 'run' command, I get:
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
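The import fails because the dynamic loader cannot open libcublas.so.10.0. A minimal sketch for checking that directly from Python, without importing TensorFlow, might look like the following. The helper name can_load is hypothetical (not part of TensorFlow or the container image), and the code avoids f-strings so it should also run on the container's Python 3.5.

```python
import ctypes


def can_load(soname):
    """Try to dlopen a shared library by name.

    Hypothetical helper for illustration. Returns (True, "") on success,
    or (False, <loader error message>) if the loader cannot open it.
    """
    try:
        ctypes.CDLL(soname)
    except OSError as exc:
        # ctypes raises OSError when the loader cannot open the library
        return False, str(exc)
    return True, ""


# Example usage inside the container (sonames taken from the traceback and ldd output):
# for name in ("libcublas.so.10.0", "libcudart.so.10.0", "libcuda.so.1"):
#     print(name, can_load(name))
```

If a library fails here as well, the problem is in library resolution for the process rather than in TensorFlow itself.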
Logging in to the container:
nvidia-docker run -it --runtime=nvidia --rm nvcr.io/nvidia/tensorflow:18.09-py3
Now I find that the CUDA .so libraries are not found, even though they all exist in the container in directories listed in LD_LIBRARY_PATH, which is set correctly:
ldd /usr/local/lib/python3.5/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
linux-vdso.so.1 => (0x00007ffe15136000)
libtensorflow_framework.so => /usr/local/lib/python3.5/dist-packages/tensorflow/python/../libtensorflow_framework.so (0x00007f22cd49b000)
libcublas.so.10.0 => not found
libcusolver.so.10.0 => not found
libcudart.so.10.0 => not found
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f22cd297000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f22cd07a000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f22cce58000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f22ccb4f000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f22cc947000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f22cc5c5000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f22cc3af000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f22cbfe5000)
/lib64/ld-linux-x86-64.so.2 (0x00007f22f7fd9000)
libnvToolsExt.so.1 => not found
libcublas.so.10.0 => not found
libcuda.so.1 => not found
libcudnn.so.7 => /usr/lib/x86_64-linux-gnu/libcudnn.so.7 (0x00007f22b77ec000)
libcufft.so.10.0 => not found
libcurand.so.10.0 => not found
libcudart.so.10.0 => not found
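To compare the ldd output before and after a change, it can help to extract just the unresolved names. This is a rough sketch that assumes the "name => not found" line format shown above; unresolved_libs and run_ldd are hypothetical helper names.

```python
import subprocess


def unresolved_libs(ldd_output):
    """Return sonames that ldd reported as 'not found', in order, without duplicates."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            name = line.split("=>")[0].strip()
            if name not in missing:
                missing.append(name)
    return missing


def run_ldd(path):
    """Run ldd on a shared object and return its combined output as text."""
    result = subprocess.run(
        ["ldd", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


# Example usage inside the container:
# so = "/usr/local/lib/python3.5/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so"
# print(unresolved_libs(run_ldd(so)))
```

Run against the output above, this would list each missing library once, even though ldd prints libcublas.so.10.0 and libcudart.so.10.0 twice.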
The fix is simple:
# ldconfig
# ldd /usr/local/lib/python3.5/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
linux-vdso.so.1 => (0x00007fffd4db4000)
libtensorflow_framework.so => /usr/local/lib/python3.5/dist-packages/tensorflow/python/../libtensorflow_framework.so (0x00007fdc20923000)
libcublas.so.10.0 => /usr/local/cuda-10.0/targets/x86_64-linux/lib/libcublas.so.10.0 (0x00007fdc1c38d000)
libcusolver.so.10.0 => /usr/local/cuda-10.0/targets/x86_64-linux/lib/libcusolver.so.10.0 (0x00007fdc13ca6000)
libcudart.so.10.0 => /usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudart.so.10.0 (0x00007fdc13a2c000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fdc13828000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fdc1360b000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fdc133e9000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fdc130e0000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fdc12ed8000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fdc12b56000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fdc12940000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdc12576000)
/lib64/ld-linux-x86-64.so.2 (0x00007fdc4b461000)
libnvToolsExt.so.1 => /usr/local/cuda-10.0/targets/x86_64-linux/lib/libnvToolsExt.so.1 (0x00007fdc1236d000)
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007fdc1127a000)
libcudnn.so.7 => /usr/lib/x86_64-linux-gnu/libcudnn.so.7 (0x00007fdbfca81000)
libcufft.so.10.0 => /usr/local/cuda-10.0/targets/x86_64-linux/lib/libcufft.so.10.0 (0x00007fdbf65cd000)
libcurand.so.10.0 => /usr/local/cuda-10.0/targets/x86_64-linux/lib/libcurand.so.10.0 (0x00007fdbf2466000)
libnvidia-fatbinaryloader.so.415.25 => /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.415.25 (0x00007fdbf2219000)
Some libraries now resolve to my locally installed CUDA 10 copies (libcublas and others) in /usr/local/cuda-10.0.
Now TensorFlow works inside the container: driver 415.25 is fine with CUDA 10.0, and the Maxwell GPU is also supported.
How could this be fixed without running ldconfig in the container? Why does LD_LIBRARY_PATH not work there? Is it some docker-ce bug?
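One thing worth ruling out is whether LD_LIBRARY_PATH, as seen by the failing process, really contains a directory holding the missing files. The loader normally searches LD_LIBRARY_PATH before the ldconfig cache. A minimal sketch of that check, with find_in_ld_library_path as a hypothetical helper name:

```python
import os


def find_in_ld_library_path(soname, env=None):
    """List LD_LIBRARY_PATH entries that contain a file named soname.

    Hypothetical diagnostic helper. It only checks that the file exists;
    it does not verify that the loader would accept it.
    """
    env = os.environ if env is None else env
    hits = []
    for directory in env.get("LD_LIBRARY_PATH", "").split(":"):
        if not directory:
            # Empty entries are skipped here for simplicity
            continue
        candidate = os.path.join(directory, soname)
        if os.path.exists(candidate):
            hits.append(candidate)
    return hits


# Example usage inside the container:
# print(os.environ.get("LD_LIBRARY_PATH"))
# print(find_in_ld_library_path("libcublas.so.10.0"))
```

If this returns an empty list for libcublas.so.10.0, the variable does not point at the directory where the library lives (/usr/local/cuda-10.0/targets/x86_64-linux/lib in the output above), which would explain why only ldconfig makes it visible.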