I have been trying to run a model for the past few weeks and, after many hours, I still can’t get it to work because of version incompatibilities. I would prefer to fix my local setup, but I haven’t managed to, so I thought I could use a Docker image instead. This is the error I get:
    from tensorpack import DataFlow, dataflow
  File "/home/mk/.local/lib/python3.6/site-packages/tensorpack/__init__.py", line 5, in <module>
    from tensorpack.libinfo import __version__, _HAS_TF
  File "/home/mk/.local/lib/python3.6/site-packages/tensorpack/libinfo.py", line 47, in <module>
    import tensorflow as tf  # noqa
  File "/home/mk/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 438, in <module>
    _ll.load_library(_main_dir)
  File "/home/mk/.local/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 154, in load_library
    py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: /home/mk/anaconda3/envs/pcnconda/lib/python3.6/site-packages/tensorflow/core/kernels/libtfkernel_sobol_op.so: undefined symbol: _ZN10tensorflow8OpKernel11TraceStringB5cxx11EPNS_15OpKernelContextEb
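One thing the traceback does show is that the Python frames come from /home/mk/.local/lib/python3.6/site-packages, while the library that fails to load sits in the pcnconda conda environment. That may mean two TensorFlow installs are being mixed, although the traceback alone does not prove it. A minimal sketch for listing every copy of a package visible on the import path (the helper name find_package_copies is made up for illustration):

```python
import os
import sys


def find_package_copies(name, search_paths=None):
    """Return every directory on the search path that holds a package called `name`."""
    if search_paths is None:
        search_paths = sys.path
    copies = []
    for entry in search_paths:
        # An empty sys.path entry stands for the current working directory.
        candidate = os.path.join(entry or os.getcwd(), name)
        # A regular package is a directory with an __init__.py inside it.
        if os.path.isfile(os.path.join(candidate, "__init__.py")) and candidate not in copies:
            copies.append(candidate)
    return copies


if __name__ == "__main__":
    # More than one printed path would suggest that installs shadow each other.
    for path in find_package_copies("tensorflow"):
        print(path)
```

If more than one path is printed, setting the environment variable PYTHONNOUSERSITE=1 makes Python skip ~/.local/lib/.../site-packages, which might be worth trying before moving to Docker.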
So I thought I could build a Docker image (I have never used Docker before) to run and train the model. The model requires TensorFlow 1.12 (actually tensorflow-gpu 1.12, if I’m not mistaken) with CUDA 9.0, and it was tested on Ubuntu 16.04 with Python 3.5. I assume it also needs a matching cuDNN, but I read in this forum that running the container with --runtime nvidia should update the headers. After that I’d run pip install -r requirements.txt and train the model on my own data.
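To make the plan concrete, here is a rough sketch that only assembles the docker command I have in mind and prints it. The image tag tensorflow/tensorflow:1.12.0-gpu-py3, the /workspace mount point and the train.py script name are assumptions for illustration and not something I have verified against the model's repository.

```python
import os
import shlex

# Assumptions: the image tag, mount point and script name are placeholders.
IMAGE = "tensorflow/tensorflow:1.12.0-gpu-py3"
WORKDIR = "/workspace"


def build_docker_command(project_dir, script="train.py", image=IMAGE):
    """Build a `docker run` argument list that installs requirements, then runs the script."""
    inner = "pip install -r requirements.txt && python {}".format(shlex.quote(script))
    return [
        "docker", "run", "--runtime=nvidia", "--rm", "-it",
        # Mount the project directory into the container and start there.
        "-v", "{}:{}".format(os.path.abspath(project_dir), WORKDIR),
        "-w", WORKDIR,
        image,
        "bash", "-c", inner,
    ]


if __name__ == "__main__":
    cmd = build_docker_command(".")
    # Print a copy-pasteable shell command instead of executing it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Installing the requirements on every run is wasteful; baking them into a custom image would probably be the cleaner choice once the basic setup works.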
I am having a hard time understanding which image to use, as there are many that refer to the same TF version (1.12). In this image, for example, I can’t see anything related to cuDNN, but that may be because --runtime nvidia is enough.
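One way to settle the cuDNN question for a given image might be to run a small check inside the container. This is a minimal sketch, assuming the image ships Python 3; the function name report_gpu_stack is made up, and ctypes.util.find_library only searches the standard loader locations, so a None result is a hint that cuDNN is missing and not proof.

```python
import ctypes.util


def report_gpu_stack():
    """Collect what the container can see: CUDA runtime, cuDNN, TensorFlow and GPU access."""
    report = {
        "cudart": ctypes.util.find_library("cudart"),
        "cudnn": ctypes.util.find_library("cudnn"),
        "tensorflow": None,
        "gpu_available": None,
    }
    try:
        import tensorflow as tf
        report["tensorflow"] = tf.__version__
        # Present in TensorFlow 1.x; deprecated in later releases.
        report["gpu_available"] = tf.test.is_gpu_available()
    except Exception:
        # Covers a missing install as well as loader errors at import time.
        pass
    return report


if __name__ == "__main__":
    for key, value in report_gpu_stack().items():
        print("{}: {}".format(key, value))
```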