Error: Installing TensorFlow from source

I tried to compile TensorFlow from source using this script:
https://devtalk.nvidia.com/default/topic/1044720/drive-agx/installing-tensorflow-from-source-on-drive-agx/post/5301621/#5301621

It seemed to finish without errors…

Successfully installed absl-py-0.7.0 astor-0.7.1 backports.weakref-1.0.post1 bleach-1.5.0 enum34-1.1.6 funcsigs-1.0.2 futures-3.2.0 gast-0.2.2 grpcio-1.18.0 html5lib-0.9999999 markdown-3.0.1 mock-2.0.0 numpy-1.16.1 pbr-5.1.2 protobuf-3.6.1 six-1.12.0 tensorboard-1.6.0 tensorflow-1.6.0 termcolor-1.1.0 werkzeug-0.14.1
** Install TensorFlow-1.6.0 successfully
** Bye :)

…but also without asking for the GPU architecture (or any of the other options described in the README), and now I get the following error whenever I run code that uses TensorFlow:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

I already searched Google for possible fixes… The file ‘libcublas.so.9.0’ is located in my /usr/local/cuda/lib64 directory, and I also appended this path to LD_LIBRARY_PATH in my .bashrc. My manually installed versions are CUDA 9.0 and cuDNN 7.0.5, on Ubuntu 16.04 LTS.
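
(For completeness, since .bashrc only affects newly started interactive shells, this is roughly how I would double-check that the library is visible from the shell that runs Python; the paths assume the default CUDA install location:)

ls /usr/local/cuda/lib64/libcublas.so.9.0   # confirm the file really exists at that path
echo $LD_LIBRARY_PATH                       # should contain /usr/local/cuda/lib64 in this very shell
ldconfig -p | grep libcublas                # note: entries added only via LD_LIBRARY_PATH will not show up in the loader cache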

Do you have any further ideas? Looking forward to your help!

Dear martin.funk,
Does your system have multiple CUDA versions installed?
Could you please locate libtensorflow_framework.so (using the locate command) and check its dependencies using ldd?
It should find the correct libcublas.so to avoid this error. Could you please also check that LD_LIBRARY_PATH is set correctly?
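
For example (the dist-packages path below is only where pip usually installs it; please use whatever locate reports on your system):

locate libtensorflow_framework.so
ldd /usr/local/lib/python2.7/dist-packages/tensorflow/libtensorflow_framework.so | grep cublas

If ldd prints "libcublas.so.9.0 => not found", the loader path is the issue; if it resolves to a different CUDA version, there is a version mismatch.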

Thanks for your reply. I figured out that I had somehow installed the desktop (x86-64) version of CUDA.

Now I want to try again using this patch for tf1.8:
https://devtalk.nvidia.com/default/topic/1044720/drive-agx/installing-tensorflow-from-source-on-drive-agx/post/5301836/#5301836
But I’m not familiar with applying patches like the one in the linked topic, since I’m very new to all of this…
After successfully building TF from source, I assume I have to apply the patch in the directory referenced in the .patch file, but I can’t locate it (third_party/png.BUILD). Can someone please help me out here?
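
From what I have read so far, my guess is that the patch has to be applied from the TensorFlow source root (the checkout that contains the third_party/ directory), roughly like this, but please correct me if I’m wrong (the patch file name below is just a placeholder for whatever the download is called):

cd ~/Downloads/tensorflow-nvJetson/tensorflow   # example path to the cloned sources
patch -p1 < tf18-drive-agx.patch                # or: git apply tf18-drive-agx.patch; use -p0 if the paths inside the patch have no a/ b/ prefix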

For a manual way to do it, edit the appropriate source files with the changes listed in the patch. Also, the new SDK now flashes CUDA 10: if you updated your DRIVE between building from source and importing TF, your session is trying to find CUDA 9, which no longer exists.
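
You can check what is actually installed on the device with something like:

ls -d /usr/local/cuda*               # lists the installed toolkit directories, e.g. /usr/local/cuda-10.0
/usr/local/cuda/bin/nvcc --version   # reports the toolkit version the cuda symlink currently points to
ldconfig -p | grep libcublas         # shows which libcublas versions the loader cache knows about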

Indeed, CUDA 10 is installed on my device. I can’t find the old SDK containing CUDA 9 (5.0.13.0?) either online or in my SDK Manager. How can I handle this issue?

Rebuild the .whl file with CUDA 10; that works for TF 1.8.
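
Roughly, from the TensorFlow source root (the exact configure prompts differ a bit between TF versions, and the /tmp output directory below is just an example):

./configure                  # answer CUDA 10.0, your cuDNN version, and compute capability 7.2 when prompted
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
sudo pip install /tmp/tensorflow_pkg/tensorflow-*.whl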

Now the build fails with the following error when I run:

bazel build --verbose_failures --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

ERROR: /home/nvidia/Downloads/tensorflow-nvJetson/tensorflow/tensorflow/contrib/verbs/BUILD:123:1: C++ compilation of rule '//tensorflow/contrib/verbs:rdma' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /home/nvidia/.cache/bazel/_bazel_root/b171a386c9f55f71e74119dfaebd99fb/execroot/org_tensorflow && \
  exec env - \
    LD_LIBRARY_PATH=:/usr/local/cuda/extras/CUPTI/lib64 \
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    PWD=/proc/self/cwd \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections -g0 -DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK '-march=native' '-std=c++11' -g0 -MD -MF bazel-out/host/bin/tensorflow/contrib/verbs/_objs/rdma/tensorflow/contrib/verbs/rdma.pic.d '-frandom-seed=bazel-out/host/bin/tensorflow/contrib/verbs/_objs/rdma/tensorflow/contrib/verbs/rdma.pic.o' -fPIC '-DGRPC_ARES=0' '-DPB_FIELD_16BIT=1' -D__CLANG_SUPPORT_DYN_ANNOTATION__ -DEIGEN_MPL2_ONLY -DTENSORFLOW_USE_ABSL -DTF_USE_SNAPPY -DTENSORFLOW_USE_VERBS -iquote . -iquote bazel-out/host/genfiles -iquote external/protobuf_archive -iquote bazel-out/host/genfiles/external/protobuf_archive -iquote external/bazel_tools -iquote bazel-out/host/genfiles/external/bazel_tools -iquote external/grpc -iquote bazel-out/host/genfiles/external/grpc -iquote external/zlib_archive -iquote bazel-out/host/genfiles/external/zlib_archive -iquote external/com_google_absl -iquote bazel-out/host/genfiles/external/com_google_absl -iquote external/nsync -iquote bazel-out/host/genfiles/external/nsync -iquote external/eigen_archive -iquote bazel-out/host/genfiles/external/eigen_archive -iquote external/local_config_sycl -iquote bazel-out/host/genfiles/external/local_config_sycl -iquote external/gif_archive -iquote bazel-out/host/genfiles/external/gif_archive -iquote external/jpeg -iquote bazel-out/host/genfiles/external/jpeg -iquote external/com_googlesource_code_re2 -iquote bazel-out/host/genfiles/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/host/genfiles/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/host/genfiles/external/fft2d -iquote external/highwayhash -iquote bazel-out/host/genfiles/external/highwayhash -iquote external/png_archive -iquote bazel-out/host/genfiles/external/png_archive -iquote external/local_config_cuda -iquote bazel-out/host/genfiles/external/local_config_cuda -isystem external/protobuf_archive/src -isystem bazel-out/host/genfiles/external/protobuf_archive/src -isystem external/bazel_tools/tools/cpp/gcc3 -isystem external/grpc/include -isystem bazel-out/host/genfiles/external/grpc/include -isystem external/zlib_archive -isystem bazel-out/host/genfiles/external/zlib_archive -isystem external/grpc/third_party/address_sorting/include -isystem bazel-out/host/genfiles/external/grpc/third_party/address_sorting/include -isystem external/nsync/public -isystem bazel-out/host/genfiles/external/nsync/public -isystem external/eigen_archive -isystem bazel-out/host/genfiles/external/eigen_archive -isystem external/gif_archive/lib -isystem bazel-out/host/genfiles/external/gif_archive/lib -isystem external/farmhash_archive/src -isystem bazel-out/host/genfiles/external/farmhash_archive/src -isystem external/png_archive -isystem bazel-out/host/genfiles/external/png_archive -isystem external/local_config_cuda/cuda -isystem bazel-out/host/genfiles/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/host/genfiles/external/local_config_cuda/cuda/cuda/include -isystem external/local_config_cuda/cuda/cuda/include/crt -isystem bazel-out/host/genfiles/external/local_config_cuda/cuda/cuda/include/crt -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare -fno-exceptions '-ftemplate-depth=900' '-DGOOGLE_CUDA=1' -pthread '-DGOOGLE_CUDA=1' 
-no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fno-canonical-system-headers -c tensorflow/contrib/verbs/rdma.cc -o bazel-out/host/bin/tensorflow/contrib/verbs/_objs/rdma/tensorflow/contrib/verbs/rdma.pic.o)
In file included from tensorflow/contrib/verbs/rdma.cc:21:0:
./tensorflow/contrib/verbs/rdma.h:21:30: fatal error: infiniband/verbs.h: No such file or directory
compilation terminated.
INFO: Elapsed time: 5003.292s, Critical Path: 518.92s
FAILED: Build did NOT complete successfully

I only changed the following options from the suggested defaults when configuring the build (a sketch of the equivalent environment-variable settings follows the list):
CUDA version: 10.0
cuDNN version: 7.2.2
Compute capabilities: 7.0,7.2
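
(For reference, the same answers can apparently be supplied to ./configure through environment variables; as far as I know these are the names TF 1.x's configure script reads, but please double-check them against your configure.py:)

export TF_NEED_CUDA=1
export TF_CUDA_VERSION=10.0
export TF_CUDNN_VERSION=7                   # major version is usually enough; configure locates the library itself
export TF_CUDA_COMPUTE_CAPABILITIES=7.0,7.2
export CUDA_TOOLKIT_PATH=/usr/local/cuda
export CUDNN_INSTALL_PATH=/usr/local/cuda
./configure                                 # still prompts for anything not covered by the variables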

Thanks in advance.

I haven’t encountered this, so I won’t be able to help. Mine works with the proper version of TF (1.8) and its patch.

Hi martin.funk,

Have you managed to get TF installed successfully?
Please update us on the status so we can help clarify.

Thanks

Sorry for my late answer. I had to install these packages first (I thought bazel would handle this automatically…):

sudo apt-get install libibverbs-dev libjemalloc-dev
pip install enum
pip install mock

So I managed to install first TF 1.8 (with bazel 0.10.0) and then, after uninstalling TF 1.8 and protobuf, TF 1.12 (with bazel 0.15.2), but it seems to run really slowly on my DRIVE… On a laptop with CPU-only TF the following calculation takes just 0.010s:

$ python -c "import tensorflow as tf; import time; tf.enable_eager_execution(); start=time.time(); print(tf.reduce_sum(tf.random_normal([1000, 1000]))); end=time.time(); print('{:.3f}s'.format(end-start))"
2019-02-22 08:58:35.097119: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:931] ARM64 does not support NUMA - returning NUMA node zero
2019-02-22 08:58:35.097444: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.18575
pciBusID: 0000:00:00.0
totalMemory: 24.17GiB freeMemory: 18.55GiB
2019-02-22 08:58:35.097600: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-22 08:58:36.087521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-22 08:58:36.087819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-02-22 08:58:36.087956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-02-22 08:58:36.088278: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 17813 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
tf.Tensor(489.8058, shape=(), dtype=float32)
6.593s
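
I’m not sure how much of that 6.593s is one-time startup cost; the log above already shows about a second spent just setting up the GPU device inside the timed section. A variation of the one-liner that runs the op once as a warm-up and only times a second call, something like:

$ python -c "import tensorflow as tf; import time; tf.enable_eager_execution(); tf.reduce_sum(tf.random_normal([1000, 1000])); start=time.time(); print(tf.reduce_sum(tf.random_normal([1000, 1000]))); end=time.time(); print('{:.3f}s'.format(end-start))"

should show whether the cost is paid per op or only on the first GPU call, but I haven’t investigated further.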

Any ideas how to handle this?

Hi,

May I know where the TensorFlow package comes from?
Did you build it from source on the DRIVE platform?

Thanks.

I downloaded TF directly from github.
And yes, I built it from source on my DRIVE platform.

I’m not responsible for this project anymore, but thanks anyway!