issue installing Tensorflow

Hello

I am following this tutorial from Jetsonhacks to install Tensorflow on my TX2 board: http://www.jetsonhacks.com/2017/04/02/tensorflow-on-nvidia-jetson-tx2-development-kit/

The situation:

  • I haven’t set up any swap memory
  • normally CUDNN and CUDA should be properly installed, I installed them remotely via Jetpack
  • df -h returns:
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/mmcblk0p1   28G   19G  7.6G  71% /
none            7.0G     0  7.0G   0% /dev
tmpfs           7.7G  264K  7.7G   1% /dev/shm
tmpfs           7.7G   14M  7.7G   1% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           7.7G     0  7.7G   0% /sys/fs/cgroup
tmpfs           786M   72K  786M   1% /run/user/1001

The issue:

When running the script provided by jetsonhacks: $ ./buildTensorFlow.sh
I get this error message:

ERROR: /home/nvidia/tensorflow/tensorflow/core/kernels/BUILD:2183:1: C++ compilation of rule '//tensorflow/core/kernels:svd_op' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /home/nvidia/.cache/bazel/_bazel_nvidia/d2751a49dacf4cb14a513ec663770624/execroot/org_tensorflow && \
  exec env - \
    CUDA_TOOLKIT_PATH=/usr/local/cuda \
    CUDNN_INSTALL_PATH=/usr/lib/aarch64-linux-gnu \
    GCC_HOST_COMPILER_PATH=/usr/bin/gcc \
    LD_LIBRARY_PATH=/home/nvidia/torch/install/lib:/home/nvidia/torch/install/lib: \
    PATH=/home/nvidia/torch/install/bin:/home/nvidia/torch/install/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python \
    PYTHON_LIB_PATH=/usr/local/lib/python2.7/dist-packages \
    TF_CUDA_CLANG=0 \
    TF_CUDA_COMPUTE_CAPABILITIES=6.2 \
    TF_CUDA_VERSION=8.0 \
    TF_CUDNN_VERSION=5.1.10 \
    TF_NEED_CUDA=1 \
    TF_NEED_OPENCL=0 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' -MD -MF bazel-out/local_linux-opt/bin/tensorflow/core/kernels/_objs/svd_op/tensorflow/core/kernels/svd_op_complex64.pic.d '-frandom-seed=bazel-out/local_linux-opt/bin/tensorflow/core/kernels/_objs/svd_op/tensorflow/core/kernels/svd_op_complex64.pic.o' -fPIC -DEIGEN_MPL2_ONLY -DSNAPPY -iquote . -iquote bazel-out/local_linux-opt/genfiles -iquote external/bazel_tools -iquote bazel-out/local_linux-opt/genfiles/external/bazel_tools -iquote external/protobuf -iquote bazel-out/local_linux-opt/genfiles/external/protobuf -iquote external/eigen_archive -iquote bazel-out/local_linux-opt/genfiles/external/eigen_archive -iquote external/local_config_sycl -iquote bazel-out/local_linux-opt/genfiles/external/local_config_sycl -iquote external/gif_archive -iquote bazel-out/local_linux-opt/genfiles/external/gif_archive -iquote external/jpeg -iquote bazel-out/local_linux-opt/genfiles/external/jpeg -iquote external/com_googlesource_code_re2 -iquote bazel-out/local_linux-opt/genfiles/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/local_linux-opt/genfiles/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/local_linux-opt/genfiles/external/fft2d -iquote external/highwayhash -iquote bazel-out/local_linux-opt/genfiles/external/highwayhash -iquote external/png_archive -iquote bazel-out/local_linux-opt/genfiles/external/png_archive -iquote external/zlib_archive -iquote bazel-out/local_linux-opt/genfiles/external/zlib_archive -iquote external/snappy -iquote bazel-out/local_linux-opt/genfiles/external/snappy -iquote external/local_config_cuda -iquote bazel-out/local_linux-opt/genfiles/external/local_config_cuda -isystem external/bazel_tools/tools/cpp/gcc3 -isystem external/protobuf/src -isystem bazel-out/local_linux-opt/genfiles/external/protobuf/src -isystem external/eigen_archive -isystem bazel-out/local_linux-opt/genfiles/external/eigen_archive -isystem external/gif_archive/lib -isystem bazel-out/local_linux-opt/genfiles/external/gif_archive/lib -isystem external/farmhash_archive/src -isystem bazel-out/local_linux-opt/genfiles/external/farmhash_archive/src -isystem external/png_archive -isystem bazel-out/local_linux-opt/genfiles/external/png_archive -isystem external/zlib_archive -isystem bazel-out/local_linux-opt/genfiles/external/zlib_archive -isystem external/local_config_cuda/cuda -isystem bazel-out/local_linux-opt/genfiles/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/local_linux-opt/genfiles/external/local_config_cuda/cuda/cuda/include -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare -fno-exceptions '-DGOOGLE_CUDA=1' -pthread '-DGOOGLE_CUDA=1' -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fno-canonical-system-headers -c tensorflow/core/kernels/svd_op_complex64.cc -o bazel-out/local_linux-opt/bin/tensorflow/core/kernels/_objs/svd_op/tensorflow/core/kernels/svd_op_complex64.pic.o): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4.
gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 5924.737s, Critical Path: 813.48s

after this error message I can’t run this script:

$ ./packageTensorFlow.sh
./packageTensorFlow.sh: line 3: cd: /home/nvidia/tensorflow: No such file or directory
./packageTensorFlow.sh: line 4: bazel-bin/tensorflow/tools/pip_package/build_pip_package: No such file or directory
mv: cannot stat '/tmp/tensorflow_pkg/tensorflow-*.whl': No such file or directory

What can I do to solve this issue so I can install Tensorflow as shown in this tutorial?

Try running $./tegrastats to monitor the memory usage while running the scripts.
From this line of the error output, it looks like the Linux OOM (Out Of Memory) killer killed the gcc compiler process:

gcc: internal compiler error: Killed (program cc1plus)

From the tutorial, it is recommended to attach swap as TF requires 6GB+ of memory to build:

Hi

I saw that they suggested using swap, unfortunately it is not possible for me to add additional memory to the device (like an SSD or a USB-stick) as I don’t have physical access to it. I also looked at what additional files I can delete from my system in order to obtain more free space, but what I described above is all I can do.

What alternative do you think there is? Is swap really a must?

I can’t say for sure if swap is a must, but usually the people here are right when it is suggested running out of space means swap is needed. “They’re probably right.”

If you don’t have physical access, then I suspect doing anything special which might give you extra space from a remote location runs the risk of breaking the system and requiring physical access to recover. I’ve seen sshfs mentioned here several times, and this seems to be your best/safest bet (I’ve never tried this myself). You could just “sudo apt-get install sshfs”, and then see how it works with “man sshfs”. Plus I see an interesting URL on it:
https://www.digitalocean.com/community/tutorials/how-to-use-sshfs-to-mount-remote-file-systems-over-ssh

One really good way to add swap which is fast performance is with iSCSI, but this is not for the “faint-of-heart”…it takes a steep learning curve and rebooting and testing. If you had to have a lot of remote disk mounting and needed performance, then this is probably a good choice. I wouldn’t recommend it for casual use.

Anyone get Tensorflow 1.3 to install on TX2 with Jet packs 3.1. I am unable to get Tensorflow to work properly. I tried building the whl from source using Jetson hacks GitHub as well as prebuilt ones mentioned in this forum but still am unable to get it to work. It builds with no issue but when I run a script I get CUDA_ERROR_LAUNCH_FAILED.

Hi,

We have tried this Tensorflow1.3 wheel. It works correctly on JetPack3.1 + TX2.

Thank you for confirming. I have tried to install the the whl package multiple times after flashing a fresh OS with the same result. Tensorflow scripts will work fine after the installation but after a reboot I will get the following error.

2017-10-13 03:20:08.205771: E tensorflow/stream_executor/cuda/cuda_driver.cc:1068] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED
2017-10-13 03:20:08.205969: E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x4e4c430: CUDA_ERROR_LAUNCH_FAILED
2017-10-13 03:20:08.206024: E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x4e4c430: CUDA_ERROR_LAUNCH_FAILED
2017-10-13 03:20:08.206266: F tensorflow/stream_executor/cuda/cuda_dnn.cc:2045] failed to enqueue convolution on stream: CUDNN_STATUS_EXECUTION_FAILED

Hi,

Which script do you use?
We can successfully import TensorFlow after rebooting.

From the error message, this issue may come from GPU driver.
Could you share more information about your environment?
Do you use JetPack3.1 and CUDA-8.0?

Please remember that this wheel file is built with CUDA-8.0 and cuDNNv6.

Thanks.

I used the JetsonHacks Tensorflow script on Github. I think the issue is coming from the ConnectTech carrier board since the Nvidia TX2 Evaluation Dev board is running the same Tensorflow script with no issues. The only difference in software between these modules is that one module has ConnectTech BSP Package flashed to it. Both are running JetPacks 3.1.

Hi,

We are not familiar with ConnectTect carrier board.
But the most common issue is the communication between custom OS and TX2’s GPU.

Could you try to execute our vectorAdd CUDA sample to make sure the basic GPU functionality?

Thanks.