Issue installing TensorFlow

linengmiao · October 6, 2017, 9:39am

Hello

I am following this tutorial from JetsonHacks to install TensorFlow on my TX2 board: TensorFlow on NVIDIA Jetson TX2 Development Kit - JetsonHacks

The situation:

I haven’t set up any swap memory.
CUDA and cuDNN should be properly installed; I installed them remotely via JetPack.
df -h returns:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/mmcblk0p1   28G   19G  7.6G  71% /
none            7.0G     0  7.0G   0% /dev
tmpfs           7.7G  264K  7.7G   1% /dev/shm
tmpfs           7.7G   14M  7.7G   1% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           7.7G     0  7.7G   0% /sys/fs/cgroup
tmpfs           786M   72K  786M   1% /run/user/1001
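The Avail column for / (7.6G here) is the number to compare against the build’s disk requirement. As a minimal sketch, assuming GNU coreutils’ default df -h column order, power-of-1024 suffixes and mount points without spaces, it can be extracted like this:

```python
# Minimal sketch: read the "Avail" column of `df -h` output for one mount point.
# Assumes GNU coreutils' default columns (Filesystem, Size, Used, Avail, Use%,
# Mounted on) and mount points that contain no spaces.

# df -h uses powers of 1024, so "G" here means GiB.
_TO_GIB = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}


def to_gib(size: str) -> float:
    """Convert a df -h size such as '7.6G', '786M' or '0' to GiB."""
    if size == "0":
        return 0.0
    return float(size[:-1]) * _TO_GIB[size[-1].upper()]


def avail_gib(df_output: str, mount_point: str = "/") -> float:
    """Return the space still available on mount_point, in GiB."""
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 6 and fields[5] == mount_point:
            return to_gib(fields[3])
    raise ValueError(f"mount point {mount_point!r} not found in df output")


sample = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/mmcblk0p1   28G   19G  7.6G  71% /
none            7.0G     0  7.0G   0% /dev
"""
print(avail_gib(sample))  # 7.6
```

On the board, the same function can be fed the live output of subprocess.run(["df", "-h"], capture_output=True, text=True).stdout.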

The issue:

When I run the script provided by JetsonHacks ($ ./buildTensorFlow.sh), I get this error message:

ERROR: /home/nvidia/tensorflow/tensorflow/core/kernels/BUILD:2183:1: C++ compilation of rule '//tensorflow/core/kernels:svd_op' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /home/nvidia/.cache/bazel/_bazel_nvidia/d2751a49dacf4cb14a513ec663770624/execroot/org_tensorflow && \
  exec env - \
    CUDA_TOOLKIT_PATH=/usr/local/cuda \
    CUDNN_INSTALL_PATH=/usr/lib/aarch64-linux-gnu \
    GCC_HOST_COMPILER_PATH=/usr/bin/gcc \
    LD_LIBRARY_PATH=/home/nvidia/torch/install/lib:/home/nvidia/torch/install/lib: \
    PATH=/home/nvidia/torch/install/bin:/home/nvidia/torch/install/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python \
    PYTHON_LIB_PATH=/usr/local/lib/python2.7/dist-packages \
    TF_CUDA_CLANG=0 \
    TF_CUDA_COMPUTE_CAPABILITIES=6.2 \
    TF_CUDA_VERSION=8.0 \
    TF_CUDNN_VERSION=5.1.10 \
    TF_NEED_CUDA=1 \
    TF_NEED_OPENCL=0 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' -MD -MF bazel-out/local_linux-opt/bin/tensorflow/core/kernels/_objs/svd_op/tensorflow/core/kernels/svd_op_complex64.pic.d '-frandom-seed=bazel-out/local_linux-opt/bin/tensorflow/core/kernels/_objs/svd_op/tensorflow/core/kernels/svd_op_complex64.pic.o' -fPIC -DEIGEN_MPL2_ONLY -DSNAPPY -iquote . -iquote bazel-out/local_linux-opt/genfiles -iquote external/bazel_tools -iquote bazel-out/local_linux-opt/genfiles/external/bazel_tools -iquote external/protobuf -iquote bazel-out/local_linux-opt/genfiles/external/protobuf -iquote external/eigen_archive -iquote bazel-out/local_linux-opt/genfiles/external/eigen_archive -iquote external/local_config_sycl -iquote bazel-out/local_linux-opt/genfiles/external/local_config_sycl -iquote external/gif_archive -iquote bazel-out/local_linux-opt/genfiles/external/gif_archive -iquote external/jpeg -iquote bazel-out/local_linux-opt/genfiles/external/jpeg -iquote external/com_googlesource_code_re2 -iquote bazel-out/local_linux-opt/genfiles/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/local_linux-opt/genfiles/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/local_linux-opt/genfiles/external/fft2d -iquote external/highwayhash -iquote bazel-out/local_linux-opt/genfiles/external/highwayhash -iquote external/png_archive -iquote bazel-out/local_linux-opt/genfiles/external/png_archive -iquote external/zlib_archive -iquote bazel-out/local_linux-opt/genfiles/external/zlib_archive -iquote external/snappy -iquote bazel-out/local_linux-opt/genfiles/external/snappy -iquote external/local_config_cuda -iquote bazel-out/local_linux-opt/genfiles/external/local_config_cuda -isystem 
external/bazel_tools/tools/cpp/gcc3 -isystem external/protobuf/src -isystem bazel-out/local_linux-opt/genfiles/external/protobuf/src -isystem external/eigen_archive -isystem bazel-out/local_linux-opt/genfiles/external/eigen_archive -isystem external/gif_archive/lib -isystem bazel-out/local_linux-opt/genfiles/external/gif_archive/lib -isystem external/farmhash_archive/src -isystem bazel-out/local_linux-opt/genfiles/external/farmhash_archive/src -isystem external/png_archive -isystem bazel-out/local_linux-opt/genfiles/external/png_archive -isystem external/zlib_archive -isystem bazel-out/local_linux-opt/genfiles/external/zlib_archive -isystem external/local_config_cuda/cuda -isystem bazel-out/local_linux-opt/genfiles/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/local_linux-opt/genfiles/external/local_config_cuda/cuda/cuda/include -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare -fno-exceptions '-DGOOGLE_CUDA=1' -pthread '-DGOOGLE_CUDA=1' -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fno-canonical-system-headers -c tensorflow/core/kernels/svd_op_complex64.cc -o bazel-out/local_linux-opt/bin/tensorflow/core/kernels/_objs/svd_op/tensorflow/core/kernels/svd_op_complex64.pic.o): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4.
gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 5924.737s, Critical Path: 813.48s

After this error, the packaging script fails as well:

$ ./packageTensorFlow.sh
./packageTensorFlow.sh: line 3: cd: /home/nvidia/tensorflow: No such file or directory
./packageTensorFlow.sh: line 4: bazel-bin/tensorflow/tools/pip_package/build_pip_package: No such file or directory
mv: cannot stat '/tmp/tensorflow_pkg/tensorflow-*.whl': No such file or directory
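Both packaging errors follow from the build not finishing: packageTensorFlow.sh changes into the TensorFlow source tree and runs bazel-bin/tensorflow/tools/pip_package/build_pip_package, which is the output of the //tensorflow/tools/pip_package:build_pip_package target that failed above. A small pre-flight check makes that dependency explicit; this is a sketch, and the default source-tree path is an assumption taken from these error messages:

```python
from pathlib import Path
from typing import List

# Build output that packaging needs, relative to the TensorFlow source tree
# (path taken from the packageTensorFlow.sh error messages).
REQUIRED_OUTPUTS = ("bazel-bin/tensorflow/tools/pip_package/build_pip_package",)


def missing_build_outputs(tf_root: str = "/home/nvidia/tensorflow") -> List[str]:
    """Return the paths that must exist before packaging but don't."""
    root = Path(tf_root)
    if not root.is_dir():
        # Nothing to package if the source tree itself is absent.
        return [str(root)]
    return [str(root / rel) for rel in REQUIRED_OUTPUTS if not (root / rel).exists()]


missing = missing_build_outputs()
if missing:
    print("Build did not complete; missing:", *missing, sep="\n  ")
```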

What can I do to solve this so I can install TensorFlow as shown in the tutorial?

dusty_nv · October 6, 2017, 2:49pm

Try running $ ./tegrastats to monitor memory usage while the scripts run.
This line of the error output suggests that the Linux OOM (Out Of Memory) killer terminated the gcc compiler process:

gcc: internal compiler error: Killed (program cc1plus)
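That signature is easy to spot in a long build log. The following is a heuristic sketch: the regular expression is an assumption based on the message above, and confirming an OOM kill still requires the kernel log (dmesg), which records out-of-memory kills.

```python
import re

# Matches the line quoted above. "Killed" means cc1plus was terminated by a
# signal (SIGKILL) rather than rejecting the source code; during large builds
# the usual sender is the kernel's OOM killer.
_KILLED = re.compile(r"internal compiler error: Killed \(program (cc1plus|cc1)\)")


def likely_oom_kill(build_log: str) -> bool:
    """Heuristic: True if the log shows the compiler being killed by a signal."""
    return _KILLED.search(build_log) is not None


log_tail = "gcc: internal compiler error: Killed (program cc1plus)\n"
print(likely_oom_kill(log_tail))  # True
```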

The tutorial recommends attaching swap, since building TF requires 6 GB+ of memory.
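Since the tutorial says a swap file is needed and the original post notes that no swap was set up, it is worth checking what the kernel actually sees. A minimal sketch that parses /proc/meminfo (sizes in kB) and reports RAM and swap; it deliberately does not try to decide how much memory the build needs:

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map /proc/meminfo field names to their numeric values (kB for sizes)."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            values[key.strip()] = int(rest.split()[0])
    return values


def memory_summary(text: str) -> Dict[str, float]:
    """Total RAM and total swap, in GiB."""
    info = parse_meminfo(text)
    return {
        "ram_gib": info["MemTotal"] / 1024**2,
        "swap_gib": info.get("SwapTotal", 0) / 1024**2,
    }


def swap_configured(text: str) -> bool:
    """True if any swap device or swap file is active."""
    return parse_meminfo(text).get("SwapTotal", 0) > 0
```

On the TX2 it would be called with open("/proc/meminfo").read(); free -h shows the same totals interactively.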

linengmiao · October 6, 2017, 10:59pm

dusty_nv:

Try running $./tegrastats to monitor the memory usage while running the scripts.
From this line of the error output, it looks like the Linux OOM (Out Of Memory) killer killed the gcc compiler process:
gcc: internal compiler error: Killed (program cc1plus)
From the tutorial, it is recommended to attach swap as TF requires 6GB+ of memory to build:

In order to get TensorFlow to compile on the Jetson TX2, a swap file is needed for virtual memory. Also, a good amount of disk space ( > 6 GB ) is needed to actually build the program. If you’re unfamiliar with how to set the Jetson TX2 up like that, the procedure is similar to that as described in the article: Jetson TX1 Swap File and Development Preparation.
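The usual Linux steps for a swap file are fallocate, chmod, mkswap and swapon. The sketch below only prints those commands unless dry_run is turned off; the file path and size are placeholders to choose yourself, and this is not the JetsonHacks script:

```python
import subprocess
from typing import List


def swap_file_commands(path: str, size: str) -> List[List[str]]:
    """Standard steps to create and enable a swap file (each needs root)."""
    return [
        ["fallocate", "-l", size, path],  # reserve the space, e.g. size="2G"
        ["chmod", "600", path],           # swapon recommends 0600 permissions
        ["mkswap", path],                 # write the swap signature
        ["swapon", path],                 # enable it until the next reboot
    ]


def create_swap_file(path: str, size: str, dry_run: bool = True) -> None:
    for cmd in swap_file_commands(path, size):
        if dry_run:
            print("sudo", " ".join(cmd))
        else:
            subprocess.run(["sudo", *cmd], check=True)


# Hypothetical path and size; nothing is executed while dry_run is True.
create_swap_file("/home/nvidia/swapfile", "2G")
```

Keeping the swap across reboots also requires an /etc/fstab entry. On this particular board the numbers are tight: df shows 7.6G available and the build itself needs more than 6 GB of disk, which leaves well under 2 GB for a swap file on the eMMC.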

Hi

I saw that swap is suggested. Unfortunately, I can’t add extra storage to the device (such as an SSD or a USB stick) because I don’t have physical access to it. I also looked for files I could delete to free up space, but the df output above is the best I can do.

What alternatives are there? Is swap really a must?

linuxdev · October 6, 2017, 11:28pm

I can’t say for sure whether swap is a must, but when people here suggest that running out of memory means swap is needed, they’re usually right.

Without physical access, I suspect that anything special you do remotely to gain extra space risks breaking the system and then needing physical access to recover. I’ve seen sshfs mentioned here several times, and it seems to be your best/safest bet (I’ve never tried it myself). You could just run “sudo apt-get install sshfs” and then read “man sshfs” to see how it works. I also found an interesting URL on it:
https://www.digitalocean.com/community/tutorials/how-to-use-sshfs-to-mount-remote-file-systems-over-ssh
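A sketch of the sshfs route, assuming a reachable machine with spare disk; the host name, remote directory and mount point are hypothetical. It builds the mount command (sshfs [user@]host:[dir] mountpoint) and measures free space with os.statvfs, for example before and after mounting. This only adds disk space; it does not by itself add swap.

```python
import os
from typing import List


def sshfs_mount_command(remote: str, mount_point: str) -> List[str]:
    """Command line for `sshfs [user@]host:[dir] mountpoint` (not executed here)."""
    return ["sshfs", remote, mount_point]


def free_gib(path: str) -> float:
    """Space available to unprivileged users at path, in GiB."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize / 1024**3


# Hypothetical remote host and mount point.
print(" ".join(sshfs_mount_command("user@buildhost:/srv/scratch", "/home/nvidia/remote")))
print(f"free on /: {free_gib('/'):.1f} GiB")
```

Unmounting is done with fusermount -u followed by the mount point.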

One really good way to add fast swap is iSCSI, but it is not for the “faint-of-heart”: there is a steep learning curve, plus rebooting and testing. If you needed a lot of remote disk mounting with good performance, it would probably be a good choice; I wouldn’t recommend it for casual use.

waqas_41ac75k · October 12, 2017, 2:27am

Has anyone gotten TensorFlow 1.3 to install on a TX2 with JetPack 3.1? I am unable to get TensorFlow to work properly. I tried building the wheel from source using the JetsonHacks GitHub scripts, as well as the prebuilt wheels mentioned in this forum, but still can’t get it to work. It builds without issue, but when I run a script I get CUDA_ERROR_LAUNCH_FAILED.

AastaLLL · October 12, 2017, 8:03am

Hi,

We have tried this TensorFlow 1.3 wheel. It works correctly on JetPack 3.1 + TX2.

waqas_41ac75k · October 13, 2017, 3:30am

Thank you for confirming. I have tried installing the wheel package multiple times after flashing a fresh OS, with the same result. TensorFlow scripts work fine after installation, but after a reboot I get the following error:

2017-10-13 03:20:08.205771: E tensorflow/stream_executor/cuda/cuda_driver.cc:1068] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED
2017-10-13 03:20:08.205969: E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x4e4c430: CUDA_ERROR_LAUNCH_FAILED
2017-10-13 03:20:08.206024: E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x4e4c430: CUDA_ERROR_LAUNCH_FAILED
2017-10-13 03:20:08.206266: F tensorflow/stream_executor/cuda/cuda_dnn.cc:2045] failed to enqueue convolution on stream: CUDNN_STATUS_EXECUTION_FAILED
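TensorFlow’s C++ log lines share a fixed prefix: a timestamp, a severity letter (E for error and F for fatal, as above), and source file:line. When comparing runs before and after a reboot, extracting just those fields makes it easier to see where failures first appear. A parsing sketch; the regular expression is an assumption based on the lines above:

```python
import re
from typing import List, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    severity: str  # I, W, E or F
    source: str
    line: int
    message: str


# "<date> <time>: <severity> <file>:<line>] <message>"
_LINE = re.compile(r"^(\S+ \S+): ([IWEF]) (\S+):(\d+)\] (.*)$")


def parse_tf_log(text: str) -> List[LogEntry]:
    """Extract the structured prefix from TensorFlow C++ log lines; skip others."""
    entries = []
    for raw in text.splitlines():
        m = _LINE.match(raw.strip())
        if m:
            ts, sev, src, lineno, msg = m.groups()
            entries.append(LogEntry(ts, sev, src, int(lineno), msg))
    return entries


def first_failure(entries: List[LogEntry]) -> Optional[LogEntry]:
    """The first error or fatal entry, or None if there is none."""
    return next((e for e in entries if e.severity in ("E", "F")), None)
```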

AastaLLL · October 16, 2017, 8:11am

Hi,

Which script do you use?
We can successfully import TensorFlow after rebooting.

From the error message, this issue may come from the GPU driver.
Could you share more information about your environment?
Are you using JetPack 3.1 and CUDA 8.0?

Please remember that this wheel file is built with CUDA 8.0 and cuDNN v6.
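Because the wheel is tied to CUDA 8.0 and cuDNN v6, one quick check is to compare those against what is installed. cuDNN records its version as #define lines (CUDNN_MAJOR, CUDNN_MINOR, CUDNN_PATCHLEVEL) in cudnn.h; a sketch that reads them:

```python
import re
from typing import Tuple


def cudnn_version(header_text: str) -> Tuple[int, int, int]:
    """Read CUDNN_MAJOR/MINOR/PATCHLEVEL from the text of cudnn.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if m is None:
            raise ValueError(f"{name} not found in header")
        parts.append(int(m.group(1)))
    return parts[0], parts[1], parts[2]


def matches_wheel(header_text: str, expected_major: int = 6) -> bool:
    """True if the installed cuDNN major version matches the wheel's (v6 here)."""
    return cudnn_version(header_text)[0] == expected_major
```

The header location is an assumption to verify on the board (commonly /usr/include/cudnn.h on JetPack installs); the CUDA toolkit version can be checked separately with nvcc --version.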

Thanks.

waqas_41ac75k · October 16, 2017, 10:27pm

I used the JetsonHacks TensorFlow script on GitHub. I think the issue comes from the ConnectTech carrier board, since the NVIDIA TX2 evaluation dev board runs the same TensorFlow script with no issues. The only software difference between the two modules is that one has the ConnectTech BSP package flashed to it. Both are running JetPack 3.1.

AastaLLL · October 17, 2017, 8:09am

Hi,

We are not familiar with the ConnectTech carrier board.
However, the most common issue is communication between a custom OS and the TX2’s GPU.

Could you try running our vectorAdd CUDA sample to verify basic GPU functionality?
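Running the sample by hand and reading its output is enough. If it needs to be repeated, for example after each reboot where the TensorFlow failure appeared, a small wrapper helps. A sketch, assuming the CUDA samples have already been built on the board; the binary path in the usage line is hypothetical:

```python
import subprocess


def vector_add_passed(output: str) -> bool:
    """The vectorAdd sample prints 'Test PASSED' when its results verify."""
    return "Test PASSED" in output


def run_vector_add(binary: str) -> bool:
    """Run a built vectorAdd binary and report whether it passed."""
    result = subprocess.run([binary], capture_output=True, text=True)
    return result.returncode == 0 and vector_add_passed(result.stdout)
```

Usage, with a hypothetical samples location: run_vector_add("/home/nvidia/NVIDIA_CUDA-8.0_Samples/0_Simple/vectorAdd/vectorAdd").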

Thanks.
