Internal error in Nvidia driver code while building r0.12 or master Tensorflow on Tegra TX1

Hello, I’m getting the following error when building Tensorflow. The error occurs on any one of several CUDA compiles (.cu.cc files) in the build. A (changing) line in Eigen’s TensorBroadcasting.h is always mentioned in the error. Error text below.

It seems like an error in Nvidia tooling, exposed by code changes in Eigen made since earlier TF releases.

A few notes

  • Bazel 0.4.3-
  • Tensorflow r0.12 and master (essentially 1.0.0-alpha at this point) as of yesterday (2017-01-10)
  • Jetpack 2.3.1 with L4T 24.2.1 installed on TX1 dev board
  • verbose_failures enabled, but not giving me much to work with
  • Everything patched/fixed up for build as per https://github.com/tensorflow/tensorflow/issues/851 and related
  • I can build r0.11 with the same setup just fine (after fixing two additional Bazel related dependency/package issues)
  • I have set up a suitable swap file; there are no out of memory errors or logs in the system related to the build

INFO: From Compiling tensorflow/core/kernels/batch_norm_op_gpu.cu.cc:
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorBroadcasting.h(271): internal error: assertion failed at: "/dvs/p4/build/sw/rel/gpu_drv/r361/r361_00/drivers/compiler/edg/EDG_4.10/src/folding.c", line 9819

1 catastrophic error detected in the compilation of "/tmp/tmpxft_0000498a_00000000-7_batch_norm_op_gpu.cu.cpp1.ii".
Compilation aborted.
Aborted
ERROR: /data/code/tensorflow/tensorflow/core/kernels/BUILD:1687:1: output 'tensorflow/core/kernels/_objs/batch_norm_op_gpu/tensorflow/core/kernels/batch_norm_op_gpu.cu.pic.o' was not created.
ERROR: /data/code/tensorflow/tensorflow/core/kernels/BUILD:1687:1: not all outputs were created or valid.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 2823.212s, Critical Path: 2770.88s
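
Since the reported header line changes from run to run, it can help to collect it from each failed build. Below is a minimal Python sketch that pulls the source file, source line, and compiler assertion location out of a log like the one above. The function name and the regex are assumptions modeled only on the single error line quoted in this post; other nvcc versions may word the message differently.

```python
import re

# Pattern modeled on the one error line quoted in this thread (an assumption,
# not a documented nvcc message format).
INTERNAL_ERROR = re.compile(
    r'^(?P<file>\S+)\((?P<line>\d+)\): internal error: assertion failed at: '
    r'"(?P<assert_file>[^"]+)", line (?P<assert_line>\d+)'
)

# The error line from the build log above, copied verbatim.
SAMPLE_LOG = (
    'INFO: From Compiling tensorflow/core/kernels/batch_norm_op_gpu.cu.cc:\n'
    'external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/'
    'TensorBroadcasting.h(271): internal error: assertion failed at: '
    '"/dvs/p4/build/sw/rel/gpu_drv/r361/r361_00/drivers/compiler/edg/'
    'EDG_4.10/src/folding.c", line 9819\n'
)


def find_internal_errors(log_text):
    """Return (source file, source line, compiler file, compiler line) tuples."""
    hits = []
    for raw in log_text.splitlines():
        match = INTERNAL_ERROR.match(raw.strip())
        if match:
            hits.append((
                match.group("file"),
                int(match.group("line")),
                match.group("assert_file"),
                int(match.group("assert_line")),
            ))
    return hits


for hit in find_internal_errors(SAMPLE_LOG):
    print(hit)
```

Running this over the saved output of several failed builds would show whether the compiler assertion (folding.c, line 9819) stays fixed while only the Eigen line moves.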

Well, I am having exactly the same problem. It is even in the same file with the same board. I hope someone knows how to fix this. =/

So, I got 1.0.0-alpha working (kind of). I hacked workspace.bzl to use the same revision of Eigen as r0.11. Since then Tensorflow had an expm1 op added that relied on some new changes to Eigen. I sliced and diced to remove just that op and got a build up and running.
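
As a rough illustration of that kind of pin (not the actual change from this build), here is a Python sketch that rewrites the Eigen revision in a workspace.bzl-style snippet. The sample rule text, the URL layout, and both revision strings are hypothetical placeholders, and the helper name is made up for this example. If the real rule also carries a sha256, it would need to match the pinned archive; this sketch does not touch it.

```python
import re

# Hypothetical excerpt standing in for the eigen_archive rule; the revision
# "000000000000" is a placeholder, not a real Eigen commit.
SAMPLE_BZL = '''
native.new_http_archive(
    name = "eigen_archive",
    url = "http://bitbucket.org/eigen/eigen/get/000000000000.tar.gz",
    strip_prefix = "eigen-eigen-000000000000",
)
'''


def pin_eigen_revision(bzl_text, new_rev):
    """Swap the Eigen revision in the archive URL and the strip_prefix.

    Assumes the revision appears as .../eigen/get/<rev>.tar.gz and as
    eigen-eigen-<rev>; that layout is an assumption, not confirmed here.
    """
    # Callables are used as replacements so the new revision's digits are
    # never misread as part of a backreference.
    bzl_text = re.sub(
        r"(/eigen/get/)[0-9a-f]+(\.tar\.gz)",
        lambda m: m.group(1) + new_rev + m.group(2),
        bzl_text,
    )
    bzl_text = re.sub(
        r"(eigen-eigen-)[0-9a-f]+",
        lambda m: m.group(1) + new_rev,
        bzl_text,
    )
    return bzl_text


# "111111111111" is again a placeholder for the revision the older release used.
print(pin_eigen_revision(SAMPLE_BZL, "111111111111"))
```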

I haven’t tested exhaustively, but the Resnet-50 model I was working with is up and running at around 15fps at 128x96, producing the same results as on a GTX1080. Nice bonus: this model wasn’t even fitting in memory with my r0.11 build, so it seems that some memory consumption optimizations were made…

This supports the hunch I had that it’s an issue with the CUDA compiler crashing on some changes to Eigen.

I can push a branch to github with additional code hacks and binary wheel if interested.

Ross

Would you mind pushing the code to GitHub?

I have been trying to install Tensorflow for at least one week without success. I saw that some people had success with older versions of Tensorflow, but I was trying to install the latest version. =/

Thank you very much.

https://github.com/rwightman/tensorflow/commits/r1.0-tegra-ugly_hack

I’ll add the python 2 & 3 wheel files as a ‘release’ in the next day or two.

Hi both,

Thanks for your question.
We are investigating this issue now, will update information later.

Thanks AastaLLL. Looking forward to having a clean build.

For anyone wanting a quick start on the latest Tensorflow without building themselves, I’ve posted some wheels in a github release. I imagine they should work for anyone else running equivalent Jetpack 2.3.1/L4T 24.2.1 on their X1 boards. I won’t be supporting any issues but feel free to let me know if you find any or find it useful.

https://github.com/rwightman/tensorflow/releases/tag/v1.0.0-alpha-tegra-ugly_hack

Thank you, rwightman.

I tested your PIP package and it worked great. I will do some more advanced tests and let you know if it is ok.

Hi,

This issue is solved with our internal nvcc compiler, which is not yet available in JetPack.

We will help to compile tensorflow r0.12 and release a binary on the forum as a temporary solution.
Please wait for our update.

Thanks.

Hi rwightman,

We tried to compile tensorflow r0.12 with cuda-8.0.62 (not available yet),
but hit a segmentation fault in ‘tensorflow/core/kernels/cwise_op_gpu_div.cu.cc’.

I guess there are still some files that need to be modified for TX1,
since we only applied the changes described in https://github.com/tensorflow/tensorflow/issues/851

Do you have any idea about this?

Hi AastaLLL,

We are also very interested in this solution for compiling Tensorflow r0.12 on TX1.
Do you have any update on when the temporary solution will be released here in the forum?

As far as I could understand from the previous messages, we are depending on a solution similar to the hack made by rwightman in r1.0. Is that correct?

I was able to install the hacky r1.0 version, but for our purposes we really need the r0.12.

Best regards.

Hello AastaLLL,

When I decided to work on a hack fix I moved from r0.12 to TF 1.0-alpha, as I was planning to make that transition and upgrade my models anyways. I did not try doing a hack for r0.12. Now that TF 1.0 final is out I’ll try my hack with that soon.

Does the inability to compile Tensorflow for TX1 not warrant debugging the segmentation fault in the compiler?

Ross

Hi,

We have solved this compiler issue and will publish the fix in the next release.
Please wait for our cuda release to build your own tensorflow.

Thanks.

I know you can’t give a date, but do you have any ballpark timeline for that release? Are we looking at weeks, months, a quarter?

Ross

There is no clear timeline, but it should not be too long.

Thanks

Will this be fixed in new Jetpack 3.0?

Thanks

Hi,

Thanks for your response.
The fix is included in CUDA versions higher than 8.0.62.

The CUDA versions in JetPack 3.0 are:
CUDA 8.0 (8.0.64) Toolkit for Ubuntu 14.04 x86 64-bit with TX2 cross-development support
CUDA 8.0 (8.0.64) Toolkit for L4T r27.1
CUDA 8.0 (8.0.34) Toolkit for Ubuntu 14.04 x86 64-bit with TX1 cross-development support
CUDA 8.0 (8.0.34) Toolkit for L4T r24.2.1
CUDA 6.5 (6.5.53) Toolkit for Ubuntu 14.04 x86 64-bit with TK1 cross-development support
CUDA 6.5 (6.5.53) Toolkit for L4T r21.5

Information can be found at: https://developer.nvidia.com/embedded/jetpack-notes
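
To make the comparison explicit, here is a small Python sketch that checks each toolkit version from the list above against the 8.0.62 threshold. The version numbers are copied from the list; the board names attached to each L4T release are my reading of that list, and the helper names are made up for this example.

```python
def parse_version(text):
    """Turn '8.0.64' into (8, 0, 64) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Threshold quoted in this thread: the fix is in versions higher than 8.0.62.
FIX_THRESHOLD = parse_version("8.0.62")

# L4T release -> CUDA toolkit version, copied from the JetPack 3.0 list above.
# The board names in parentheses are an assumption based on that list.
JETPACK_3_0_TOOLKITS = {
    "L4T r27.1 (TX2)": "8.0.64",
    "L4T r24.2.1 (TX1)": "8.0.34",
    "L4T r21.5 (TK1)": "6.5.53",
}


def has_fix(version_text):
    """True if the version is strictly higher than the quoted threshold."""
    return parse_version(version_text) > FIX_THRESHOLD


for target, version in JETPACK_3_0_TOOLKITS.items():
    status = "fix included" if has_fix(version) else "fix not included"
    print(target, version, status)
```

Comparing integer tuples rather than raw strings keeps the check correct if a component ever reaches three digits.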

Hey AastaLLL,

So this means that fix isn’t available for TX1? :’(
Is there anything I can do to get CUDA 8.0.62 on TX1?

Hi,

I just tried to install CUDA 8.0.64 (which can be downloaded from the TX2 section of JetPack 3.0) on TX1,
but found that CUDA can’t run normally there due to an insufficient-driver issue.

So the fix for tensorflow r0.12 is currently only available on TX2.
We are really sorry about this.