TensorFlow 2.7 and JetPack 4.6.1

As discussed in this thread, NVIDIA doesn’t include the TensorFlow C libraries, so we have to build them ourselves from source.

For JetPack 4.6.1, the compatibility table lists TensorFlow 2.7.0. Can I build directly from the open-source TensorFlow 2.7.0, or is there a special NVIDIA-patched 2.7.0 that I should use? If the former, since upstream TensorFlow recently released 2.7.1 with many security fixes, will that work even though it is not mentioned in the compatibility table?

I tried building the 2.7.0 C libs, but the build eventually failed:

bazel --host_jvm_args=-Xmx1g --host_jvm_args=-Xms512m \
       build -c opt \
       --config=tensorrt --config=cuda \
       --config=noaws --config=nogcp --config=nohdfs --config=nonccl \
       --python_path="/usr/bin/python3" \
       --action_env=PYTHON_BIN_PATH="/usr/bin/python3" \
       --action_env=PYTHON_LIB_PATH="/usr/lib/python3/dist-packages" \
       --action_env=CUDA_TOOLKIT_PATH="/usr/local/cuda-10.2" \
       --action_env=TF_CUDA_COMPUTE_CAPABILITIES="7.2" \
       --action_env=GCC_HOST_COMPILER_PATH="/usr/bin/aarch64-linux-gnu-gcc-7" \
       //tensorflow/tools/lib_package:libtensorflow
...
...
tensorflow/compiler/tf2tensorrt/stub/nvinfer_plugin_stub.cc:64:2: error: #error This version of TensorRT is not supported.
 #error This version of TensorRT is not supported.
  ^~~~~
Target //tensorflow/tools/lib_package:libtensorflow failed to build
Use --verbose_failures to see the command lines of failed build steps.

Hi,

Our source is similar to the upstream with some compatibility fixes.

Based on the source below, TensorFlow 2.7 only supports up to TensorRT 8.0.
Maybe you can try a newer TensorFlow version or turn off TensorRT support.
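
If turning off TensorRT is acceptable, a minimal sketch of the same build command without it could look like this (this assumes the rest of your flags stay exactly as posted; TF_NEED_TENSORRT=0 is the standard configure-time switch, so double-check it against your own ./configure run):

# Sketch only: same invocation as above, with TensorRT support disabled
export TF_NEED_TENSORRT=0
bazel --host_jvm_args=-Xmx1g --host_jvm_args=-Xms512m \
       build -c opt \
       --config=cuda \
       --config=noaws --config=nogcp --config=nohdfs --config=nonccl \
       --python_path="/usr/bin/python3" \
       --action_env=PYTHON_BIN_PATH="/usr/bin/python3" \
       --action_env=PYTHON_LIB_PATH="/usr/lib/python3/dist-packages" \
       --action_env=CUDA_TOOLKIT_PATH="/usr/local/cuda-10.2" \
       --action_env=TF_CUDA_COMPUTE_CAPABILITIES="7.2" \
       --action_env=GCC_HOST_COMPILER_PATH="/usr/bin/aarch64-linux-gnu-gcc-7" \
       //tensorflow/tools/lib_package:libtensorflow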

Thanks.

Are you saying we can try upstream TF 2.8.0 even though the compatibility table says JP 4.6.1 is compatible with TF 2.7.0?

I tried compiling TF 2.8.0 with JP 4.6.1, using Python 3, but got the errors below:

...
ERROR: /var/log/tensorflow/tensorflow-2.8.0/tensorflow/python/lib/core/BUILD:49:11: Compiling tensorflow/python/lib/core/bfloat16.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/aarch64-opt/bin/tensorflow/python/lib/core/_objs/bfloat16_lib/bfloat16.pic.d ... (remaining 63 argument(s) skipped)
In file included from /usr/include/c++/7/math.h:36:0,
                 from bazel-out/aarch64-opt/bin/external/local_config_python/python_include/pyport.h:194,
                 from bazel-out/aarch64-opt/bin/external/local_config_python/python_include/Python.h:53,
                 from ./tensorflow/python/lib/core/bfloat16.h:19,
                 from tensorflow/python/lib/core/bfloat16.cc:16:
/usr/include/c++/7/cmath: In static member function 'static void tensorflow::{anonymous}::BinaryUFunc<InType, OutType, Functor>::Call(char**, const npy_intp*, const npy_intp*, void*) [with InType = Eigen::bfloat16; OutType = Eigen::bfloat16; Functor = tensorflow::{anonymous}::ufuncs::CopySign]':
/usr/include/c++/7/cmath:1302:40: internal compiler error: Segmentation fault
   { return __builtin_copysignf(__x, __y); }
                                        ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.

...

I see similar errors in this thread.
Any patches? @dusty_nv

I haven’t compiled TensorFlow myself before, and don’t have patches for this, but typically an ICE (internal compiler error) is a sign that you may want/need to upgrade the compiler version. Or you can try manually patching the source to avoid calling that macro.
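
For example, swapping the host compiler only touches the flag you already pass in the build command above; a sketch would look something like this (gcc-8 here is purely hypothetical and may not be packaged for your JetPack release at all):

# Hypothetical sketch: same build command, only pointing
# GCC_HOST_COMPILER_PATH at a newer gcc if one is installed
bazel build -c opt --config=cuda \
       ... \
       --action_env=GCC_HOST_COMPILER_PATH="/usr/bin/aarch64-linux-gnu-gcc-8" \
       //tensorflow/tools/pip_package:build_pip_package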

The compiler is already the latest provided by JP 4.6.1. I don’t think there is another version we could upgrade to.

I was hoping you would have a patch to bfloat16.cc that could avoid using the copysign macro.

Hi,

Some users have tried to compile TensorFlow for Jetson before.
Would you mind checking the repository below first?

Thanks.

Thanks for the link, but that only covers TensorFlow 2.3 or lower, and the “copysign” call was only added in TensorFlow 2.5+.

I’m wondering if your patch could be used to bypass the compiler macro.

It would seem that’s indeed what that patch does, so you could try it and see if it works (as previously mentioned, I have only done this with PyTorch and not TensorFlow).

As I mentioned above, “copysign” was added in TensorFlow 2.5+, and since JP 4.6.1 supports TensorFlow 2.7.0, can you check your internal codebase to see how you patched it to get 2.7.0 to build properly, and provide the patch here? Thanks!

Hi, user100090

Have you fixed this issue with the patch for PyTorch?

Thanks.

I was hoping to see how NVIDIA was able to compile TF 2.7 and release the Python wheel files. If that’s not feasible, perhaps @dusty_nv can provide an unofficial patch, since he was the one who shared the PyTorch patch.

For now, I am able to get it to compile by doing the following, but I’m not sure if it is correct.

# Replace std::copysign calls in bfloat16.cc with a local helper name
sed -i 's/std::copysign/copysign/g' \
  tensorflow/python/lib/core/bfloat16.cc
# Insert that helper after line 36; it reimplements copysign via
# std::signbit/std::abs so the __builtin_copysignf path that crashes GCC 7 is never reached
sed -i '36a static inline float copysign(float __x, float __y)\n  { return std::signbit(__y) ? -std::abs(__x) : std::abs(__x); }\n' \
  tensorflow/python/lib/core/bfloat16.cc
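
As a rough sanity check of that code path (my own sketch, not a full test), the copysign ufunc that bfloat16.cc registers can be exercised from NumPy via the bfloat16 dtype exposed by TensorFlow:

python3 - <<'EOF'
# Quick smoke test (sketch only): hits the NumPy copysign ufunc registered by
# bfloat16.cc for TensorFlow's bfloat16 dtype.
import numpy as np
import tensorflow as tf

bf16 = tf.bfloat16.as_numpy_dtype
x = np.array([1.5, -2.0], dtype=bf16)
y = np.array([-1.0, 1.0], dtype=bf16)
print(np.copysign(x, y))  # expect the magnitudes of x with the signs of y
EOF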

Hi,

So you can compile TensorFlow without errors after applying that change?
If yes, do you run into any issues when using it?

Since the branch is maintained internally, we are not able to share the patch publicly.
Thanks.

Yes, it compiles, but I don’t have a way to do a full test. I don’t know whether my application triggers that piece of code. Since your patch is provided to and used by ALL Jetson users, it would get far more exposure and testing. Also, I assume you have your own full test suite that could catch issues.
Since upstream TensorFlow 2.5+ won’t compile for Jetson, it makes sense for you to submit your patch upstream, so it can benefit all Jetson users and be maintained by the upstream team.

Hi,

It seems that we don’t do anything specific.

We just tried building TensorFlow v2.8.0 on JetPack 5.0 DP.
It compiles with some updates to allow TensorRT 8.4.

Below are the patch and script for your reference.
tf_2.8.0.patch (1.4 KB)
build_tensorflow.sh (2.1 KB)
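
For anyone trying these, the usual way to apply them would be something like the following (the working directory and the -p1 strip level are assumptions on my part; check the attached script before running it):

# Assumed usage of the attached files; adjust paths and the -p strip level
# to match how the patch was generated
cd tensorflow-2.8.0
patch -p1 < ../tf_2.8.0.patch
bash ../build_tensorflow.sh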

Thanks.

As @dusty_nv mentioned here, the “copysign” error I encountered was due to the compiler version bundled with JP 4.6.1 (Ubuntu 18.04). JP 5.0 has a newer compiler version, so you will NOT see the error.

You need to use JP 4.6.x to reproduce.

Hi,

I will check this issue with our internal team.
Thanks.

Hi,

Thanks for your patience.

We have checked this with our internal team.
They did not hit the copysign error when compiling TensorFlow 2.7 on JetPack 4.6.1.
So would you mind using TensorFlow 2.7 instead?

For the TensorRT compatibility issue, you can work around it with a patch similar to the one shared above.

Thanks.