As discussed in this thread, NVIDIA doesn’t include the TensorFlow C libs, so we have to build them ourselves from source.
For JetPack 4.6.1, the compatibility table says TensorFlow version 2.7.0. Can I directly build the open-source TensorFlow 2.7.0, or is there a special NVIDIA-patched 2.7.0 that I should use? If the former, since open-source TensorFlow recently released 2.7.1 with many security fixes, will that work even though it is not mentioned in the compatibility table?
I tried building 2.7.0 C libs, but it failed in the end:
bazel --host_jvm_args=-Xmx1g --host_jvm_args=-Xms512m \
build -c opt \
--config=tensorrt --config=cuda \
--config=noaws --config=nogcp --config=nohdfs --config=nonccl \
--python_path="/usr/bin/python3" \
--action_env=PYTHON_BIN_PATH="/usr/bin/python3" \
--action_env=PYTHON_LIB_PATH="/usr/lib/python3/dist-packages" \
--action_env=CUDA_TOOLKIT_PATH="/usr/local/cuda-10.2" \
--action_env=TF_CUDA_COMPUTE_CAPABILITIES="7.2" \
--action_env=GCC_HOST_COMPILER_PATH="/usr/bin/aarch64-linux-gnu-gcc-7" \
//tensorflow/tools/lib_package:libtensorflow
...
...
tensorflow/compiler/tf2tensorrt/stub/nvinfer_plugin_stub.cc:64:2: error: #error This version of TensorRT is not supported.
 #error This version of TensorRT is not supported.
  ^~~~~
Target //tensorflow/tools/lib_package:libtensorflow failed to build
Use --verbose_failures to see the command lines of failed build steps.
Our source is similar to upstream, with some compatibility fixes.
Based on the source below, TensorFlow only supports up to TensorRT 8.0.
Maybe you can try a newer TensorFlow version or turn off TensorRT support.
I tried compiling TF 2.8.0 with JP 4.6.1 using Python 3, but got the errors below:
...
ERROR: /var/log/tensorflow/tensorflow-2.8.0/tensorflow/python/lib/core/BUILD:49:11: Compiling tensorflow/python/lib/core/bfloat16.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/aarch64-opt/bin/tensorflow/python/lib/core/_objs/bfloat16_lib/bfloat16.pic.d ... (remaining 63 argument(s) skipped)
In file included from /usr/include/c++/7/math.h:36:0,
                 from bazel-out/aarch64-opt/bin/external/local_config_python/python_include/pyport.h:194,
                 from bazel-out/aarch64-opt/bin/external/local_config_python/python_include/Python.h:53,
                 from ./tensorflow/python/lib/core/bfloat16.h:19,
                 from tensorflow/python/lib/core/bfloat16.cc:16:
/usr/include/c++/7/cmath: In static member function 'static void tensorflow::{anonymous}::BinaryUFunc<InType, OutType, Functor>::Call(char**, const npy_intp*, const npy_intp*, void*) [with InType = Eigen::bfloat16; OutType = Eigen::bfloat16; Functor = tensorflow::{anonymous}::ufuncs::CopySign]':
/usr/include/c++/7/cmath:1302:40: internal compiler error: Segmentation fault
 { return __builtin_copysignf(__x, __y); }
                                        ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
...
I haven’t compiled TensorFlow myself before, and don’t have patches for this, but an internal compiler error (ICE) like this is typically a sign that you may want/need to upgrade the compiler version. Or you can try manually patching the source to avoid the call that triggers it.
It would seem that’s what that patch indeed does, so you could try it and see if it works (as previously mentioned, I have only done this with PyTorch, not TensorFlow).
As I mentioned above, “copysign” was added in TensorFlow 2.5+, and since JP 4.6.1 supports TensorFlow 2.7.0, can you check your internal codebase to see how you patched it to build 2.7.0 properly, and provide the patch here? Thanks!
I was hoping to see how NVIDIA was able to compile TF 2.7 and release the Python wheel files. If that’s not feasible, perhaps @dusty_nv can provide an unofficial patch, since he was the one who provided the PyTorch patch.
For now, I am able to get it to compile by doing the below, but I am not sure if it is correct.
Yes, it compiles, but I don’t have a way to do a full test; I don’t know whether my application even exercises that piece of code. A patch provided by you would be used by ALL Jetson users, so it would get far more exposure and testing. Also, I assume you have your own full test suite that could catch issues.
Since upstream TensorFlow 2.5+ won’t compile for Jetson, it makes sense for you to submit your patch upstream, so it can benefit all Jetson users and be maintained by the upstream team.
As @dusty_nv mentioned here, the “copysign” error I encountered was due to the compiler version bundled with JP 4.6.1 (Ubuntu 18.04). JP 5.0 ships a newer compiler, so you will NOT see the error there.
We have checked this with our internal team.
They did not hit the copysign error when compiling TensorFlow 2.7 on JetPack 4.6.1.
So would you mind using TensorFlow 2.7 instead?
For the TensorRT compatibility issue, you can work around it with a patch similar to the one shared above.