Hello All,
I struggled a lot to build TensorFlow on the Jetson Xavier and could not find a working script that covers every step. After days of searching and trying different things, I finally managed to build it from source. I am sharing what I did here in the hope that it helps anyone who wants to do the same in the future. I have tried to list every step, but I may have forgotten a few things, so please feel free to add anything that improves the approach.
System Setup
- Product: Jetson AGX Xavier
- JetPack: 4.2
- TensorFlow: 1.13
- CUDA: 10.0
- Compute Capability: 7.2
- cuDNN: 7.4
- TensorRT: 5.0.6
- Python: 3.6
- bazel: 0.19.2
- gcc used for building: 5.5.0
- pip: 1.19.1
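To compare your own system against the versions listed above, something like the following sketch may help. It is only an illustration: `report_version` is a hypothetical helper, and the path /usr/local/cuda-10.0/version.txt is an assumption about where the CUDA toolkit records its version.

```shell
# Print "<name>: <first line of output>" for a tool, or "<name>: not found".
# report_version is a hypothetical helper, not part of any of the tools below.
report_version() {
    name=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s: %s\n' "$name" "$("$@" 2>/dev/null | head -n 1)"
    else
        printf '%s: not found\n' "$name"
    fi
}

report_version bazel  bazel version
report_version gcc-5  gcc-5 --version
report_version python python3 --version
# Assumption: the CUDA 10.0 toolkit ships a version.txt in its install directory.
report_version cuda   cat /usr/local/cuda-10.0/version.txt
```

Any line that reports "not found" or an unexpected version is worth sorting out before starting a multi-hour build.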
Building bazel
- Install Java (OpenJDK 8) if you haven’t already done so
sudo apt-get install openjdk-8-jdk
- Download dist release 0.19.2 of bazel (bazel-0.19.2-dist.zip) from bazel’s GitHub releases page
- Unpack the downloaded file
unzip bazel-0.19.2-dist.zip
- cd to the unzipped directory and build bazel
env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh
The binary is produced at output/bazel. To make it available on your PATH, add the export line below at the end of ~/.bashrc:
vim ~/.bashrc
export PATH=/pathToYourBazelDirectory/output${PATH:+:${PATH}} # add this at the end of your file
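The `${PATH:+:${PATH}}` expansion in the export line appends `:$PATH` only when PATH is already non-empty, which avoids a stray trailing colon. A minimal sketch of the same idea, which also skips the directory if it is already listed, could look like this (`prepend_to_path` is a hypothetical helper, and /pathToYourBazelDirectory is the placeholder used above):

```shell
# Prepend a directory to PATH unless it is already listed.
# prepend_to_path is a hypothetical helper for illustration.
prepend_to_path() {
    dir=$1
    case ":${PATH}:" in
        *":${dir}:"*) ;;                          # already present, do nothing
        *) PATH="${dir}${PATH:+:${PATH}}" ;;      # add ":$PATH" only if PATH is set
    esac
    export PATH
}

prepend_to_path /pathToYourBazelDirectory/output

# Confirm that the shell now finds the freshly built binary.
if command -v bazel >/dev/null 2>&1; then
    bazel version
else
    echo "bazel is not on PATH yet"
fi
```

Open a new terminal (or run `source ~/.bashrc`) after editing the file so the change takes effect.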
Building TensorFlow
- Download sources
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
- Check out the TensorFlow release branch
git checkout r1.13
- Apply this patch for the ARM (aarch64) architecture
diff --git a/tensorflow/lite/kernels/internal/BUILD b/tensorflow/lite/kernels/internal/BUILD
index 4be3226938..7226f96fdf 100644
--- a/tensorflow/lite/kernels/internal/BUILD
+++ b/tensorflow/lite/kernels/internal/BUILD
@@ -22,15 +22,12 @@ HARD_FP_FLAGS_IF_APPLICABLE = select({
 NEON_FLAGS_IF_APPLICABLE = select({
     ":arm": [
         "-O3",
-        "-mfpu=neon",
     ],
     ":armeabi-v7a": [
         "-O3",
-        "-mfpu=neon",
     ],
     ":armv7a": [
         "-O3",
-        "-mfpu=neon",
     ],
     "//conditions:default": [
         "-O3",
diff --git a/third_party/aws/BUILD.bazel b/third_party/aws/BUILD.bazel
index 5426f79e46..e08f8fc108 100644
--- a/third_party/aws/BUILD.bazel
+++ b/third_party/aws/BUILD.bazel
@@ -24,7 +24,7 @@ cc_library(
         "@org_tensorflow//tensorflow:raspberry_pi_armeabi": glob([
             "aws-cpp-sdk-core/source/platform/linux-shared/*.cpp",
         ]),
-        "//conditions:default": [],
+        "//conditions:default": glob(["aws-cpp-sdk-core/source/platform/linux-shared/*.cpp",]),
     }) + glob([
         "aws-cpp-sdk-core/include/**/*.h",
         "aws-cpp-sdk-core/source/*.cpp",
diff --git a/third_party/gpus/crosstool/BUILD.tpl b/third_party/gpus/crosstool/BUILD.tpl
index db76306ffb..184cd35b87 100644
--- a/third_party/gpus/crosstool/BUILD.tpl
+++ b/third_party/gpus/crosstool/BUILD.tpl
@@ -24,6 +24,7 @@ cc_toolchain_suite(
         "x64_windows|msvc-cl": ":cc-compiler-windows",
         "x64_windows": ":cc-compiler-windows",
         "arm": ":cc-compiler-local",
+        "aarch64": ":cc-compiler-local",
         "k8": ":cc-compiler-local",
         "piii": ":cc-compiler-local",
         "ppc": ":cc-compiler-local",
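One way to apply the patch is with `git apply`, checking first that it applies cleanly. The sketch below assumes the patch text has been saved to a file with its whitespace intact; `apply_patch_checked` and the file name jetson_xavier.patch are hypothetical names used only for illustration.

```shell
# Apply a patch only if it applies cleanly. Run from the repository root.
# apply_patch_checked is a hypothetical helper for illustration.
apply_patch_checked() {
    patch_file=$1
    if git apply --check "$patch_file"; then
        git apply "$patch_file"
        git status --short          # list the files the patch modified
    else
        echo "patch does not apply cleanly: $patch_file" >&2
        return 1
    fi
}

# Usage, from inside the tensorflow checkout (the file name is an assumption):
# apply_patch_checked ../jetson_xavier.patch
```

If the check fails because of whitespace differences introduced by copy and paste, `git apply --ignore-whitespace` may be worth trying, or the three small changes can be made by hand in an editor.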
- Install older gcc:
sudo apt-get install g++-5
sudo apt-get install gcc-5
Note: the build did not work with the default gcc 7.4, nor with gcc 4.8 or 8. gcc 5 is the only version I could finally build with.
- Create Swap
$ fallocate -l 8G swapfile
$ ls -lh swapfile
$ sudo chmod 600 swapfile
$ ls -lh swapfile
$ sudo mkswap swapfile
$ sudo swapon swapfile
$ swapon -s
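To confirm that the extra swap is active before starting the build, and to release it afterwards, something along these lines should work. This is a sketch rather than part of the steps above; `swap_total_kib` is a hypothetical helper that reads the SwapTotal field from /proc/meminfo.

```shell
# Print the SwapTotal value (in KiB) from a meminfo-style file.
# Defaults to /proc/meminfo; swap_total_kib is a hypothetical helper.
swap_total_kib() {
    awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

echo "SwapTotal: $(swap_total_kib) KiB"

# After the build, the swap can be switched off and the file deleted
# (run from the directory where swapfile was created):
# sudo swapoff swapfile
# rm swapfile
```

With the 8G file from the step above enabled, the reported total should grow by roughly 8 million KiB compared to the value before `swapon`.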
- Configure system build
$ ./configure
Please specify the location of python. [Default is /usr/bin/python]:/usr/bin/python3
Found possible Python library paths:
/usr/local/lib/python3.6/dist-packages
/usr/lib/python3.6/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.6/dist-packages]
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10.0]:
Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:/usr/local/cuda-10.0
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]:7.4
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:/usr/lib/aarch64-linux-gnu
Do you wish to build TensorFlow with TensorRT support? [y/N]: y
TensorRT support will be enabled for TensorFlow.
Please specify the location where TensorRT is installed. [Default is /usr/lib/aarch64-linux-gnu]:
Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,7.0]: 7.2
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:/usr/bin/gcc-5
Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=gdr # Build with GDR support.
--config=verbs # Build with libverbs support.
--config=ngraph # Build with Intel nGraph support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=noignite # Disable Apacha Ignite support.
--config=nokafka # Disable Apache Kafka support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
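The answers given in the session above can also be supplied up front through environment variables, which makes the configuration repeatable. The sketch below mirrors those answers using the variable names that, to my knowledge, TensorFlow's configure.py reads in this release; treat the exact set as an assumption and expect a prompt for anything not covered (for example the NCCL version, which was left at its default above). `export_tf_configure_env` is a hypothetical helper name.

```shell
# Export the answers from the interactive ./configure session as
# environment variables. The function name is hypothetical; the variable
# names are the ones configure.py is assumed to check before prompting.
export_tf_configure_env() {
    export PYTHON_BIN_PATH=/usr/bin/python3
    export PYTHON_LIB_PATH=/usr/local/lib/python3.6/dist-packages
    export TF_ENABLE_XLA=0
    export TF_NEED_OPENCL_SYCL=0
    export TF_NEED_ROCM=0
    export TF_NEED_CUDA=1
    export TF_CUDA_VERSION=10.0
    export CUDA_TOOLKIT_PATH=/usr/local/cuda-10.0
    export TF_CUDNN_VERSION=7.4
    export CUDNN_INSTALL_PATH=/usr/lib/aarch64-linux-gnu
    export TF_NEED_TENSORRT=1
    export TENSORRT_INSTALL_PATH=/usr/lib/aarch64-linux-gnu
    export TF_CUDA_COMPUTE_CAPABILITIES=7.2
    export TF_CUDA_CLANG=0
    export GCC_HOST_COMPILER_PATH=/usr/bin/gcc-5
    export TF_NEED_MPI=0
    export CC_OPT_FLAGS="-march=native -Wno-sign-compare"
    export TF_SET_ANDROID_WORKSPACE=0
}

# Usage, from inside the tensorflow checkout:
# export_tf_configure_env
# ./configure
```

Keeping the values in one function also documents exactly which options a given wheel was built with.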
- Now that you have configured your system build, let’s start building:
bazel build --config=opt --config=nonccl //tensorflow/tools/pip_package:build_pip_package --incompatible_remove_native_http_archive=false --verbose_failures --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"
- Once the build finishes (it took about 4.5 hours for me), use the result to build the pip package:
sudo bazel-bin/tensorflow/tools/pip_package/build_pip_package ../
- Install the wheel file generated
sudo pip install ../tensorflow-1.13.1-cp36-cp36m-linux_aarch64.whl
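The wheel file name encodes the Python version, ABI and platform it was built for, and pip rejects a wheel whose tags do not match the interpreter it runs under. A small sketch for reading those tags from the file name is shown below; `wheel_tags` is a hypothetical helper and relies only on the standard wheel naming scheme (name-version-python-abi-platform.whl).

```shell
# Print the python, ABI and platform tags encoded in a wheel file name.
# wheel_tags is a hypothetical helper for illustration.
wheel_tags() {
    base=$(basename "$1" .whl)
    platform=${base##*-}      # last dash-separated field
    rest=${base%-*}
    abi=${rest##*-}           # second to last field
    rest=${rest%-*}
    python=${rest##*-}        # third to last field
    printf '%s %s %s\n' "$python" "$abi" "$platform"
}

wheel_tags ../tensorflow-1.13.1-cp36-cp36m-linux_aarch64.whl
# prints: cp36 cp36m linux_aarch64
```

Here cp36 means CPython 3.6, so the pip used for installation has to belong to that interpreter; `pip --version` shows which Python a given pip is tied to.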
Test!
- Testing the python package
$ cd
$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-06-05 15:16:56.295371: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2019-06-05 15:16:56.295657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.5
pciBusID: 0000:00:00.0
totalMemory: 15.33GiB freeMemory: 9.03GiB
2019-06-05 15:16:56.295785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-05 15:16:57.766675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-05 15:16:57.766865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-05 15:16:57.766933: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-05 15:16:57.767368: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8442 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2
2019-06-05 15:16:57.769918: I tensorflow/core/common_runtime/direct_session.cc:317] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2
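For a quicker, non-interactive version of the same check, a one-line Python call can be wrapped in a shell function. This is only a sketch: `tf_gpu_status` is a hypothetical helper, and it assumes `tf.test.is_gpu_available()` is present in the installed TensorFlow, which should be the case for the 1.13 build described here.

```shell
# Report the installed TensorFlow version and whether it can see a GPU.
# tf_gpu_status is a hypothetical helper; log output on stderr is suppressed.
tf_gpu_status() {
    if ! command -v python3 >/dev/null 2>&1; then
        echo "python3 not found"
        return 1
    fi
    python3 -c 'import tensorflow as tf; print("tensorflow", tf.__version__, "gpu:", tf.test.is_gpu_available())' 2>/dev/null || {
        echo "tensorflow could not be imported or the check failed"
        return 1
    }
}

tf_gpu_status || echo "GPU check did not complete"
```

Run it from outside the tensorflow source directory (as the `cd` above does); otherwise Python may try to import the source tree instead of the installed package.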
- Testing C++:
Using the example from the TensorFlow C++ guide (https://www.tensorflow.org/guide/extend/cc, also listed in the references), follow the instructions and build with the command given there (you don’t need to run ./configure again). Then test your app:
$ ./tensorflow/bazel-out/aarch64-opt/bin/tensorflow/cc/example/example
2019-06-05 17:49:16.518159: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2019-06-05 17:49:16.518515: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.5
pciBusID: 0000:00:00.0
totalMemory: 15.33GiB freeMemory: 7.92GiB
2019-06-05 17:49:16.518595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-05 17:49:16.519415: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-05 17:49:16.519504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-05 17:49:16.519589: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-05 17:49:16.520143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7700 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2019-06-05 17:49:18.887245: I tensorflow/cc/example/example.cc:22] 19
-3
So you get the expected results: 19 -3.
EDIT:
Please see the next post for a better approach when using C++ APIs.
References
https://docs.bazel.build/versions/master/install-compile-source.html#bootstrap-bazel
https://www.tensorflow.org/install/source
https://github.com/tensorflow/tensorflow/issues/25323
https://devtalk.nvidia.com/default/topic/1049100/tensorflow-installation-on-drive-px2-/
https://devtalk.nvidia.com/default/topic/1043026/jetson-agx-xavier/building-tensorflow-whl-from-source-for-jetson-agx-solved-/
https://www.tensorflow.org/guide/extend/cc