TensorFlow wheel for JetPack 4.0 !!

TensorFlow for JetPack 4.0 is updated !!

Python 2.7:
r1.10.1: Box

Python 3.6:
r1.10.1: Box

Thanks. :)

Is the DLA supported in this release?

Hi mechadeck, the DLA is supported in JetPack 4.0, but I don't believe TensorFlow has integrated DLA support yet.

Confirmed and installed last night for Python 3.6. Be patient. It will take a while to install.


thanks a lot. It worked great on Xavier!

Here is a first tensor test.

nvidia@jetson-0423418009922:/opt/ssd500/installation$ python3 testTensorflow.py
2018-09-22 19:23:40.420202: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2018-09-22 19:23:40.420432: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.5
pciBusID: 0000:00:00.0
totalMemory: 15.46GiB freeMemory: 11.19GiB
2018-09-22 19:23:40.420552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-22 19:23:41.254301: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-22 19:23:41.254481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-09-22 19:23:41.254525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-09-22 19:23:41.254826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4748 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)

Ran 2 tests in 1.945s

I am noticing an error when training. I am using a VGGNet model implemented in Keras, training on the CIFAR-10 dataset for 40 epochs. It's about 50/50 whether the training completes, and I haven't isolated this to v1.10 of TensorFlow yet. The error I am getting is: tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 'XX' values, but the requested shape has 'XX'.

The 'XX' value differs each time. Sometimes the training completes fine without error; I don't have a reproducible scenario. The complete output is:

File "SB15.03_vggnet_mini_cifar10.py", line 63, in <module>
H = model.fit(trainX, trainY, validation_data=(testX, testY), batch_size=64, epochs=40, verbose=1)
File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/engine/training.py", line 1037, in fit
File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
outs = f(ins_batch)
File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2666, in __call__
return self._call(inputs)
File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2636, in _call
fetched = self._callable_fn(*array_vals)
File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1382, in __call__
File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 64 values, but the requested shape has 978017632868082584
[[Node: training/SGD/gradients/loss/activation_6_loss/Sum_1_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _class=["loc:@training/SGD/gradients/loss/activation_6_loss/Sum_1_grad/Tile"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/SGD/gradients/loss/activation_6_loss/Neg_grad/Neg, training/SGD/gradients/loss/activation_6_loss/Sum_1_grad/DynamicStitch/_83)]]
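A hedged aside on what that message means (not a diagnosis): Reshape only requires that the element count of the input equal the element count of the requested shape, and here the requested count is an implausibly huge number rather than a near-miss shape, which may point to a shape tensor being corrupted at runtime rather than a model-definition bug. A minimal NumPy illustration of the invariant itself:

```python
import numpy as np

# Illustration only (not the author's code): reshape demands that element
# counts match exactly, which is the check the error above reports failing.
x = np.arange(64, dtype=np.float32)  # 64 values, like the batch in the error
y = x.reshape(8, 8)                  # OK: 8 * 8 == 64
assert y.size == x.size

try:
    x.reshape(7, 9)                  # 7 * 9 == 63 != 64 -> ValueError
except ValueError as e:
    print("reshape rejected:", e)
```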

Can you try TF 1.6?

Maybe :)

I use the command line:

pip install --extra-index-url=https://developer.download.nvidia.com/compute/redist/jp40 tensorflow-gpu

@duclink.fetel: What version of TensorFlow does that install?


Python 3.6:
r1.10.1: Box

What CUDA version did you install, and where did you download the Xavier CUDA installation?

CUDA 10. It is part of JetPack 4.0


Got it now. Since the Xavier came pre-installed with an OS, I assumed we could install packages on the device itself this time (instead of from the host), but they have to be installed from the host again, like on the TX2.

Odd that NVIDIA doesn't make the deb packages (cuda-repo-l4t-10-0-local-10.0.117_1.0-1_arm64.deb, etc.) available for download and installation directly on the device, without a host.

Hi, AerialRoboticsGuru

It looks similar to this topic.
Could you check it first?


Hi, erwin.coumans

We will keep updating our OS version.
Flashing everything from JetPack frees users from dependency problems.


Thanks for the great work. I successfully installed TF 1.10 on the Jetson Xavier, but ran into some problems when comparing its performance with the TX2.

TensorFlow was installed on both boards via:
pip install --extra-index-url=https://developer.download.nvidia.com/compute/redist/jp40 tensorflow-gpu for the Xavier
pip install --extra-index-url=https://developer.download.nvidia.com/compute/redist/jp33 tensorflow-gpu for the TX2

I tested the TensorFlow MobileNet-SSD model, via https://github.com/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb

The bad news: in normal GPU mode, the average FPS on the TX2 is around 3.5, while on the Xavier it is 1.74, only half the speed of the TX2.

I then tested VGG16 classification via "Use pre-trained models" in

Again in normal GPU mode, the average FPS on the TX2 is around 0.76, while on the Xavier it is 0.15, only 1/5 the speed of the TX2.

Is there anything wrong with the TF wheel, or with the system? Help needed, thanks a lot!
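One hedged suggestion for making the comparison fairer: the first few session.run() calls include one-time graph setup and cuDNN autotuning, so a benchmark that averages over them understates steady-state FPS. A minimal sketch of a warm-up-aware timing helper (measure_fps and the lambda workload are hypothetical names, not from the tutorial):

```python
import time

def measure_fps(infer, n_warmup=2, n_runs=10):
    # Run and discard warm-up iterations first, so one-time setup cost
    # (graph construction, cuDNN autotuning) does not skew the average.
    for _ in range(n_warmup):
        infer()
    start = time.time()
    for _ in range(n_runs):
        infer()
    return n_runs / (time.time() - start)

# Stand-in for a real sess.run(...) inference call.
fps = measure_fps(lambda: time.sleep(0.01))
print("measured %.1f FPS" % fps)
```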


Have you maximized the system performance via these commands?

sudo nvpmodel -m 0
sudo ./jetson_clocks.sh

Furthermore, if possible, it's recommended to convert the TensorFlow model into a TensorRT PLAN first.
TensorRT optimizes its implementation for the GPU architecture.
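As a hedged sketch of that suggestion: the TF 1.10-era Jetson wheels bundle the TF-TRT integration under tf.contrib.tensorrt, which can replace supported subgraphs of a frozen GraphDef with TensorRT engines. The file path, output node name, and parameter values below are placeholders, not taken from this thread:

```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # TF-TRT, shipped with the Jetson wheel

# Load a frozen graph (path is a placeholder).
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Replace supported subgraphs with TensorRT engines tuned for this GPU.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=["logits"],                # placeholder output node name
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode="FP16")             # Xavier has fast FP16 paths
```

The resulting trt_graph can then be imported and run like any other GraphDef; unsupported ops fall back to TensorFlow.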