Problem installing TensorFlow on Xavier (Solved)

I installed the newly released JetPack 4.0 and tried to install TensorFlow.
I tried a couple of the installation methods listed on the TensorFlow website (pip and building from source), and neither worked.
pip: could not find a version that satisfies the requirement TensorFlow
build from source: 1. Bazel does not support Ubuntu 18.04, so I built Bazel from source
2. TensorFlow cannot be compiled successfully
Could you give us an idea of how to install TensorFlow on Xavier?
Thank you very much.

Deb

Hi guodebby,

Please see if JasonAtNvidia/JetsonTFBuild on GitHub (an assistance script to build TensorFlow on an NVIDIA Jetson module) helps: https://github.com/JasonAtNvidia/JetsonTFBuild

I built tensorflow on Xavier.
https://github.com/naisy/JetsonXavier/tree/JetPack4.0_python2.7/JetPack4.0/python2.7/binary

It can be installed after joining the two parts with the cat command.

cat tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl.part1 tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl.part2 > tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl
pip install tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl
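For anyone who prefers to do the join step from Python instead of cat, here is a minimal sketch. The function name join_parts is my own and not part of the repository. It only concatenates the part files in the order given, which is what the cat command above does.

```python
import shutil
from pathlib import Path


def join_parts(parts, output):
    """Concatenate the part files, in the given order, into one output file."""
    output = Path(output)
    with output.open("wb") as dst:
        for part in parts:
            with Path(part).open("rb") as src:
                # Stream the bytes so a large wheel is never held in memory.
                shutil.copyfileobj(src, dst)
    return output


if __name__ == "__main__":
    wheel = "tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl"
    parts = [wheel + ".part1", wheel + ".part2"]
    if all(Path(p).exists() for p in parts):
        join_parts(parts, wheel)
        print("wrote", wheel)
    else:
        print("part files not found in the current directory")
```

The order of the parts matters: part1 must come before part2, otherwise pip will reject the resulting file.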

Parameters:
TF_NEED_JEMALLOC=1
TF_NEED_CUDA=1
TF_CUDA_COMPUTE_CAPABILITIES=7.2,6.2,5.3
TF_NEED_TENSORRT=1
TF_NCCL_VERSION=1

build script:
https://github.com/naisy/JetsonXavier/blob/JetPack4.0_python2.7/JetPack4.0/python2.7/scripts/build_tensorflow.sh

@naisy

Could you please do one for TensorFlow v1.8 with Python v3.6?

Hi AerialRoboticsGuru,

For python 3.6, I have only built 1.6, so I will update it.
https://github.com/naisy/JetsonXavier/tree/JetPack4.0_python3.6/JetPack4.0/python3.6/binary

I built tensorflow 1.8 with Python 3.6.
https://github.com/naisy/JetsonXavier/tree/JetPack4.0_python3.6/JetPack4.0/python3.6/binary

cat tensorflow-1.8.0-cp36-cp36m-linux_aarch64.whl.part1 tensorflow-1.8.0-cp36-cp36m-linux_aarch64.whl.part2 > tensorflow-1.8.0-cp36-cp36m-linux_aarch64.whl
pip3 install tensorflow-1.8.0-cp36-cp36m-linux_aarch64.whl
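If pip refuses a wheel, it can help to compare the tags in the file name with the interpreter you are running. The sketch below is a simplified check, not pip's real tag matching. It assumes a five-field wheel name with no build tag, and the function names are my own.

```python
import platform
import sys


def parse_wheel_filename(filename):
    """Split name-version-python-abi-platform.whl into its five fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    stem = filename[: -len(".whl")]
    name, version, py_tag, abi_tag, plat_tag = stem.split("-")
    return name, version, py_tag, abi_tag, plat_tag


def matches_this_interpreter(filename):
    """Rough check: same CPython major.minor and same OS/architecture."""
    _, _, py_tag, _, plat_tag = parse_wheel_filename(filename)
    this_py = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    this_plat = "{}_{}".format(platform.system().lower(), platform.machine())
    return py_tag == this_py and plat_tag == this_plat


if __name__ == "__main__":
    wheel = "tensorflow-1.8.0-cp36-cp36m-linux_aarch64.whl"
    print(parse_wheel_filename(wheel))
    print("matches this interpreter:", matches_this_interpreter(wheel))
```

For the wheel above the check should only pass under CPython 3.6 on an aarch64 Linux system such as the Xavier.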

Parameters:
TF_NEED_JEMALLOC=1
TF_NEED_CUDA=1
TF_CUDA_COMPUTE_CAPABILITIES=7.2,6.2,5.3
TF_NEED_TENSORRT=1
TF_NCCL_VERSION=1

Hi Naisy,

I tried installing TensorFlow on Jetson Xavier using

https://github.com/naisy/JetsonXavier/tree/JetPack4.0_python2.7

After an hour of installation, I got the following error.

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: “Xavier”
CUDA Driver Version / Runtime Version 10.0 / 10.0
CUDA Capability Major/Minor version number: 7.2
Total amount of global memory: 15827 MBytes (16596103168 bytes)
( 8) Multiprocessors, ( 64) CUDA Cores/MP: 512 CUDA Cores
GPU Max Clock rate: 1500 MHz (1.50 GHz)
Memory Clock rate: 1500 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS
arm64
./join_tensorflow_whl.sh: line 26: syntax error: unexpected end of file
./install_tensorflow.sh: line 27: syntax error: unexpected end of file

Hi ankitpurohit,

These files need a ‘fi’ to close the if statement.
I have updated them. Sorry.
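A missing ‘fi’ like this can be caught before running a script by asking bash to parse it without executing it (bash -n). Below is a small sketch that wraps that check. It assumes bash is on the PATH, and the function name shell_syntax_ok is my own.

```python
import subprocess


def shell_syntax_ok(path):
    """Parse a shell script with `bash -n` (no execution).

    Returns (ok, message), where message is bash's error text, if any.
    """
    result = subprocess.run(
        ["bash", "-n", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode == 0, result.stderr.strip()


if __name__ == "__main__":
    for script in ["./join_tensorflow_whl.sh", "./install_tensorflow.sh"]:
        ok, message = shell_syntax_ok(script)
        print(script, "OK" if ok else "FAILED: " + message)
```

This only checks syntax, so a script that parses cleanly can still fail at run time.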

To continue manually:

cd JetsonXavier/JetPack4.0/python2.7/binary
cat tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl.part1 tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl.part2 > tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl
sudo su
pip install tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl

Thank you for doing this. I would love to know how you built this because I tried and failed horribly.
I tried using this information:
https://github.com/JasonAtNvidia/JetsonTFBuild

But could never get the build to complete successfully.

On my Mac I’m running Python v3.6.5, Keras v2.2.0, and Tensorflow v1.8.0. I created a LeNet architecture and I’m training on the MNIST dataset. Although slow it will complete the training. I’ve never had it fail.

When I tried the same experiment on the Jetson (Python v3.6.5, Keras v2.2.0, TensorFlow v1.10.0), the training failed about 50% of the time. The error was “Input to reshape is a tensor with xxx values, but the requested shape has xxx.” I’m using the exact same program as on the Mac.

That is why I wanted to go back to Tensorflow v1.8.0 on the Jetson and repeat the experiment.

Hi Naisy,

Thank you so much for your hard work.
I succeeded in building TensorFlow.

This is the final line of the installation output.

Successfully installed absl-py-0.5.0 astor-0.7.1 backports.weakref-1.0.post1 gast-0.2.0 grpcio-1.15.0 markdown-3.0.1 numpy-1.14.5 protobuf-3.6.1 setuptools-39.1.0 tensorboard-1.10.0 tensorflow-1.10.1 termcolor-1.1.0 werkzeug-0.14.1

Thank you.

Hi Aerial Robotics Guru,

1. Requirements for building tensorflow:

  • numpy (pip package).
  • mock (pip package).
  • Java 8, which is required for bazel. (Not required for TF execution)
  • bazel. (Not required for TF execution)

In addition, patches may be applied to the source code.
https://github.com/naisy/JetsonXavier/blob/JetPack4.0_python3.6/JetPack4.0/python3.6/scripts/build_tensorflow.sh
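The requirements listed above can be checked quickly before starting a long build. This is only a sketch: the function name check_build_prerequisites is mine, it looks for a java and a bazel executable on the PATH, and it does not verify that the Java found is version 8.

```python
import importlib.util
import shutil


def check_build_prerequisites():
    """Report which of the build requirements appear to be present."""
    return {
        # pip packages: importable from the current interpreter?
        "numpy": importlib.util.find_spec("numpy") is not None,
        "mock": importlib.util.find_spec("mock") is not None,
        # build tools: executable found on PATH?
        "java": shutil.which("java") is not None,
        "bazel": shutil.which("bazel") is not None,
    }


if __name__ == "__main__":
    for name, present in check_build_prerequisites().items():
        print("{:6s} {}".format(name, "found" if present else "MISSING"))
```

Run it with the same Python interpreter you intend to build for, since numpy and mock are looked up in that interpreter.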

2. Training MNIST data using LeNet model in Keras:
As far as I tried, there seemed to be no problem.

  • Environment
# remove the naisy-built tensorflow
pip3 uninstall tensorflow
# install official tensorflow
pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp40 tensorflow-gpu
# install keras-2.2.0
pip3 install --upgrade keras==2.2.0
  • Source code (mnist_lenet.py)
# https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py
'''Trains a simple convnet on the MNIST dataset.
Gets to 99.25% test accuracy after 12 epochs
(there is still a lot of margin for parameter tuning).
16 seconds per epoch on a GRID K520 GPU.
'''

from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D, AveragePooling2D
from keras import backend as K

batch_size = 128
num_classes = 10
epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28

def LeNet(input_shape, num_classes):
    model = Sequential()
    model.add(Conv2D(20, kernel_size=5, strides=1, activation='relu', input_shape=input_shape))
    model.add(MaxPooling2D(2, strides=2))

    model.add(Conv2D(50, kernel_size=5, strides=1, activation='relu'))
    model.add(MaxPooling2D(2, strides=2))
    model.add(Dropout(0.25))

    model.add(Flatten())
    model.add(Dense(500, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.SGD(),
                  metrics=['accuracy'])
    return model

def default_cnn(input_shape, num_classes):
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
    return model

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

#model = default_cnn(input_shape, num_classes)
model = LeNet(input_shape, num_classes)

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
  • Training
python3 mnist_lenet.py
  • Result
Using TensorFlow backend.
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2018-10-03 05:34:14.234838: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2018-10-03 05:34:14.235162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.5
pciBusID: 0000:00:00.0
totalMemory: 15.46GiB freeMemory: 9.55GiB
2018-10-03 05:34:14.235332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-10-03 05:34:15.064031: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-03 05:34:15.064223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 
2018-10-03 05:34:15.064312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N 
2018-10-03 05:34:15.064639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9066 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
60000/60000 [==============================] - 12s 196us/step - loss: 1.4815 - acc: 0.5106 - val_loss: 0.3588 - val_acc: 0.9084
Epoch 2/12
60000/60000 [==============================] - 6s 92us/step - loss: 0.4331 - acc: 0.8666 - val_loss: 0.2002 - val_acc: 0.9450
Epoch 3/12
60000/60000 [==============================] - 5s 91us/step - loss: 0.2958 - acc: 0.9104 - val_loss: 0.1520 - val_acc: 0.9561
Epoch 4/12
60000/60000 [==============================] - 5s 91us/step - loss: 0.2391 - acc: 0.9277 - val_loss: 0.1248 - val_acc: 0.9622
Epoch 5/12
60000/60000 [==============================] - 5s 90us/step - loss: 0.2048 - acc: 0.9381 - val_loss: 0.1072 - val_acc: 0.9676
Epoch 6/12
60000/60000 [==============================] - 5s 90us/step - loss: 0.1834 - acc: 0.9453 - val_loss: 0.0963 - val_acc: 0.9724
Epoch 7/12
60000/60000 [==============================] - 5s 89us/step - loss: 0.1656 - acc: 0.9501 - val_loss: 0.0864 - val_acc: 0.9737
Epoch 8/12
60000/60000 [==============================] - 5s 89us/step - loss: 0.1541 - acc: 0.9541 - val_loss: 0.0790 - val_acc: 0.9762
Epoch 9/12
60000/60000 [==============================] - 5s 89us/step - loss: 0.1416 - acc: 0.9572 - val_loss: 0.0738 - val_acc: 0.9776
Epoch 10/12
60000/60000 [==============================] - 5s 88us/step - loss: 0.1339 - acc: 0.9593 - val_loss: 0.0683 - val_acc: 0.9786
Epoch 11/12
60000/60000 [==============================] - 5s 88us/step - loss: 0.1255 - acc: 0.9612 - val_loss: 0.0648 - val_acc: 0.9797
Epoch 12/12
60000/60000 [==============================] - 5s 88us/step - loss: 0.1204 - acc: 0.9641 - val_loss: 0.0614 - val_acc: 0.9807
Test loss: 0.06142731437981129
Test accuracy: 0.9807

Interesting. This is what I got the first time. Since then I have tried 2 other times with the same result.

Using TensorFlow backend.
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
11493376/11490434 [==============================] - 11s 1us/step
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2018-10-03 12:14:29.103370: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2018-10-03 12:14:29.103925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.5
pciBusID: 0000:00:00.0
totalMemory: 15.45GiB freeMemory: 9.51GiB
2018-10-03 12:14:29.104045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-10-03 12:14:30.841893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-03 12:14:30.842226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 
2018-10-03 12:14:30.842295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N 
2018-10-03 12:14:30.842868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8951 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
60000/60000 [==============================] - 37s 613us/step - loss: 0.2631 - acc: 0.9188 - val_loss: 0.0581 - val_acc: 0.9804
Epoch 2/12
45824/60000 [=====================>........] - ETA: 4s - loss: 0.0915 - acc: 0.9737Traceback (most recent call last):
  File "mnist_cnn.py", line 66, in <module>
    validation_data=(x_test, y_test))
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/engine/training.py", line 1042, in fit
    validation_steps=validation_steps)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
    outs = f(ins_batch)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2661, in __call__
    return self._call(inputs)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2631, in _call
    fetched = self._callable_fn(*array_vals)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1382, in __call__
    run_metadata_ptr)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 128 values, but the requested shape has 0
	 [[Node: training/Adadelta/gradients/loss/dense_2_loss/Sum_1_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _class=["loc:@training/Adadelta/gradients/loss/dense_2_loss/Sum_1_grad/Tile"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adadelta/gradients/loss/dense_2_loss/Neg_grad/Neg, training/Adadelta/gradients/loss/dense_2_loss/Sum_1_grad/DynamicStitch/_81)]]

And another attempt.

Using TensorFlow backend.
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2018-10-03 12:39:22.554559: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2018-10-03 12:39:22.554937: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.5
pciBusID: 0000:00:00.0
totalMemory: 15.45GiB freeMemory: 8.54GiB
2018-10-03 12:39:22.555035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-10-03 12:39:24.011406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-03 12:39:24.011757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 
2018-10-03 12:39:24.011977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N 
2018-10-03 12:39:24.012565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8082 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
60000/60000 [==============================] - 26s 433us/step - loss: 1.6841 - acc: 0.4387 - val_loss: 0.4450 - val_acc: 0.8889
Epoch 2/12
47360/60000 [======================>.......] - ETA: 3s - loss: 0.5245 - acc: 0.8352Traceback (most recent call last):
  File "mnist_cnn.py", line 90, in <module>
    validation_data=(x_test, y_test))
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/engine/training.py", line 1042, in fit
    validation_steps=validation_steps)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
    outs = f(ins_batch)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2661, in __call__
    return self._call(inputs)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2631, in _call
    fetched = self._callable_fn(*array_vals)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1382, in __call__
    run_metadata_ptr)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 128 values, but the requested shape has 0
	 [[Node: training/SGD/gradients/loss/dense_2_loss/Sum_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _class=["loc:@training/SGD/gradients/loss/dense_2_loss/Sum_grad/Tile"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/SGD/gradients/loss/dense_2_loss/truediv_grad/Sum_1, training/SGD/gradients/loss/dense_2_loss/Sum_grad/DynamicStitch/_63)]]

Hi AerialRoboticsGuru,

I tried it with JetPack 4.1 DP which was released today, but there was no problem.
I do not use virtualenv; could that be the difference?

Hi naisy,

I’m still on the original JetPack4.0. Initially I thought the virtualenv may be causing the problem. I ran a few tests yesterday. I deleted my original virtual environment and created a fresh one. I ran the mnist_lenet.py program both within the virtual environment and outside.

without virtual environment
pass - 8
fail - 2

with virtual environment
pass - 9
fail - 1

My conclusion is that they are the same. I could repeat the test with TensorFlow 1.8. However, at this point, if I can get an 80-90% success rate, I am okay with it. As I mentioned before, I’ve never seen a training failure on my Mac. There may still be an issue with TensorFlow running on the ARM architecture. I appreciate your efforts.
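To put a number on "they are the same", the pass/fail counts above can be run through a two-sided Fisher exact test. The sketch below implements the test by hand with math.comb (Python 3.8 or later); the function name is my own, and with only 10 runs per group the test has very little power either way.

```python
from math import comb


def fisher_exact_two_sided(pass_a, fail_a, pass_b, fail_b):
    """Two-sided Fisher exact test p-value for a 2x2 pass/fail table."""
    n_a = pass_a + fail_a
    n_b = pass_b + fail_b
    total = n_a + n_b
    total_fail = fail_a + fail_b
    denom = comb(total, n_a)

    def prob(k):
        # Hypergeometric probability that exactly k failures land in group A.
        return comb(total_fail, k) * comb(total - total_fail, n_a - k) / denom

    observed = prob(fail_a)
    lowest = max(0, total_fail - n_b)
    highest = min(total_fail, n_a)
    # Sum every outcome that is no more likely than the observed one.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Without virtualenv: 8 pass, 2 fail. With virtualenv: 9 pass, 1 fail.
    p = fisher_exact_two_sided(8, 2, 9, 1)
    print("p-value: {:.3f}".format(p))
```

For these counts the p-value comes out at 1, so the runs give no evidence that the virtual environment changes the failure rate.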

Hi naisy,

Did you remove the link for v1.8?

https://github.com/naisy/JetsonXavier/tree/JetPack4.0_python3.6/JetPack4.0/python3.6/binary

Hi AerialRoboticsGuru,

Sorry, it has moved to JetPack4.1 now.
https://github.com/naisy/JetsonXavier/tree/JetPack4.1_python3.6/JetPack4.1/python3.6/binary
It is the same binary as JetPack4.0.

I will rebuild the repository, but I think that it can be traced from this URL.

Hi naisy,

Thanks for the TensorFlow build. Over the weekend I upgraded to JetPack 4.1 and have TensorFlow v1.8.0 installed. Unfortunately it doesn’t seem to have fixed my original error. In approximately 1 out of 10 runs of my model, TensorFlow fails. At this point I’m just moving on.

Thanks,
Andrew
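For anyone who wants to measure an intermittent failure rate like this one, here is a minimal sketch that runs a command several times and tallies the exit codes. The function name tally_runs is my own. It assumes the training script exits with a non-zero status when it raises, which is what an uncaught Python exception does.

```python
import subprocess
import sys


def tally_runs(command, runs):
    """Run the command `runs` times and return (passed, failed) counts."""
    passed = failed = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            passed += 1
        else:
            failed += 1
    return passed, failed


if __name__ == "__main__":
    # Hypothetical usage: pass the script to repeat as the first argument.
    if len(sys.argv) > 1:
        passed, failed = tally_runs([sys.executable, sys.argv[1]], runs=10)
        print("pass - {}\nfail - {}".format(passed, failed))
```

Output is discarded here to keep the tally readable; drop the DEVNULL arguments if you want to see the traceback of each failed run.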

Thanks @naisy,

I was able to install the TF binary from https://developer.download.nvidia.com/compute/redist/jp/v44/tensorflow/
using the guide provided for JetPack 4.4 DP, “Installing TensorFlow for Jetson Platform” (NVIDIA Deep Learning Frameworks Documentation).

I created a virtual environment each for TensorFlow 1.15 and 2.1 and ran the MNIST training test above.
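A quick way to confirm that each virtual environment sees the GPU before starting a training run is sketched below. The function name is my own. It uses tf.config.list_physical_devices where it exists and otherwise falls back to tf.config.experimental.list_physical_devices, which I believe covers both 1.15 and 2.1, but treat that as an assumption.

```python
def tensorflow_gpu_status():
    """Return (version, number of visible GPUs), or None if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Newer releases expose list_physical_devices directly on tf.config;
    # older ones only have it under tf.config.experimental.
    list_devices = getattr(tf.config, "list_physical_devices", None)
    if list_devices is None:
        list_devices = tf.config.experimental.list_physical_devices
    return tf.__version__, len(list_devices("GPU"))


if __name__ == "__main__":
    status = tensorflow_gpu_status()
    if status is None:
        print("TensorFlow is not installed in this environment")
    else:
        print("TensorFlow {} sees {} GPU(s)".format(*status))
```

On a working Xavier install this should report one GPU, matching the "Adding visible gpu devices: 0" line in the logs.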

A.) Tensorflow 1.15.2+nv20.4 and Keras 2.2.4

name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.377
pciBusID: 0000:00:00.0
2020-05-28 08:30:34.817542: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-28 08:30:34.829256: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-05-28 08:30:34.840522: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-05-28 08:30:34.844160: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-05-28 08:30:34.855435: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-05-28 08:30:34.862722: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-05-28 08:30:34.863544: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-05-28 08:30:34.863877: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-05-28 08:30:34.864209: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-05-28 08:30:34.864295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-05-28 08:30:34.864432: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-28 08:30:36.604711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-28 08:30:36.604872: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0
2020-05-28 08:30:36.604987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N
2020-05-28 08:30:36.605639: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-05-28 08:30:36.605954: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-05-28 08:30:36.606213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23419 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
WARNING:tensorflow:From /home/nv/.virtualenvs/tf1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From /home/nv/.virtualenvs/tf1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
WARNING:tensorflow:From /home/nv/.virtualenvs/tf1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
2020-05-28 08:30:46.768505: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-05-28 08:30:48.003721: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
60000/60000 [==============================] - 30s 495us/step - loss: 1.4332 - acc: 0.5256 - val_loss: 0.3754 - val_acc: 0.8997
Epoch 2/12
60000/60000 [==============================] - 9s 153us/step - loss: 0.4515 - acc: 0.8605 - val_loss: 0.2204 - val_acc: 0.9361
Epoch 3/12
60000/60000 [==============================] - 9s 153us/step - loss: 0.3161 - acc: 0.9019 - val_loss: 0.1652 - val_acc: 0.9497
Epoch 4/12
60000/60000 [==============================] - 9s 151us/step - loss: 0.2524 - acc: 0.9224 - val_loss: 0.1372 - val_acc: 0.9579
Epoch 5/12
60000/60000 [==============================] - 9s 150us/step - loss: 0.2158 - acc: 0.9340 - val_loss: 0.1164 - val_acc: 0.9631
Epoch 6/12
60000/60000 [==============================] - 9s 149us/step - loss: 0.1895 - acc: 0.9415 - val_loss: 0.1018 - val_acc: 0.9681
Epoch 7/12
60000/60000 [==============================] - 9s 149us/step - loss: 0.1707 - acc: 0.9477 - val_loss: 0.0904 - val_acc: 0.9719
Epoch 8/12
60000/60000 [==============================] - 9s 149us/step - loss: 0.1586 - acc: 0.9513 - val_loss: 0.0859 - val_acc: 0.9730
Epoch 9/12
60000/60000 [==============================] - 9s 149us/step - loss: 0.1441 - acc: 0.9562 - val_loss: 0.0763 - val_acc: 0.9752
Epoch 10/12
60000/60000 [==============================] - 9s 148us/step - loss: 0.1380 - acc: 0.9570 - val_loss: 0.0708 - val_acc: 0.9781
Epoch 11/12
60000/60000 [==============================] - 9s 150us/step - loss: 0.1271 - acc: 0.9614 - val_loss: 0.0663 - val_acc: 0.9787
Epoch 12/12
60000/60000 [==============================] - 9s 149us/step - loss: 0.1252 - acc: 0.9611 - val_loss: 0.0636 - val_acc: 0.9803
Test loss: 0.06360115777691826
Test accuracy: 0.9803

B.) Tensorflow 2.1.0+nv20.4 and Keras 2.2.4

pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 31.17GiB deviceMemoryBandwidth: 82.08GiB/s
2020-05-28 08:34:44.917703: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-28 08:34:44.917834: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-05-28 08:34:44.920948: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-05-28 08:34:44.921789: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-05-28 08:34:44.925813: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-05-28 08:34:44.928751: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-05-28 08:34:44.928902: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-05-28 08:34:44.929129: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-05-28 08:34:44.929415: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-05-28 08:34:44.929522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-28 08:34:44.929653: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-28 08:34:48.372985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-28 08:34:48.373151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-05-28 08:34:48.373230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-05-28 08:34:48.374116: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-05-28 08:34:48.374413: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-05-28 08:34:48.374735: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 20798 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
WARNING:tensorflow:From /home/nv/.virtualenvs/tf2/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /home/nv/.virtualenvs/tf2/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING:tensorflow:From /home/nv/.virtualenvs/tf2/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

2020-05-28 08:34:55.436745: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-05-28 08:34:56.412336: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
60000/60000 [==============================] - 23s 389us/step - loss: 1.4081 - acc: 0.5403 - val_loss: 0.3864 - val_acc: 0.8945
Epoch 2/12
60000/60000 [==============================] - 7s 122us/step - loss: 0.4643 - acc: 0.8542 - val_loss: 0.2252 - val_acc: 0.9362
Epoch 3/12
60000/60000 [==============================] - 8s 132us/step - loss: 0.3197 - acc: 0.9018 - val_loss: 0.1689 - val_acc: 0.9505
Epoch 4/12
60000/60000 [==============================] - 9s 146us/step - loss: 0.2579 - acc: 0.9205 - val_loss: 0.1375 - val_acc: 0.9583
Epoch 5/12
60000/60000 [==============================] - 9s 146us/step - loss: 0.2193 - acc: 0.9325 - val_loss: 0.1208 - val_acc: 0.9632
Epoch 6/12
60000/60000 [==============================] - 9s 146us/step - loss: 0.1957 - acc: 0.9401 - val_loss: 0.1067 - val_acc: 0.9676
Epoch 7/12
60000/60000 [==============================] - 9s 146us/step - loss: 0.1744 - acc: 0.9469 - val_loss: 0.0939 - val_acc: 0.9709
Epoch 8/12
60000/60000 [==============================] - 9s 147us/step - loss: 0.1648 - acc: 0.9504 - val_loss: 0.0867 - val_acc: 0.9740
Epoch 9/12
60000/60000 [==============================] - 9s 147us/step - loss: 0.1517 - acc: 0.9540 - val_loss: 0.0810 - val_acc: 0.9756
Epoch 10/12
60000/60000 [==============================] - 9s 146us/step - loss: 0.1408 - acc: 0.9562 - val_loss: 0.0748 - val_acc: 0.9768
Epoch 11/12
60000/60000 [==============================] - 9s 145us/step - loss: 0.1328 - acc: 0.9594 - val_loss: 0.0696 - val_acc: 0.9779
Epoch 12/12
60000/60000 [==============================] - 9s 146us/step - loss: 0.1270 - acc: 0.9610 - val_loss: 0.0660 - val_acc: 0.9795
Test loss: 0.06595891129858791
Test accuracy: 0.9795