Is my TensorFlow install really using the GPU?

jbuusao · April 20, 2020, 10:54am

Greetings,

On my Jetson Nano board (4GB RAM, 8GB swap file), I installed TensorFlow (version 2.1.0+nv20.3.tf2) on top of JetPack 4.3 and checked that the GPU was detected:

from tensorflow.python.client import device_lib

def get_available_gpus():
    # List every local device TensorFlow can see and keep only the GPUs.
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

get_available_gpus()

My output was:

tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
tensorflow/compiler/xla/service/service.cc:168] XLA service 0x29536a90 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3 coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 23.84GiB/s
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/device:GPU:0 with 270 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
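
The last log line is the one that matters most here: it says a GPU device was created, but with only 270 MB of memory assigned to it at that moment. On the Nano the CPU and GPU share the same physical RAM, so that figure presumably depends on what else was running. As a rough sketch, the line can be picked apart like this; `parse_created_device` is a hypothetical helper written for this post, not part of TensorFlow, and the regular expression only assumes the message format shown above:

```python
import re

# The final log message from above, joined back into a single line.
LOG_LINE = (
    "tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] "
    "Created TensorFlow device (/device:GPU:0 with 270 MB memory) "
    "-> physical GPU (device: 0, name: NVIDIA Tegra X1, "
    "pci bus id: 0000:00:00.0, compute capability: 5.3)"
)

# Assumed message format: "Created TensorFlow device (<name> with <N> MB memory)".
_PATTERN = re.compile(r"Created TensorFlow device \((\S+) with (\d+) MB memory\)")


def parse_created_device(line):
    """Return (device_name, memory_mb) from a 'Created TensorFlow device' line, or None."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(parse_created_device(LOG_LINE))  # ('/device:GPU:0', 270)
```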

Now, when testing MNIST under Jupyter Notebook:

import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

Training performance seemed quite poor to me:

Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 35s 586us/sample - loss: 0.3002 - accuracy: 0.9122
Epoch 2/5
60000/60000 [==============================] - 26s 437us/sample - loss: 0.1457 - accuracy: 0.9567
Epoch 3/5
60000/60000 [==============================] - 26s 425us/sample - loss: 0.1053 - accuracy: 0.9676
Epoch 4/5
60000/60000 [==============================] - 26s 434us/sample - loss: 0.0866 - accuracy: 0.9734
Epoch 5/5
60000/60000 [==============================] - 26s 429us/sample - loss: 0.0724 - accuracy: 0.9773

On my laptop (without a GPU) I got much better performance:

Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 3s 57us/sample - loss: 0.2959 - accuracy: 0.9119
Epoch 2/5
60000/60000 [==============================] - 3s 56us/sample - loss: 0.1453 - accuracy: 0.9564
Epoch 3/5
60000/60000 [==============================] - 4s 63us/sample - loss: 0.1083 - accuracy: 0.9671
Epoch 4/5
60000/60000 [==============================] - 3s 54us/sample - loss: 0.0873 - accuracy: 0.9736
Epoch 5/5
60000/60000 [==============================] - 3s 48us/sample - loss: 0.0757 - accuracy: 0.9760
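
To put the two logs side by side, here is a quick back-of-the-envelope calculation. It assumes the Keras default batch size of 32 (the `model.fit` call above does not set `batch_size`) and uses the rounded epoch times of 26 s and 3 s from the output. The function names are made up for this sketch. Whether a larger batch would actually speed things up on the Nano is not something these numbers can show; the last loop only illustrates how the step count would shrink.

```python
import math

SAMPLES = 60000       # from "Train on 60000 samples"
DEFAULT_BATCH = 32    # Keras default when model.fit() gets no batch_size


def steps_per_epoch(samples, batch_size):
    """Number of gradient updates needed to see every sample once."""
    return math.ceil(samples / batch_size)


def ms_per_step(epoch_seconds, samples, batch_size):
    """Average wall-clock time per training step, in milliseconds."""
    return 1000.0 * epoch_seconds / steps_per_epoch(samples, batch_size)


nano_ms = ms_per_step(26, SAMPLES, DEFAULT_BATCH)    # Jetson Nano, epochs 2 to 5
laptop_ms = ms_per_step(3, SAMPLES, DEFAULT_BATCH)   # laptop CPU

print("steps per epoch:", steps_per_epoch(SAMPLES, DEFAULT_BATCH))  # 1875
print("Nano:   %.1f ms/step" % nano_ms)
print("Laptop: %.1f ms/step" % laptop_ms)

# How the step count would change with a larger batch size.
for batch in (32, 128, 512):
    print(batch, steps_per_epoch(SAMPLES, batch))
```

With such a small model, each step does very little arithmetic, so a fixed per-step cost of a few milliseconds would be enough to explain most of the gap. That is a guess from the arithmetic, not a measurement.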

Surely I don’t have the right settings. What is the best performance I could expect on the Jetson Nano with an optimal configuration?

Thank you in advance for any advice,
JP

dkreutz · April 20, 2020, 12:47pm

To check whether Keras is using the GPU, according to this, you can try:

from keras import backend as K
K.tensorflow_backend._get_available_gpus()

The Jetson Nano is designed as an edge computing device for GPU-assisted inference. Poor training performance compared to a current desktop CPU is not surprising.
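
Note that the snippet above relies on the standalone `keras` package and on a private helper. If only the `tf.keras` bundled with TensorFlow is installed, something along these lines may be easier. It is a sketch that assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available (earlier 2.x releases had it under `tf.config.experimental`). `gpu_report` is a made-up name; it lists the GPUs TensorFlow sees and reports on which device a small matrix multiplication actually ran:

```python
def gpu_report():
    """Return the GPUs TensorFlow can see and the device a small matmul ran on."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return {"tensorflow": None, "gpus": [], "matmul_device": None}

    # Physical GPUs visible to TensorFlow (empty list if none).
    gpus = [d.name for d in tf.config.list_physical_devices("GPU")]

    # In eager mode, the result tensor records the device that produced it.
    a = tf.random.uniform((256, 256))
    b = tf.random.uniform((256, 256))
    product = tf.matmul(a, b)

    return {
        "tensorflow": tf.__version__,
        "gpus": gpus,
        "matmul_device": product.device,
    }


print(gpu_report())
```

If `matmul_device` ends in `GPU:0`, the operation was placed on the GPU. That only shows the GPU is being used, not that it will be faster than a desktop CPU for a model this small.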
