Greetings,
On my Jetson Nano board (4 GB RAM, 8 GB swap file), I installed TensorFlow (version 2.1.0+nv20.3.tf2) on top of JetPack 4.3 and verified that the GPU was detected:
from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

get_available_gpus()
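(As an aside, I believe TF 2.x also offers a shorter check through tf.config; a minimal sketch, assuming the experimental API spelling still used in 2.1.0:)

import tensorflow as tf

# Shorter GPU check; tf.config.experimental.list_physical_devices
# is available in TF 2.1 (later promoted to tf.config.list_physical_devices).
print(tf.config.experimental.list_physical_devices('GPU'))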
My output was:
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
tensorflow/compiler/xla/service/service.cc:168] XLA service 0x29536a90 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3 coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 23.84GiB/s
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/device:GPU:0 with 270 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
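I notice the log says only 270 MB was reserved on the GPU. From what I have read, one can ask TensorFlow to grow its allocation on demand instead; a minimal sketch, assuming the tf.config.experimental API available in TF 2.1:

import tensorflow as tf

# Allocate GPU memory on demand rather than grabbing a fixed block
# up front (the Nano shares its 4 GB of RAM between CPU and GPU).
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)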
Now, when testing MNIST under Jupyter Notebook:
import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
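To check whether the ops really land on the GPU, I could also enable device-placement logging; a minimal sketch (tf.debugging.set_log_device_placement exists in TF 2.x, if I'm not mistaken):

import tensorflow as tf

# Log which device each op executes on; call before any ops run.
tf.debugging.set_log_device_placement(True)

with tf.device('/GPU:0'):
    a = tf.random.uniform((1000, 1000))
    b = tf.matmul(a, a)  # should be reported on /device:GPU:0
print(b.device)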
The training performance struck me as quite poor:
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 35s 586us/sample - loss: 0.3002 - accuracy: 0.9122
Epoch 2/5
60000/60000 [==============================] - 26s 437us/sample - loss: 0.1457 - accuracy: 0.9567
Epoch 3/5
60000/60000 [==============================] - 26s 425us/sample - loss: 0.1053 - accuracy: 0.9676
Epoch 4/5
60000/60000 [==============================] - 26s 434us/sample - loss: 0.0866 - accuracy: 0.9734
Epoch 5/5
60000/60000 [==============================] - 26s 429us/sample - loss: 0.0724 - accuracy: 0.9773
On my laptop (CPU only, no GPU) I got much better performance:
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 3s 57us/sample - loss: 0.2959 - accuracy: 0.9119
Epoch 2/5
60000/60000 [==============================] - 3s 56us/sample - loss: 0.1453 - accuracy: 0.9564
Epoch 3/5
60000/60000 [==============================] - 4s 63us/sample - loss: 0.1083 - accuracy: 0.9671
Epoch 4/5
60000/60000 [==============================] - 3s 54us/sample - loss: 0.0873 - accuracy: 0.9736
Epoch 5/5
60000/60000 [==============================] - 3s 48us/sample - loss: 0.0757 - accuracy: 0.9760
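One thing I suspect: with such a tiny model and the Keras default batch size of 32, per-batch overhead may dominate on the Nano's GPU. A sketch of what I plan to try next, simply increasing the batch size (model is the Sequential model defined above):

# Larger batches amortize per-batch launch overhead on the GPU;
# 512 is an arbitrary value I intend to experiment with.
model.fit(x_train, y_train, epochs=5, batch_size=512)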
Surely my configuration is not right. What performance should I expect on the Jetson Nano with an optimal setup?
Thank you in advance for your advice.
JP