I’m just getting up and running on the AGX platform and want to make sure everything is set up correctly and that the performance I’m getting from my system is what one would expect. Could someone please run the following MNIST test code and compare your results with mine? I’m using mode 0 (sudo nvpmodel -m 0) and setting max clocks (sudo jetson_clocks.sh).
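For anyone reproducing this: before timing anything it’s worth confirming that TensorFlow actually sees the GPU. A quick check using the TF 1.x API (the same version shown in my log below):

# Sanity check that TensorFlow picks up the Xavier GPU before benchmarking.
# Uses the TF 1.x API matching the log output further down.
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.test.is_gpu_available())                          # True if a CUDA device is usable
print([d.name for d in device_lib.list_local_devices()])   # should include '/device:GPU:0'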
import tensorflow as tf

# Load MNIST and scale pixel values to [0, 1]
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Simple fully connected classifier: flatten the 28x28 images,
# one 512-unit hidden layer with dropout, softmax over the 10 digits
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train for 5 epochs, then report test-set loss/accuracy
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
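The per-epoch times I’m comparing come from the Keras progress bar. If anyone wants cleaner numbers, a minimal wall-clock timing callback should work with the plain tf.keras callback API (EpochTimer is just a name I made up for this sketch):

# Optional: explicit per-epoch timing, independent of the progress bar.
import time
import tensorflow as tf

class EpochTimer(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        self.start = time.time()
    def on_epoch_end(self, epoch, logs=None):
        print("epoch %d took %.2fs" % (epoch + 1, time.time() - self.start))

# then: model.fit(x_train, y_train, epochs=5, callbacks=[EpochTimer()])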
Result on AGX:
$ python3 digits.py
Epoch 1/5
2018-12-26 04:23:04.691017: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] ARM64 does not support NUMA - returning NUMA node zero
2018-12-26 04:23:04.691301: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.5
pciBusID: 0000:00:00.0
totalMemory: 15.46GiB freeMemory: 10.53GiB
2018-12-26 04:23:04.691758: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-26 04:23:05.355237: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-26 04:23:05.355360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-26 04:23:05.355441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-26 04:23:05.355654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10010 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
60000/60000 [==============================] - 14s 236us/step - loss: 0.2027 - acc: 0.9399
Epoch 2/5
60000/60000 [==============================] - 11s 177us/step - loss: 0.0814 - acc: 0.9749
Epoch 3/5
60000/60000 [==============================] - 11s 176us/step - loss: 0.0514 - acc: 0.9836
Epoch 4/5
60000/60000 [==============================] - 11s 175us/step - loss: 0.0368 - acc: 0.9881
Epoch 5/5
60000/60000 [==============================] - 10s 174us/step - loss: 0.0272 - acc: 0.9913
10000/10000 [==============================] - 1s 93us/step
I ran the same code in a TensorFlow Docker container on my host machine, a 2018 15" MacBook Pro, and I’m getting better results there than on the AGX. The Docker engine is configured to use only 4 cores (2.2 GHz i7) and 8 GB of memory; a sketch at the end of this post shows how to mirror that core limit inside TF itself.
Result on host (4-core i7 @ 2.2 GHz, 8 GB memory):
Epoch 1/5
60000/60000 [==============================] - 9s 157us/step - loss: 0.2018 - acc: 0.9404
Epoch 2/5
60000/60000 [==============================] - 9s 151us/step - loss: 0.0797 - acc: 0.9754
Epoch 3/5
60000/60000 [==============================] - 9s 151us/step - loss: 0.0535 - acc: 0.9832
Epoch 4/5
60000/60000 [==============================] - 9s 149us/step - loss: 0.0383 - acc: 0.9878
Epoch 5/5
60000/60000 [==============================] - 9s 151us/step - loss: 0.0275 - acc: 0.9911
10000/10000 [==============================] - 0s 40us/step
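As an aside, if you’d rather cap TensorFlow’s CPU threads directly instead of relying on Docker limits, something like this should work with the TF 1.x session API (untested sketch):

# Limit TF to 4 CPU threads, roughly mirroring the Docker 4-core constraint.
import tensorflow as tf

config = tf.ConfigProto(intra_op_parallelism_threads=4,
                        inter_op_parallelism_threads=4)
tf.keras.backend.set_session(tf.Session(config=config))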