Surprised at how slow Xavier is on training a small regression model compared to x86 with no GPU. Maybe something wrong?

sidener2002 · January 3, 2019, 8:48pm

Hi, I just got my Xavier up and running and wanted to test it with a very simple and small regression learning problem using Keras and TensorFlow. This is just training on 264 data points with 4-fold cross-validation. The model has only three 64-node ReLU layers and an output layer.

I got the Xavier because I thought its GPU would be much faster than a simple x86 processor with no GPU. However, the modeling time on the x86 for this problem is 8.06 s, whereas the Xavier is taking 32.17 s for the exact same operation. Does this sound odd? I’m running the Xavier in nvpmodel -m 0 mode.

In addition, TensorFlow is generating some additional console output on the Xavier that I have not seen before; maybe this is a clue as to what is happening.

Any ideas or similar experience from anyone?

Thanks!

Scott

output:
nvidia@jetson-0423018054460:~/Desktop/TestTensorflow$ python3 Test_Tensorflow.py
Using TensorFlow backend.
processing k-fold # 0
2019-01-03 20:36:25.755075: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] ARM64 does not support NUMA - returning NUMA node zero
2019-01-03 20:36:25.755304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.5
pciBusID: 0000:00:00.0
totalMemory: 15.45GiB freeMemory: 10.41GiB
2019-01-03 20:36:25.755360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-03 20:36:26.431348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-03 20:36:26.431494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-01-03 20:36:26.431539: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-01-03 20:36:26.431793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9895 MB memory) → physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
processing k-fold # 1
processing k-fold # 2
processing k-fold # 3
modeling duration = 32.177246 s
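
One way to separate GPU overhead from CPU differences between the two machines is to run the same script on the Xavier twice, once with the GPU visible and once with it hidden. The sketch below is only an illustration: `run_timed` is a hypothetical helper, and it relies on the standard CUDA_VISIBLE_DEVICES environment variable, which hides all CUDA devices from a process when set to an empty string. The wall-clock time it reports includes interpreter start-up and CUDA initialization, so the script's own "modeling duration" line remains the more precise figure.

```python
import os
import subprocess
import sys
import time


def run_timed(cmd, hide_gpu=False):
    """Run cmd in a child process; return (wall-clock seconds, exit code).

    run_timed is a hypothetical helper written for this comparison.
    """
    env = dict(os.environ)
    if hide_gpu:
        # An empty CUDA_VISIBLE_DEVICES hides every CUDA device from the
        # child process, so TensorFlow should fall back to the CPU.
        env["CUDA_VISIBLE_DEVICES"] = ""
    start = time.perf_counter()
    # Child output is passed through, so the script's own timing line
    # is still printed to the console.
    result = subprocess.run(cmd, env=env)
    elapsed = time.perf_counter() - start
    return elapsed, result.returncode
```

Assuming the script name from the log above, `run_timed([sys.executable, "Test_Tensorflow.py"], hide_gpu=True)` would give the CPU-only time on the Xavier, and the same call with `hide_gpu=False` the GPU time. If the CPU-only run is faster, the GPU path is probably dominated by fixed per-step overhead rather than by arithmetic.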

dusty_nv · January 4, 2019, 12:33am

Hi sidener2002, I’m not familiar with this code in particular, so maybe another poster has input on the TensorFlow side. However, here are some general observations:

The data size of 264 points is small. GPUs excel at processing large datasets and DNNs (convolutional, RNN, etc.).
Have you tested this code on another discrete GPU to establish a baseline for its GPU performance?
Jetson in particular is optimized for inferencing, not training. Typically, discrete GPU(s) are used for training.
For some example code using Jetson for inferencing with TensorFlow and NVIDIA TensorRT, see here:
https://github.com/NVIDIA-AI-IOT/tf_trt_models
https://github.com/NVIDIA-AI-IOT/tf_to_trt_image_classification
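
To put the first observation in numbers, here is a rough estimate of how much arithmetic this workload involves. It is a back-of-envelope sketch, not a measurement: the input width of 13 features is a hypothetical value (the post does not state it), the helper names are made up for illustration, and the factor of 3 for forward plus backward cost is a common rule of thumb rather than an exact count.

```python
# Rough size of the workload described in the thread:
# 264 samples, 4 folds, three 64-unit ReLU layers, one output unit.
# ASSUMPTION: N_FEATURES = 13 is hypothetical; the post does not give it.
N_FEATURES = 13
HIDDEN = (64, 64, 64)


def layer_sizes(n_features, hidden=HIDDEN, n_out=1):
    """Widths of every layer, input through output."""
    return [n_features, *hidden, n_out]


def mlp_params(n_features):
    """Trainable parameters: weights plus biases of each dense layer."""
    sizes = layer_sizes(n_features)
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


def forward_flops_per_sample(n_features):
    """Multiply-adds counted as 2 FLOPs each; biases and ReLU ignored."""
    sizes = layer_sizes(n_features)
    return sum(2 * a * b for a, b in zip(sizes, sizes[1:]))


def train_flops_per_epoch(n_features, n_train):
    """Rule of thumb: forward + backward costs about 3x the forward pass."""
    return 3 * forward_flops_per_sample(n_features) * n_train


n_train = 264 * 3 // 4  # 198 training samples per fold with 4 folds
print("parameters:", mlp_params(N_FEATURES))
print("forward FLOPs per sample:", forward_flops_per_sample(N_FEATURES))
print("training FLOPs per epoch:", train_flops_per_epoch(N_FEATURES, n_train))
```

Under these assumptions the model has about 9,300 parameters and one epoch costs roughly 11 million floating-point operations. That is very little work per training step, so fixed per-step costs such as kernel launches and framework dispatch are likely to dominate on a GPU.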
