TensorFlow performance

Does anyone have experience with TensorFlow on the Nano? How is the performance? I just tried a simple script and it runs nearly twice as fast on the CPU as on CUDA.

I followed the steps to install TensorFlow: https://docs.nvidia.com/deeplearning/dgx/install-tf-xavier/index.html (similar to the steps mentioned in another topic here)

Then I tried to run this sample, which is modified from the companion code of the book “TensorFlow for Deep Learning”:
https://github.com/kitsook/dlwithtf/blob/fix-lost-function/ch3/linear_regression_tf.py
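
For reference, the script follows the book’s TF 1.x pattern: generate noisy linear data, fit W and b with a placeholder-fed training loop, and time the loop. A rough sketch of that shape (hyperparameters and names here are my guesses, not copied from the repo):

import time
import numpy as np
import tensorflow as tf

# Synthetic data: y = 5x + 2 plus Gaussian noise (illustrative values).
N = 100
x_np = np.random.rand(N, 1)
y_np = 5.0 * x_np + 2.0 + np.random.normal(scale=0.1, size=(N, 1))

x = tf.placeholder(tf.float32, (N, 1))
y = tf.placeholder(tf.float32, (N, 1))
W = tf.Variable(tf.random_normal((1, 1)))
b = tf.Variable(tf.random_normal((1,)))
y_pred = tf.matmul(x, W) + b
loss = tf.reduce_sum((y - y_pred) ** 2)
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    start = time.time()
    for _ in range(8000):  # step count is a guess
        sess.run(train_op, feed_dict={x: x_np, y: y_np})
    print("Time taken for learning:", time.time() - start)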

I switched Ubuntu to runlevel 3 to free up some memory before the tests:

sudo systemctl isolate multi-user.target
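
On the Nano the GPU has no dedicated VRAM and shares system RAM with the CPU (which is why the log below reports totalMemory: 3.86GiB for the GPU), so dropping the desktop frees memory for both. A quick sanity check from Python:

# Print total and available memory after switching targets.
with open("/proc/meminfo") as f:
    for line in f:
        if line.startswith(("MemTotal", "MemAvailable")):
            print(line.strip())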

Running the code normally on CUDA, the training part took 43s:

(linear_regression_tf.py:8106): Gdk-CRITICAL **: 21:21:38.160: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed

(linear_regression_tf.py:8106): Gdk-CRITICAL **: 21:21:38.164: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-03-27 21:21:41.201561: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-03-27 21:21:41.202159: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x3f3f6740 executing computations on platform Host. Devices:
2019-03-27 21:21:41.202218: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): <undefined>, <undefined>
2019-03-27 21:21:41.289107: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2019-03-27 21:21:41.289392: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x3d4e3d50 executing computations on platform CUDA. Devices:
2019-03-27 21:21:41.289441: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2019-03-27 21:21:41.289806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
totalMemory: 3.86GiB freeMemory: 1.94GiB
2019-03-27 21:21:41.289863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-27 21:21:42.374481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-27 21:21:42.374562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-03-27 21:21:42.374592: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-03-27 21:21:42.374771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1518 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2019-03-27 21:21:43.782819: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10.0 locally
Training reulst: W=4.765114, b=2.123778
Time taken for learning: 43.052430391311646
Pearson R^2: 0.994371
RMS: 0.119845
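
To rule out a silent fallback to the CPU, it is worth confirming where the ops actually land; TF 1.x can log per-op placement at session creation. A minimal check (my own snippet, not from the benchmarked script):

import tensorflow as tf

# Each kernel prints its assigned device at session creation, e.g.
# /job:localhost/replica:0/task:0/device:GPU:0
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.constant([[1.0, 2.0]])
    b = tf.constant([[3.0], [4.0]])
    print(sess.run(tf.matmul(a, b)))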

Running the same script on the CPU by setting an environment variable to hide the CUDA device:

CUDA_VISIBLE_DEVICES="" python3 linear_regression_tf.py
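
The same effect can be had from inside the script, provided the variable is set before TensorFlow is first imported; a sketch:

import os

# Must run before the first TensorFlow import, or it has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import tensorflow as tf  # now only sees the CPU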

This CPU-only run took 25s, nearly half the GPU time, to complete the training.

(linear_regression_tf.py:8295): Gdk-CRITICAL **: 21:22:35.522: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed

(linear_regression_tf.py:8295): Gdk-CRITICAL **: 21:22:35.526: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-03-27 21:22:38.583032: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-03-27 21:22:38.584033: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x9774bd0 executing computations on platform Host. Devices:
2019-03-27 21:22:38.584096: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): <undefined>, <undefined>
2019-03-27 21:22:38.613701: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2019-03-27 21:22:38.613800: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:148] kernel driver does not appear to be running on this host (clarence-jetson-nano): /proc/driver/nvidia/version does not exist
Training reulst: W=4.765114, b=2.123778
Time taken for learning: 25.613048553466797
Pearson R^2: 0.994371
RMS: 0.119845

Maybe the platform is not meant to be used for training?
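
My own guess (not confirmed): with only a scalar weight and bias, each training step does almost no math, so the run is dominated by per-step kernel-launch and host-device copy overhead, which the CPU does not pay. A micro-benchmark sketch to test that, with made-up sizes; the GPU should only pull ahead at the larger matrices:

import time
import tensorflow as tf

def time_matmul(device, n, iters=50):
    # Time an n x n matmul pinned to the given device.
    with tf.Graph().as_default():
        with tf.device(device):
            a = tf.random_normal((n, n))
            op = tf.matmul(a, a)
        with tf.Session() as sess:
            sess.run(op)  # warm-up
            start = time.time()
            for _ in range(iters):
                sess.run(op)
            return (time.time() - start) / iters

for n in (32, 256, 2048):
    print(n, time_matmul("/cpu:0", n), time_matmul("/device:GPU:0", n))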

Hi,

The Jetson platform is designed for fast inference, so it is not recommended for training.
If you are looking for an AI benchmark for Nano, please check this blog:
https://devblogs.nvidia.com/jetson-nano-ai-computing/

Thanks.