Slow convolution speeds on TK1

Jetson TK1 (L4T, CUDA 6.5, cuDNN 2.0)

Convolutions are slow on this hardware. My test case: a 256x256x3 input image (format NHWC) convolved with a 3x3 filter producing 64 output channels (format KCHW), giving a 256x256x64 output tensor (format NCHW), run via cudnnConvolutionForward(). I then add biases to the convolution output with cudnnAddTensor() and apply ReLU to the final output with cudnnActivationForward().

This convolution is taking 10.73ms.
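To put that 10.73ms in perspective, here is a back-of-envelope arithmetic check (a sketch; the ~326 GFLOP/s FP32 peak figure for the TK1's 192-core Kepler GPU is my assumption, and I assume 3 input channels as described above):

```python
# Back-of-envelope: achieved FLOP rate of the convolution described above.
# Assumes a 3x3 filter over 3 input channels, 64 output channels,
# a 256x256 output, and the measured 10.73 ms runtime.
H, W = 256, 256          # output spatial size
C_in, C_out = 3, 64      # input / output channels
K = 3                    # filter is K x K

macs = H * W * C_out * K * K * C_in   # multiply-accumulates per forward pass
flops = 2 * macs                      # 1 MAC = 2 FLOPs
elapsed_s = 10.73e-3

achieved_gflops = flops / elapsed_s / 1e9
print(f"total FLOPs: {flops}")                      # 226,492,416
print(f"achieved: {achieved_gflops:.1f} GFLOP/s")

# If the TK1's GPU peaks at roughly 326 GFLOP/s in FP32 (assumed),
# this run reaches only a few percent of peak -- consistent with the
# operation being memory-bound or the cuDNN 2.0 kernel being a poor fit.
```

So the operation itself is tiny (~0.23 GFLOP); the question is why so little of the hardware's throughput is being used.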

Is this the expected performance for such a small convolution, or are there optimisation tricks I am missing?

I’m assuming that the GPU having to go through external memory (shared with the CPU, rather than dedicated GPU memory) is a serious bottleneck in these kinds of operations.

The question is: how significant a performance boost would I get executing this same operation on a Tegra X1 or X2 (on Jetson boards)? I’m assuming that FP16 support on the TX1/TX2 and onboard GPU memory would significantly improve performance on these operations.
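One way to frame the FP16 part of that question is memory traffic: halving the element size halves the bytes that must cross the memory bus. A rough sketch (the ~14.9 GB/s TK1 bandwidth figure is my assumption, and real cuDNN kernels move more data than this idealized minimum):

```python
# Minimum memory traffic for the convolution above: read input + filters,
# write output once, ignoring workspace and intermediate traffic.
def min_traffic_bytes(elem_size):
    inp  = 256 * 256 * 3 * elem_size       # input image (NHWC)
    filt = 64 * 3 * 3 * 3 * elem_size      # filters (KCHW)
    out  = 256 * 256 * 64 * elem_size      # output tensor (NCHW)
    return inp + filt + out

fp32 = min_traffic_bytes(4)
fp16 = min_traffic_bytes(2)
print(f"FP32 traffic: {fp32 / 1e6:.2f} MB")   # ~17.57 MB
print(f"FP16 traffic: {fp16 / 1e6:.2f} MB")   # exactly half

# At ~14.9 GB/s (TK1 DDR3L, assumed), the FP32 lower bound is ~1.2 ms,
# so the measured 10.73 ms is far above even a bandwidth-limited floor.
print(f"TK1 FP32 lower bound: {fp32 / 14.9e9 * 1e3:.2f} ms")
```

This suggests FP16 alone would at best halve the bandwidth-bound portion; the bigger gains on TX1/TX2 would likely come from newer cuDNN kernels and higher overall throughput.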

Hi psdeering,

I can’t tell you the exact performance improvement you would get, as I haven’t tried it on all platforms, but as we suggested in the other topic you posted, TX1 or TX2 is recommended for deep learning use cases.

NHWC is supported in our newer cuDNN versions but is not available for TK1.
http://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#four-D-tensor-descriptor

Thanks