cuDNN forward conv 5x5 benchmark: 14 ms (Nano), 0.3 ms (2080 Ti)

I've seen a bunch of posts asking about low frame rates on the Jetson Nano for some neural network models.

I grabbed the gist from Peter Goldsborough's post "Convolutions with cuDNN", made some changes, and ran it on both the Nano and a 2080 Ti on a 578 by 549 image, timing 1000 iterations (forward only).

Thought this might help explain the fps some people are seeing with their models.

The Nano took 14,302 ms to do 1000 iterations (about 14.3 ms per iteration).
The 2080 Ti took 301 ms to do 1000 iterations (about 0.3 ms per iteration).
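
To make the arithmetic behind those two totals explicit, here is a small Python sketch that turns them into per-iteration latency and a speedup ratio. The two totals and the iteration count come from the numbers above; the 10-layer model at the end is purely hypothetical, and real networks mix layer shapes, so treat it as a rough illustration only.

```python
# Totals reported above, each for 1000 forward-only iterations.
ITERATIONS = 1000
nano_total_ms = 14302.0
rtx2080ti_total_ms = 301.0


def per_iteration_ms(total_ms, iterations):
    """Average time of one forward convolution in milliseconds."""
    return total_ms / iterations


nano_ms = per_iteration_ms(nano_total_ms, ITERATIONS)
ti_ms = per_iteration_ms(rtx2080ti_total_ms, ITERATIONS)

# How many times faster the 2080 Ti is on this single convolution.
speedup = nano_total_ms / rtx2080ti_total_ms

# ASSUMPTION: a hypothetical model made of 10 layers that each cost as much
# as this one. This is not a measurement, just a back-of-envelope budget.
hypothetical_layers = 10
nano_frame_ms = nano_ms * hypothetical_layers
nano_fps = 1000.0 / nano_frame_ms

print(f"Nano:    {nano_ms:.3f} ms per conv")
print(f"2080 Ti: {ti_ms:.3f} ms per conv")
print(f"Speedup: {speedup:.1f}x")
print(f"Hypothetical {hypothetical_layers}-layer model on Nano: "
      f"{nano_frame_ms:.1f} ms per frame, {nano_fps:.1f} fps")
```

Even a single 5x5 convolution at this resolution takes a sizeable slice of a per-frame time budget on the Nano, which is the point of the benchmark.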

My repo with changes is here: https://github.com/fabricatedmath/cudnn-forward-conv-bench

Hi,

Your results look correct.

I also tested your use case with our cuDNN sample and got similar results:
/usr/src/cudnn_samples_v7/conv_sample

nvidia@nano:/usr/src/cudnn_samples_v7/conv_sample$ ./conv_sample 
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 3, 578, 549
filter dims are 3, 3, 5, 5
output dims are 1, 3, 584, 555
====PADDING DIMENSIONS====
padded input dims are 1, 3, 578, 549
padded filter dims are 3, 3, 5, 5
padded output dims are 1, 3, 584, 555
Testing conv
^^^^ CUDA : elapsed = 0.0143349 sec,
Test PASSED
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 3, 578, 549
filter dims are 3, 3, 5, 5
output dims are 1, 3, 584, 555
====PADDING DIMENSIONS====
padded input dims are 1, 3, 578, 549
padded filter dims are 3, 3, 5, 5
padded output dims are 1, 3, 584, 555
Testing conv
^^^^ CUDA : elapsed = 0.0144742 sec,
Test PASSED
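
For anyone checking the dims in the log: the output (584 x 555) is larger than the input (578 x 549), which fits the usual convolution output-size formula if the sample ran with padding 5, stride 1, and dilation 1. The log does not print those settings, so they are an assumption in the Python sketch below. It also counts the multiply-accumulates for one forward pass from the logged dims.

```python
def conv_out_size(in_size, kernel, pad, stride=1, dilation=1):
    """Standard convolution output size along one spatial dimension."""
    return (in_size + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1


# Dims taken from the log (NCHW input, KCRS filter).
n, c, h, w = 1, 3, 578, 549
k, _, r, s = 3, 3, 5, 5

# ASSUMPTION: padding 5, stride 1, dilation 1. These are not printed in the
# log; they are simply the values that reproduce the logged output dims.
PAD = 5

out_h = conv_out_size(h, r, PAD)
out_w = conv_out_size(w, s, PAD)

# One multiply-accumulate per output element, input channel, and filter tap.
macs = n * k * out_h * out_w * c * r * s

print(f"output dims are {n}, {k}, {out_h}, {out_w}")
print(f"multiply-accumulates per forward pass: {macs:,}")
```

Dividing that count by the measured elapsed time gives a rough effective throughput for this layer on the Nano.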

Thanks.

Thanks for verifying! I also didn't know there were official cuDNN samples.