Fully connected layer using cuDNN library

vsm2 · November 5, 2018, 1:46pm

Hi there,

I’m building a neural network with 2 convolution layers and 2 fully connected layers for one of my applications. I would like to use cuDNN library API calls to perform these operations. I was able to find cuDNN APIs for convolution layers, but I could not find any for fully connected layers. Are there any such APIs? Or should I manually convert the outputs of the conv layer to the right format and feed them to SGEMM?

Thanks in advance.

KingDudman · November 10, 2018, 8:06pm

You don’t need to use another library, although the performance might be better with one. Just set up the convolution so that the filter has the same spatial size as its input. For example, given a 4D NCHW input tensor of dims [4,1,28,28], set the 2D convolution to a stride of [1,1], a dilation of [1,1] and a padding of [0,0]. The filter dims will be [x,1,28,28], and the output of the convolution will be [4,x,1,1]. The next convolution has the same settings, but this time the filter is [y,x,1,1]. After that, same convolution settings again with a filter of [z,y,1,1]. So on and so forth.

Recap.

Convolution 2D settings will always be: stride [1,1], padding [0,0], dilation [1,1].

PL == previous layer

Filter NCHW dims will be: [ (# of output channels of this layer), (# of channels of PL), (H of PL), (W of PL)].
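To check the shapes in the recap, here is a minimal CPU-only sketch in C++. It does not call cuDNN; it only applies the output-size formula documented for cudnnGetConvolution2dForwardOutputDim. The helper names conv_out_dim and fc_as_conv_output are made up for this illustration.

```cpp
#include <array>
#include <cassert>

// Output extent along one spatial axis, using the formula documented for
// cudnnGetConvolution2dForwardOutputDim:
//   out = 1 + (in + 2*pad - ((filter - 1)*dilation + 1)) / stride
// (hypothetical helper, not a cuDNN function)
int conv_out_dim(int in, int filter, int pad, int stride, int dilation) {
    assert(stride > 0 && dilation > 0);
    return 1 + (in + 2 * pad - ((filter - 1) * dilation + 1)) / stride;
}

// NCHW output shape of a "fully connected" convolution: the filter covers the
// whole spatial extent of the previous layer (PL), with stride [1,1],
// padding [0,0] and dilation [1,1].
// input is {N, C, H, W}; out_features is the number of neurons in this layer.
std::array<int, 4> fc_as_conv_output(const std::array<int, 4>& input,
                                     int out_features) {
    // Filter dims: {out_features, C of PL, H of PL, W of PL}
    const std::array<int, 4> filter = {out_features, input[1], input[2], input[3]};
    return {input[0],
            filter[0],
            conv_out_dim(input[2], filter[2], /*pad=*/0, /*stride=*/1, /*dilation=*/1),
            conv_out_dim(input[3], filter[3], /*pad=*/0, /*stride=*/1, /*dilation=*/1)};
}
```

Calling fc_as_conv_output({4, 1, 28, 28}, x) gives {4, x, 1, 1}, and feeding that result into a second call with y features gives {4, y, 1, 1}, which matches the filter chain [x,1,28,28] then [y,x,1,1] described above.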

system · November 13, 2019, 4:53am

Hi,

In case someone comes around having the same question as I did.

On my test hardware (RTX 2070), using cuDNN convolutions instead of a cuBLAS GEMM is approx. 50% slower for a fully connected layer.

This chapter talks a bit about using GEMM for fully connected layers:

https://docs.nvidia.com/deeplearning/sdk/dl-performance-guide/index.html#fullyconnected-layer

I’ve implemented it here: https://github.com/andoma/saga/blob/master/src/fc.cpp (fp32 and fp16 variants)
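For anyone wondering what the GEMM computes, here is a small CPU reference in C++. It is only a sketch of the math (Y = X * W^T + b), not the code from the repository linked above, and the name fc_forward and the row-major layouts are assumptions made for this example. Note that a contiguous NCHW tensor of dims [N,C,H,W] is already a row-major [N, C*H*W] matrix, so under that assumption no reformatting of the conv output is needed before the GEMM.

```cpp
#include <cstddef>
#include <vector>

// Reference fully connected forward pass: Y = X * W^T + b.
//   x: [batch, in_features]        row-major (an NCHW tensor viewed as [N, C*H*W])
//   w: [out_features, in_features] row-major (assumed layout for this sketch)
//   b: [out_features]
//   returns y: [batch, out_features] row-major
//
// On the GPU the matrix product (without the bias) can be issued as a single
// cublasSgemm call. cuBLAS is column-major, so with the layouts above one
// plausible argument order is:
//   cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
//               out_features, batch, in_features,
//               &alpha, W, in_features, X, in_features,
//               &beta, Y, out_features);
// Treat this as an assumption to verify against your own layout.
std::vector<float> fc_forward(const std::vector<float>& x,
                              const std::vector<float>& w,
                              const std::vector<float>& b,
                              std::size_t batch,
                              std::size_t in_features,
                              std::size_t out_features) {
    std::vector<float> y(batch * out_features);
    for (std::size_t n = 0; n < batch; ++n) {
        for (std::size_t o = 0; o < out_features; ++o) {
            float acc = b[o];  // start from the bias of output neuron o
            for (std::size_t i = 0; i < in_features; ++i) {
                acc += x[n * in_features + i] * w[o * in_features + i];
            }
            y[n * out_features + o] = acc;
        }
    }
    return y;
}
```

A loop like this is handy as a correctness check: run the same inputs through the cuDNN convolution path or the cuBLAS path and compare against fc_forward within a small tolerance.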

guidocalvano · February 4, 2023, 2:20pm

To anyone still interested in using a cublas gemm operation, andoma moved his implementation here: saga/cuda_dnn.cpp at 88bd177626f723bb2a0d065f166fa09c9130f9b4 · andoma/saga · GitHub
