How to use the cuDNN backend API to train a CNN with ReLU or BN layers?

Recently, we have been trying to train a MobileNet v1 network with the cuDNN backend API,
but we ran into two problems:

  1. We could not find explicit support for a batch normalization layer; the only thing we found is the pointwise operations.
  2. We could not find explicit support for ReLU backward; we could not obtain the mask of the ReLU forward pass via any option.

Hi @896849432 ,
Can you please refer to the links below and see if they help?
API Reference :: NVIDIA Deep Learning cuDNN Documentation
API Reference :: NVIDIA Deep Learning cuDNN Documentation


I have scanned the docs, but how can I put these BN ops into an operation graph of the cuDNN backend?

Hi @896849432 , thanks for the questions.

  1. Currently you would need to use the legacy API for BN; it has not been added to the v8 backend API yet.
  2. From the API it is possible to add relu_backward to the operation graph. It requires the original input of the ReLU forward pass to be passed in, and the mask is re-computed from that tensor. See the sample code below:

CHECK_ERROR(cudnnBackendSetAttribute(opDesc, CUDNN_ATTR_OPERATION_POINTWISE_ALPHA1, CUDNN_TYPE_DOUBLE, 1, &(this->alpha1) /*has to be 1.0 for fusion*/));
CHECK_ERROR(cudnnBackendSetAttribute(opDesc, CUDNN_ATTR_OPERATION_POINTWISE_ALPHA2, CUDNN_TYPE_DOUBLE, 1, &(this->alpha2) /*has to be 1.0 for fusion*/));
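To make the relu_backward description above concrete, here is a minimal, untested sketch of how such a node can be assembled with the v8 backend API. It assumes `xDesc` (the original forward input), `dyDesc`, and `dxDesc` are already-finalized `CUDNN_BACKEND_TENSOR_DESCRIPTOR`s; the `CHECK_ERROR` macro below is only a placeholder for the status checking used in the snippet above.

```cpp
#include <cudnn.h>

// Placeholder error check; substitute your own status handling.
#define CHECK_ERROR(expr)                                   \
    do {                                                    \
        cudnnStatus_t s_ = (expr);                          \
        if (s_ != CUDNN_STATUS_SUCCESS) { /* handle */ }    \
    } while (0)

void buildReluBackwardOp(cudnnBackendDescriptor_t xDesc,   // fwd input of the ReLU
                         cudnnBackendDescriptor_t dyDesc,  // incoming gradient
                         cudnnBackendDescriptor_t dxDesc,  // output gradient
                         cudnnBackendDescriptor_t *opOut) {
    // 1. Pointwise descriptor with mode CUDNN_POINTWISE_RELU_BWD.
    cudnnBackendDescriptor_t pwDesc;
    CHECK_ERROR(cudnnBackendCreateDescriptor(CUDNN_BACKEND_POINTWISE_DESCRIPTOR, &pwDesc));
    cudnnPointwiseMode_t mode = CUDNN_POINTWISE_RELU_BWD;
    CHECK_ERROR(cudnnBackendSetAttribute(pwDesc, CUDNN_ATTR_POINTWISE_MODE,
                                         CUDNN_TYPE_POINTWISE_MODE, 1, &mode));
    CHECK_ERROR(cudnnBackendFinalize(pwDesc));

    // 2. Operation node: the original x is attached via XDESC, and the
    //    ReLU mask is re-derived from it (x > 0) inside the kernel.
    cudnnBackendDescriptor_t opDesc;
    CHECK_ERROR(cudnnBackendCreateDescriptor(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR, &opDesc));
    CHECK_ERROR(cudnnBackendSetAttribute(opDesc, CUDNN_ATTR_OPERATION_POINTWISE_PW_DESCRIPTOR,
                                         CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &pwDesc));
    CHECK_ERROR(cudnnBackendSetAttribute(opDesc, CUDNN_ATTR_OPERATION_POINTWISE_XDESC,
                                         CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &xDesc));
    CHECK_ERROR(cudnnBackendSetAttribute(opDesc, CUDNN_ATTR_OPERATION_POINTWISE_DYDESC,
                                         CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &dyDesc));
    CHECK_ERROR(cudnnBackendSetAttribute(opDesc, CUDNN_ATTR_OPERATION_POINTWISE_DXDESC,
                                         CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &dxDesc));
    CHECK_ERROR(cudnnBackendFinalize(opDesc));
    *opOut = opDesc;
}
```

The finalized `opDesc` would then be added to an operation graph in the usual way.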


However, for MobileNet we are currently not able to fuse depthwise-separable convolutions with ReLU, so you would need to create a single-operation graph for the convolution and cannot add the ReLU to it. For the activation, for now you will have to use the legacy cudnnActivationForward/Backward API.
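For reference, the legacy activation path mentioned above would look roughly like the following untested sketch. It assumes a live `cudnnHandle_t`, a single shared tensor descriptor for x/y/dy/dx, and device buffers already allocated.

```cpp
#include <cudnn.h>

void reluForwardBackwardLegacy(cudnnHandle_t handle,
                               cudnnTensorDescriptor_t tDesc, // shared shape for x/y/dy/dx
                               const void *x, void *y,
                               const void *dy, void *dx) {
    cudnnActivationDescriptor_t actDesc;
    cudnnCreateActivationDescriptor(&actDesc);
    // The coef argument is ignored for plain ReLU (it is the clipping
    // threshold for CUDNN_ACTIVATION_CLIPPED_RELU).
    cudnnSetActivationDescriptor(actDesc, CUDNN_ACTIVATION_RELU,
                                 CUDNN_NOT_PROPAGATE_NAN, 0.0);

    const float alpha = 1.0f, beta = 0.0f;
    // Forward: y = relu(x)
    cudnnActivationForward(handle, actDesc, &alpha, tDesc, x, &beta, tDesc, y);
    // Backward: dx = dy where x > 0, else 0. Note that it takes y, dy,
    // and the original forward input x.
    cudnnActivationBackward(handle, actDesc, &alpha, tDesc, y, tDesc, dy,
                            tDesc, x, &beta, tDesc, dx);

    cudnnDestroyActivationDescriptor(actDesc);
}
```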

Thanks a lot for your answer.
Is there any way to create a custom operator (inside which I could call the legacy BN API), like a custom plugin in TensorRT?

Hi @896849432 , how do you plan to use cuDNN: through DL frameworks or through your own code?
We currently don't support customized ops like the one you have described. Would you be able to change the upper-level code to call into the legacy API for the BN nodes in the graph?
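If the upper-level code can be changed as suggested, calling the legacy BN API for one node would look roughly like this untested sketch. It assumes a live `cudnnHandle_t`, a finalized 4-D `xDesc`, and device buffers for the data and statistics; the averaging factor and epsilon below are just illustrative values.

```cpp
#include <cudnn.h>

void batchNormForwardTrainingLegacy(cudnnHandle_t handle,
                                    cudnnTensorDescriptor_t xDesc,
                                    const void *x, void *y,
                                    const void *scale, const void *bias,
                                    void *runMean, void *runVar,
                                    void *saveMean, void *saveInvVar) {
    // Derive the per-channel (1xCx1x1) descriptor from the data layout.
    cudnnTensorDescriptor_t bnDesc;
    cudnnCreateTensorDescriptor(&bnDesc);
    cudnnDeriveBNTensorDescriptor(bnDesc, xDesc, CUDNN_BATCHNORM_SPATIAL);

    const float alpha = 1.0f, beta = 0.0f;
    cudnnBatchNormalizationForwardTraining(
        handle, CUDNN_BATCHNORM_SPATIAL, &alpha, &beta,
        xDesc, x, xDesc, y,
        bnDesc, scale, bias,
        /*exponentialAverageFactor=*/0.1,   // illustrative value
        runMean, runVar,
        /*epsilon=*/1e-5,                   // illustrative value
        saveMean, saveInvVar);              // saved stats feed the backward call

    cudnnDestroyTensorDescriptor(bnDesc);
}
```

The saved mean/inverse-variance outputs are what `cudnnBatchNormalizationBackward` expects, so they should be kept around between the forward and backward passes.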

My understanding is that TRT is more like a DL framework: it sees the global graph and tries to do global optimizations on the graph partitions, which is why users may need plug-ins to run customized code for certain parts of the graph. cuDNN, by contrast, does not target global graph optimizations; we expect the caller to do the graph partitioning and to decide where to lower each part of the graph, so we haven't considered supporting plug-ins for a subgraph like that. It would be very interesting to hear about your use cases.

Thanks.
Currently we use cuDNN by calling the API for each layer,
and we were trying to speed up our training via the cuDNN backend.
We had assumed that the cuDNN backend worked like TRT, so we supposed we could speed up the whole training run by transforming all (or most) of the network into an operation graph.
For now we are going to give up on the backend API and tune the different convolution algorithms of the legacy API instead.

Hi @896849432 , if you can provide more details about what your graph looks like, we might be able to offer more suggestions on how to optimize it. There are operation graph patterns that can be fused through the backend API to speed up training; it just doesn't support arbitrary fusion.
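For anyone following along, the overall v8 backend flow for one fusable pattern looks roughly like the untested skeleton below: wrap the finalized operation nodes in an operation graph, ask the heuristics for an engine config, and build an execution plan. It assumes `handle` and an array `ops` of already-finalized operation descriptors (e.g. a convolution node, possibly with pointwise nodes where fusion is supported); error checking is elided for brevity.

```cpp
#include <cudnn.h>
#include <cstdint>

cudnnBackendDescriptor_t buildPlan(cudnnHandle_t handle,
                                   cudnnBackendDescriptor_t *ops,
                                   int64_t numOps) {
    // 1. Operation graph over the finalized nodes.
    cudnnBackendDescriptor_t opGraph;
    cudnnBackendCreateDescriptor(CUDNN_BACKEND_OPERATIONGRAPH_DESCRIPTOR, &opGraph);
    cudnnBackendSetAttribute(opGraph, CUDNN_ATTR_OPERATIONGRAPH_HANDLE,
                             CUDNN_TYPE_HANDLE, 1, &handle);
    cudnnBackendSetAttribute(opGraph, CUDNN_ATTR_OPERATIONGRAPH_OPS,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, numOps, ops);
    cudnnBackendFinalize(opGraph);

    // 2. Query the heuristics for an engine config for this graph.
    cudnnBackendDescriptor_t heur;
    cudnnBackendCreateDescriptor(CUDNN_BACKEND_ENGINEHEUR_DESCRIPTOR, &heur);
    cudnnBackendSetAttribute(heur, CUDNN_ATTR_ENGINEHEUR_OPERATION_GRAPH,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &opGraph);
    cudnnBackendHeurMode_t mode = CUDNN_HEUR_MODE_INSTANT;
    cudnnBackendSetAttribute(heur, CUDNN_ATTR_ENGINEHEUR_MODE,
                             CUDNN_TYPE_HEUR_MODE, 1, &mode);
    cudnnBackendFinalize(heur);

    cudnnBackendDescriptor_t engCfg;
    cudnnBackendCreateDescriptor(CUDNN_BACKEND_ENGINECFG_DESCRIPTOR, &engCfg);
    int64_t returned = 0;
    cudnnBackendGetAttribute(heur, CUDNN_ATTR_ENGINEHEUR_RESULTS,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &returned, &engCfg);

    // 3. Execution plan; run it later with cudnnBackendExecute plus a
    //    variant pack holding the device pointers and workspace.
    cudnnBackendDescriptor_t plan;
    cudnnBackendCreateDescriptor(CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR, &plan);
    cudnnBackendSetAttribute(plan, CUDNN_ATTR_EXECUTION_PLAN_HANDLE,
                             CUDNN_TYPE_HANDLE, 1, &handle);
    cudnnBackendSetAttribute(plan, CUDNN_ATTR_EXECUTION_PLAN_ENGINE_CONFIG,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &engCfg);
    cudnnBackendFinalize(plan);
    return plan;
}
```

In a real application you would check every returned status and fall back to another engine config when finalization of the plan fails.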