cuda::DeconvolutionLayer runs slowly

Hi. I’m using trtexec with TensorRT 5 on Xavier to convert a Caffe model into a TRT engine.
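
(For context, the conversion step amounts to roughly the following, sketched here with the TensorRT 5 Python API rather than trtexec itself; the file names, the output blob name and the workspace size are placeholders, not my actual model.)

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Parse the Caffe deploy/model files and build a serialized engine,
# roughly what trtexec does when given a Caffe deploy/model pair.
with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network() as network, \
     trt.CaffeParser() as parser:
    builder.max_batch_size = 1
    builder.max_workspace_size = 1 << 30                      # placeholder workspace size
    model_tensors = parser.parse(deploy="deploy.prototxt",    # placeholder paths
                                 model="net.caffemodel",
                                 network=network,
                                 dtype=trt.float32)
    network.mark_output(model_tensors.find("output_blob"))    # placeholder blob name
    with builder.build_cuda_engine(network) as engine, open("net.engine", "wb") as f:
        f.write(engine.serialize())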

When I profile the trtexec run (via Nsight), I see that the DeconvolutionLayer takes up a large share of the processing time, even though it carries essentially no weights - it only upscales (by a factor of 2, 4, etc.). Out of hundreds of convolutional layers, the 2-4 deconvolution layers alone account for 30-50% of the processing time.

Here is the relevant profiling line:
https://i.imgur.com/DxhGc3J.png

My deconvolution (upsample) layer in caffe looks like this:

layer {
  name: "123"
  type: "Deconvolution"
  bottom: "122"
  top: "123"
  convolution_param {
    num_output: 128
    bias_term: false
    kernel_size: 2
    group: 128
    stride: 2
  }
}
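
For reference, here is a small NumPy sketch (a hypothetical stand-alone function, not taken from TensorRT or Caffe) of what this layer computes: because group equals num_output and kernel_size equals stride, every input pixel just expands into an independent 2x2 output block scaled by that channel's 2x2 kernel, i.e. a learned 2x upsampling.

import numpy as np

def depthwise_deconv_k2_s2(x, w):
    # Grouped ("depthwise") transposed convolution with kernel_size=2, stride=2,
    # matching the layer above: x has shape (C, H, W), w has shape (C, 2, 2).
    C, H, W = x.shape
    y = np.zeros((C, 2 * H, 2 * W), dtype=x.dtype)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                # Each input pixel fills one non-overlapping 2x2 output block.
                y[c, 2 * i:2 * i + 2, 2 * j:2 * j + 2] = x[c, i, j] * w[c]
    return y

# With all-ones kernels this is exactly nearest-neighbour 2x upsampling.
x = np.random.rand(128, 16, 16).astype(np.float32)
w = np.ones((128, 2, 2), dtype=np.float32)
assert np.allclose(depthwise_deconv_k2_s2(x, w),
                   x.repeat(2, axis=1).repeat(2, axis=2))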

I don’t understand why cudnnConvolutionBackwardData is called here at all (see the image) - shouldn’t this be a forward-only pass?
How can I solve this issue?

Thanks.

Hi, do you have any solutions?

I’m experiencing the same problem here:
https://devtalk.nvidia.com/default/topic/1052490/tensorrt/tensorrt4-convtranspose-layer-very-slow-inference-speed-/