Hi. I’m using trtexec with TRT 5 on Xavier to convert a caffe model into a TRT engine.
When I profile the trtexec run with Nsight, I see that the Deconvolution layers take up much of the processing time, even though they have no learned weights and only perform upscaling (by a factor of 2, 4, etc.). The network has hundreds of convolutional layers and only 2-4 deconvolution layers, yet those few account for 30-50% of the processing time.
An example of the relevant profiling line:
https://i.imgur.com/DxhGc3J.png
My deconvolution (upsample) layer in caffe looks like this:
layer {
  name: "123"
  type: "Deconvolution"
  bottom: "122"
  top: "123"
  convolution_param {
    num_output: 128
    bias_term: false
    kernel_size: 2
    group: 128
    stride: 2
  }
}
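To make the intended behaviour concrete, here is a minimal pure-Python sketch of what a layer with these parameters computes (one 2x2 kernel per channel, stride 2, no bias). The function name `depthwise_deconv_k2_s2` is made up for illustration, and the all-ones kernel is an assumption: the actual weight values are not shown in the prototxt above.

```python
def depthwise_deconv_k2_s2(x, w):
    """Depthwise transposed convolution with kernel_size=2, stride=2, no bias.

    x: input as nested lists, shape [C][H][W]
    w: one 2x2 kernel per channel, shape [C][2][2] (group == num_output)
    Returns the output as nested lists, shape [C][2*H][2*W].
    """
    out = []
    for c, plane in enumerate(x):
        height, width = len(plane), len(plane[0])
        upsampled = [[0.0] * (2 * width) for _ in range(2 * height)]
        for i in range(height):
            for j in range(width):
                # Each input value is scattered into its own 2x2 output block,
                # scaled by the channel's kernel.
                for di in range(2):
                    for dj in range(2):
                        upsampled[2 * i + di][2 * j + dj] += plane[i][j] * w[c][di][dj]
        out.append(upsampled)
    return out


# One channel, 2x2 input, assumed all-ones kernel (not taken from the real model).
x = [[[1.0, 2.0], [3.0, 4.0]]]
ones = [[[1.0, 1.0], [1.0, 1.0]]]
y = depthwise_deconv_k2_s2(x, ones)
for row in y[0]:
    print(row)
# [1.0, 1.0, 2.0, 2.0]
# [1.0, 1.0, 2.0, 2.0]
# [3.0, 3.0, 4.0, 4.0]
# [3.0, 3.0, 4.0, 4.0]
```

Because the stride equals the kernel size, the 2x2 output blocks never overlap, so under the all-ones assumption the result is plain nearest-neighbour upsampling by a factor of 2.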
I don’t understand why cudnnConvolutionBackwardData is called at all (see image). Shouldn’t inference only run forward kernels?
How can I solve this issue?
Thanks.