Hi,
I've run into a problem porting a Caffe model to TensorRT. All the other layers work fine, but the last deconvolution layer runs extremely slowly. Here's the per-layer profiling data from the model running under TensorRT:
conv1 0.088ms
relu_conv1 0.081ms
conv2 0.275ms
relu_conv2 0.082ms
conv3_1 0.274ms
relu_conv3_1 0.061ms
conv3_2 0.272ms
relu_conv3_2 0.041ms
conv3_3 0.198ms
relu_conv3_3 0.081ms
slice1 0.084ms
conv3_4 0.231ms
relu_conv3_4 0.082ms
conv3_5 0.274ms
relu_conv3_5 0.076ms
conv3_6 0.373ms
relu_conv3_6 0.101ms
conv2 copy 0.082ms
slice1_1 copy 0.024ms
sum1 0.149ms
down1 0.115ms
relu_down1 0.080ms
conv4_1 0.272ms
relu_conv4_1 0.061ms
conv4_2 0.270ms
relu_conv4_2 0.041ms
conv4_3 0.201ms
relu_conv4_3 0.081ms
slice2 0.084ms
conv4_4 0.231ms
relu_conv4_4 0.108ms
conv4_5 0.297ms
relu_conv4_5 0.061ms
conv4_6 0.323ms
relu_conv4_6 0.101ms
down1 copy 0.082ms
slice2_1 copy 0.023ms
sum2 0.150ms
down2 0.115ms
relu_down2 0.080ms
conv5_1 0.275ms
relu_conv5_1 0.061ms
conv5_2 0.270ms
relu_conv5_2 0.041ms
conv5_3 0.201ms
relu_conv5_3 0.081ms
slice3 0.084ms
conv5_4 0.229ms
relu_conv5_4 0.081ms
conv5_5 0.273ms
relu_conv5_5 0.061ms
conv5_6 0.324ms
relu_conv5_6 0.101ms
down2 copy 0.082ms
slice3_1 copy 0.023ms
sum3 0.150ms
down3 0.114ms
relu_down3 0.080ms
conv6_1 0.274ms
relu_conv6_1 0.061ms
conv6_2 0.315ms
relu_conv6_2 0.072ms
conv6_3 0.200ms
relu_conv6_3 0.081ms
slice4 0.083ms
conv6_4 0.230ms
relu_conv6_4 0.081ms
conv6_5 0.273ms
relu_conv6_5 0.061ms
conv6_6 0.324ms
relu_conv6_6 0.101ms
down3 copy 0.082ms
slice4_1 copy 0.023ms
sum4 0.150ms
down4 0.115ms
relu_down4 0.080ms
upsample 224.982ms
Time over all layers: 235.824 ms
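For reference, here's roughly how I'm collecting these per-layer timings, via TensorRT's IProfiler callback (a minimal sketch; the engine, buffers, and batch size come from my setup and are omitted here):

#include "NvInfer.h"
#include <iostream>

// Called by TensorRT once per layer per execution; prints the
// "layer_name x.xxxms" lines shown above.
struct LayerProfiler : public nvinfer1::IProfiler
{
    void reportLayerTime(const char* layerName, float ms) override
    {
        std::cout << layerName << " " << ms << "ms" << std::endl;
    }
};

LayerProfiler profiler;
context->setProfiler(&profiler);       // context is the IExecutionContext
context->execute(batchSize, buffers);  // profiling needs the synchronous
                                       // execute() path, not enqueue()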
Here's the Caffe prototxt for this last deconv layer:
layer {
  name: "upsample"
  type: "Deconvolution"
  bottom: "down4"
  top: "upsample"
  convolution_param {
    kernel_size: 17
    stride: 2
    num_output: 1
    pad: 8
  }
}
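For context, this is roughly what that layer corresponds to in the TensorRT C++ network-definition API, as I understand the Caffe parser maps it (a sketch only; network, down4, the input channel count C, and the weight/bias pointers are placeholders from my build code):

// Deconvolution weights in Caffe layout: C x 1 x 17 x 17 floats.
nvinfer1::Weights kernel{nvinfer1::DataType::kFLOAT, kernelData, C * 1 * 17 * 17};
nvinfer1::Weights bias{nvinfer1::DataType::kFLOAT, biasData, 1};  // one bias for the single output map

auto* deconv = network->addDeconvolution(*down4, /*nbOutputMaps=*/1,
                                         nvinfer1::DimsHW{17, 17}, kernel, bias);
deconv->setStride(nvinfer1::DimsHW{2, 2});
deconv->setPadding(nvinfer1::DimsHW{8, 8});
deconv->getOutput(0)->setName("upsample");

With Caffe's deconvolution arithmetic (output = stride * (input - 1) + kernel_size - 2 * pad), this maps an H x W input to (2H - 1) x (2W - 1), i.e. it's just a ~2x upsample.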
I'm using an NVIDIA Titan Xp GPU. This model takes around 16 ms running in Caffe (without TensorRT acceleration), so there's no reason the TensorRT version should take more than 200 ms. Can anyone help me with this issue? Thanks.