TensorRT plugin layer is much slower than Caffe


I am porting a Caffe upsample layer to a TensorRT plugin.

However, the plugin is much slower than the original Caffe layer.

Most of the GPU code is identical to the Caffe implementation.

When the upsample input is 256 * 40 * 52 (CHW), the processing time is 19.116 ms in TensorRT.

With the same input volume, the processing time in Caffe is only 0.03 ms.

This is a huge speed gap, roughly 640x.
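
To put the two timings side by side, here is a rough back-of-the-envelope sketch in Python that uses only the figures quoted above. The 4-byte (float32) element size is an assumption, since the data type is not stated here, and the helper names are made up for illustration. It only counts the bytes of the input tensor, not the larger upsampled output.

```python
# Figures quoted in the post.
C, H, W = 256, 40, 52   # upsample input shape (CHW)
TENSORRT_MS = 19.116    # measured time for the TensorRT plugin
CAFFE_MS = 0.03         # measured time for the Caffe layer

# Assumption (not stated in the post): float32 data, 4 bytes per element.
BYTES_PER_ELEMENT = 4


def input_elements(c, h, w):
    """Number of elements in a CHW tensor."""
    return c * h * w


def slowdown(slow_ms, fast_ms):
    """How many times slower the first timing is than the second."""
    return slow_ms / fast_ms


def input_read_rate_gb_s(num_elements, elapsed_ms,
                         bytes_per_element=BYTES_PER_ELEMENT):
    """Input bytes divided by elapsed time, in GB/s (1 GB = 1e9 bytes)."""
    return num_elements * bytes_per_element / (elapsed_ms * 1e-3) / 1e9


if __name__ == "__main__":
    n = input_elements(C, H, W)
    print(f"input elements: {n}")
    print(f"slowdown: {slowdown(TENSORRT_MS, CAFFE_MS):.1f}x")
    print(f"Caffe input rate: {input_read_rate_gb_s(n, CAFFE_MS):.1f} GB/s")
    print(f"TensorRT input rate: {input_read_rate_gb_s(n, TENSORRT_MS):.3f} GB/s")
```

If the implied rate for one of the two timings looks implausible for the GPU in question, that measurement may not cover the same work as the other, for example if one timer stops before the kernel has actually finished.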

Any idea what could cause this?