TensorRT plugin layer is much slower than Caffe

Hi,

I am porting a Caffe upsample layer (https://github.com/BVLC/caffe/pull/6384/files) to a TensorRT plugin.

However, the plugin is very slow compared to Caffe.

Most of the GPU code is identical to the Caffe implementation.

When the upsample input is 256 * 40 * 52 (CHW), the processing time is 19.116 ms in TensorRT.

However, the processing time for the same input volume in Caffe is only 0.03 ms.

This is a huge speed gap.

Any ideas what could cause this?