I am porting a Caffe upsample layer (https://github.com/BVLC/caffe/pull/6384/files) to a TensorRT plugin.
However, the plugin runs much slower than the layer does in Caffe,
even though most of the GPU code is identical to the Caffe implementation.
With an upsample input of 256 x 40 x 52 (CHW), the layer takes 19.116 ms in TensorRT.
With the same input volume in Caffe, it takes only 0.03 ms.
That is a gap of roughly 640x.
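
For scale, the two timings can be turned into an element count and an effective throughput. This is only a rough sketch under stated assumptions: the helper names `chw_elements` and `effective_gbps` are hypothetical, the data is assumed to be float32, and only the input tensor is counted (the output size depends on the upsample scale factor).

```cpp
#include <cstddef>

// Hypothetical helper: number of elements in a CHW tensor.
constexpr std::size_t chw_elements(std::size_t c, std::size_t h, std::size_t w) {
    return c * h * w;
}

// Hypothetical helper: effective throughput in GB/s (1 GB = 1e9 bytes)
// for touching `bytes` of data in `milliseconds`.
constexpr double effective_gbps(std::size_t bytes, double milliseconds) {
    return static_cast<double>(bytes) / (milliseconds * 1e-3) / 1e9;
}

// Input size from the post: 256 x 40 x 52 (CHW), assumed float32.
constexpr std::size_t kInputElems = chw_elements(256, 40, 52);
constexpr std::size_t kInputBytes = kInputElems * sizeof(float);

// Throughput implied by each reported time, counting the input only.
constexpr double kCaffeGbps    = effective_gbps(kInputBytes, 0.03);
constexpr double kTensorRtGbps = effective_gbps(kInputBytes, 19.116);
```

Under these assumptions the input is 532,480 elements (2,129,920 bytes), which works out to about 71 GB/s at 0.03 ms and about 0.11 GB/s at 19.116 ms.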