Hi,
I am porting a Caffe upsample layer (add efficient upsample layer by twmht · Pull Request #6384 · BVLC/caffe · GitHub) to a TensorRT plugin.
However, it runs much more slowly than it does in Caffe.
Most of the GPU code is identical to Caffe's.
When the upsample input is 256 * 40 * 52 (CHW), the processing time in TensorRT is 19.116 ms.
With the same input volume, the processing time in Caffe is only 0.03 ms.
That is a huge speed gap, roughly 640x.
Any idea?
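For reference, here is a minimal CPU sketch of what the layer computes. It can be used to check the plugin's output for correctness. It assumes the Caffe layer does nearest-neighbor upsampling by an integer `scale` on CHW data. The function name `upsample_nearest_chw` is hypothetical and is not taken from the PR. With scale 2, the 256 * 40 * 52 input becomes a 256 * 80 * 104 output, which is 2,129,920 floats (about 8.5 MB). This is a single memory-bound copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for nearest-neighbor upsampling (assumed semantics
// of the Caffe upsample layer): input pixel (c, h, w) is replicated into the
// scale x scale output block starting at (c, h * scale, w * scale).
std::vector<float> upsample_nearest_chw(const std::vector<float>& in,
                                        int channels, int height, int width,
                                        int scale) {
    assert(scale > 0);
    assert(in.size() == static_cast<std::size_t>(channels) * height * width);
    const int out_h = height * scale;
    const int out_w = width * scale;
    std::vector<float> out(static_cast<std::size_t>(channels) * out_h * out_w);
    for (int c = 0; c < channels; ++c) {
        for (int oh = 0; oh < out_h; ++oh) {
            for (int ow = 0; ow < out_w; ++ow) {
                // Map each output coordinate back to its source input pixel.
                const std::size_t src =
                    (static_cast<std::size_t>(c) * height + oh / scale) * width + ow / scale;
                const std::size_t dst =
                    (static_cast<std::size_t>(c) * out_h + oh) * out_w + ow;
                out[dst] = in[src];
            }
        }
    }
    return out;
}
```

Comparing this reference element by element against the plugin output shows whether the port is correct. That check is separate from the timing question.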