TensorRT softmax layer very slow

I used a softmax layer in my prototxt and ran it with TensorRT.
It runs about 10x slower than Caffe's softmax layer.

TensorRT 3.0.1
cuDNN 7.0.5
Ubuntu 16.04
GTX 1070

layer {
  name: "mbox_conf_softmax"
  type: "Softmax"
  bottom: "mbox_conf_reshape"
  top: "mbox_conf_softmax"
  softmax_param {
    axis: 2
  }
}

Overall this layer takes more than 50% of the model's runtime, whereas in Caffe it hardly affects inference time.

How did you measure this?

I compared the following Caffe prototxt using caffe time and giexec:

name: "sample"
input: "data"
input_shape {
  dim: 1
  dim: 2436
  dim: 4
  dim: 1
}
layer {
  name: "softmax"
  type: "Softmax"
  bottom: "data"
  top: "softmax"
  softmax_param {
    axis: 2
  }
}

./caffe time --model softmax.prototxt -gpu 0
shows “Average Forward pass: 0.0410502 ms.”

./giexec --deploy=softmax.prototxt --output=softmax
shows “Average over 10 runs is 0.958691 ms.”

I profiled giexec with nvvp and see a single kernel, cudnn::detail::softmax_fw_channel_4d_kernel, which takes 0.9 ms to finish.

I also noticed that the TensorRT softmax layer ignores the axis parameter.
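If the axis parameter really is ignored and the softmax is applied over the channel axis, one possible workaround (my assumption, not something I've verified inside TensorRT) is to permute the tensor so the axis you want normalized becomes the channel axis, apply a channel softmax, and permute back. The mathematical equivalence for the shape in my test prototxt can be checked with NumPy:

```python
import numpy as np

def softmax(x, axis):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# same shape as the test prototxt: 1 x 2436 x 4 x 1, softmax over axis 2
x = np.random.rand(1, 2436, 4, 1).astype(np.float32)

# what the Caffe layer computes with softmax_param { axis: 2 }
ref = softmax(x, axis=2)

# workaround: move axis 2 onto the channel axis (axis 1), take a
# channel softmax, then move it back
y = softmax(x.transpose(0, 2, 1, 3), axis=1).transpose(0, 2, 1, 3)

assert np.allclose(ref, y, atol=1e-6)
```

Whether this actually helps performance would depend on how the permutation is implemented in the network (e.g. extra reshape/permute layers have their own cost).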