TensorRT inference time much slower than cuDNN

Hi,

Previously I was running a CNN model on Caffe built with cuDNN, and the inference time was about 2 ms. When I tried to optimize it with TensorRT, it turned out to be much slower: the inference time rose to about 28 ms!
The precision is the default FP32.

Environment:
Windows 10 64-bit, GeForce 1080 Ti, very recent driver
CUDA 10, cuDNN 7.3.1, TensorRT 5 RC for Windows

Actually, in the same way and under the same environment, I tested sample_MNIST: the Caffe time is about 0.16 ms while the TensorRT time is about 0.1 ms (in FP32), which shows the optimization does work.

Is there any clue as to what might cause this?

thanks!
Charles

Hello,

It’d help us debug if you could provide a small reproduction package that includes the source, model, and dataset.

Thanks for the help, but it seems there is no issue after all.
I’ve just checked the source code of Caffe’s time command and found the cause of the observation:
In GPU mode, Caffe calls cudaEventSynchronize() after each layer’s forward() to synchronize the host with the device, so that the measured duration of the forward pass is correct.

So my previous mistake was not adding a cudaEventSynchronize() in my own code after calling caffeNet->forward() to synchronize the GPU inference, which dramatically underestimated Caffe’s inference time. Hence, the timing comparison with TensorRT was invalid.
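For reference, here is a minimal sketch of the timing pattern I ended up with. It assumes the net is an already-initialized caffe::Net<float> (standing in for my caffeNet); the helper name timedForward is just for illustration, and the Caffe C++ API spells the method Forward().

```cpp
#include <cstdio>
#include <cuda_runtime.h>
#include <caffe/net.hpp>

// Sketch: measure the GPU time of one forward pass with CUDA events.
// "net" stands in for my caffeNet instance.
float timedForward(caffe::Net<float>& net) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);       // mark start on the default stream
    net.Forward();                   // enqueue the whole forward pass on the GPU
    cudaEventRecord(stop, 0);        // mark end after the last layer is enqueued
    cudaEventSynchronize(stop);      // block the host until the GPU has actually finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed GPU time in milliseconds

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

Without the cudaEventSynchronize() on the stop event, the host-side timer stops as soon as the kernels are enqueued rather than when they finish, which is exactly how I got the misleading 2 ms figure.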

With the timing fixed, I now have a correct comparison: the Caffe inference time is also ~30 ms, and TensorRT is actually a little faster than that, though I would like to further boost performance using INT8 or similar.

regards,
Charles