TensorRT inference time much slower than cuDNN


Previously I was running a CNN model in Caffe built with cuDNN, and the inference time was about 2 ms. When I tried to optimize it with TensorRT, it became much slower: the inference time rose to about 28 ms!
The precision is the default FP32.

Win10 64-bit, GeForce 1080Ti, very recent driver
CUDA10, cuDNN7.3.1, TensorRT 5RC for Windows

Actually, in the same way and in the same environment, I tested sample_MNIST: the Caffe time is about 0.16 ms while the TensorRT time is 0.1 ms (in FP32), which shows that the optimization does work there.

Does anyone have a clue what could cause this?



It would help us debug if you could provide a small reproduction package that includes the source, model, and dataset.

Thanks for the help, but it seems there is no issue after all.
I’ve just checked the source code of Caffe’s time command and found the cause of what I observed:
In GPU mode, Caffe calls cudaEventSynchronize() after each layer’s forward() function to synchronize host and device, so that the measured duration of the forward pass is correct.

So my mistake was that I did not add a cudaEventSynchronize() after calling caffeNet->forward() to wait for the GPU inference to finish. Because GPU work is launched asynchronously, my host timer stopped before the inference had actually completed, which dramatically underestimated Caffe’s inference time. Hence, the earlier comparison with TensorRT was invalid.
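
To illustrate why a missing synchronization skews the measurement, here is a minimal sketch in plain Python with no CUDA involved. FakeAsyncDevice and time_forward are hypothetical names made up for this example, and the 30 ms of work is an assumed figure, not a measurement. The background thread only stands in for asynchronous GPU execution; in real code the synchronize step would correspond to a call such as cudaEventSynchronize() or cudaDeviceSynchronize().

```python
import threading
import time


class FakeAsyncDevice:
    """Hypothetical stand-in for a GPU: work runs in the background."""

    def __init__(self, work_seconds):
        self.work_seconds = work_seconds
        self._worker = None

    def forward(self):
        # Like a kernel launch: start the work and return to the caller at once.
        self._worker = threading.Thread(
            target=time.sleep, args=(self.work_seconds,)
        )
        self._worker.start()

    def synchronize(self):
        # Like waiting on the device: block until the pending work has finished.
        if self._worker is not None:
            self._worker.join()
            self._worker = None


def time_forward(device, synchronize):
    """Return the wall-clock seconds measured around one forward() call."""
    start = time.perf_counter()
    device.forward()
    if synchronize:
        device.synchronize()
    elapsed = time.perf_counter() - start
    # Always drain pending work so the next measurement starts clean.
    device.synchronize()
    return elapsed


if __name__ == "__main__":
    device = FakeAsyncDevice(work_seconds=0.03)  # assumed 30 ms of "GPU" work
    unsynced = time_forward(device, synchronize=False)
    synced = time_forward(device, synchronize=True)
    print("without sync: %.2f ms" % (1e3 * unsynced))
    print("with sync:    %.2f ms" % (1e3 * synced))
```

The unsynchronized measurement only covers the time needed to hand the work off, so it comes out far smaller than the synchronized one, which is the same kind of gap as between my 2 ms and ~30 ms figures.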

With that fix, I get a correct comparison: Caffe inference time is also ~30 ms, and TensorRT is actually a little faster than that, though I would like to boost performance further using INT8 or similar.