Inference time of TensorRT is slower than TensorFlow!

Hi,
After converting my TF model to TensorRT (not easy), I found that inference is much slower with the TensorRT engine: 80 ms instead of 20 ms. At this rate I’m not sure it is worth going embedded at all!

I’m running TensorRT 4 with cuDNN 7.1.3 and TensorFlow 1.8 (for comparison) on a GeForce GTX 1080 Ti with CUDA 9.0.

Proprietary net:
1 Input1 of 1×448×576
1 Input2 of 1×448×576
1 output of 5×233×297
(NCWH)

After converting to UFF, I run this function once:

def prepare_inference(self, channel_size, height, width, batch_size):
        # Allocate pagelocked host memory so device-to-host copies can be async
        self.output = cuda.pagelocked_empty(5 * 233 * 297, dtype=np.float32)
        # Allocate device memory (float32 = 4 bytes per element)
        self.d_input1 = cuda.mem_alloc(1 * 448 * 576 * 4)
        self.d_input2 = cuda.mem_alloc(1 * 448 * 576 * 4)
        self.d_output = cuda.mem_alloc(1 * 5 * 233 * 297 * 4)

        self.stream = cuda.Stream()
        # Device pointers, in the engine's binding order
        self.bindings = [int(self.d_input1), int(self.d_input2), int(self.d_output)]
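For reference, this assumes PyCUDA is imported as cuda and that self.context was already created from the engine. A rough sketch of that setup, based on how I remember the legacy TensorRT 3/4 Python samples (the input/output names, the uff_model buffer, and the workspace size are all assumptions):

import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates and activates a CUDA context on import
import tensorrt as trt
from tensorrt.parsers import uffparser

G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)

# Register both inputs and the output by name (placeholder names)
parser = uffparser.create_uff_parser()
parser.register_input("Input1", (1, 448, 576), 0)
parser.register_input("Input2", (1, 448, 576), 0)
parser.register_output("output")

# uff_model is the serialized buffer produced by the UFF converter (assumed)
engine = trt.utils.uff_to_trt_engine(G_LOGGER, uff_model, parser, 1, 1 << 30)
context = engine.create_execution_context()  # stored as self.context above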

and run inference with the following code:

def do_infer(self, input1, input2):
        # The async copies need contiguous float32 host arrays
        input1 = np.ascontiguousarray(input1, dtype=np.float32)
        input2 = np.ascontiguousarray(input2, dtype=np.float32)
        cuda.memcpy_htod_async(self.d_input1, input1, self.stream)
        cuda.memcpy_htod_async(self.d_input2, input2, self.stream)

        # execute model asynchronously on the stream (batch size 1)
        self.context.enqueue(1, self.bindings, self.stream.handle, None)
        # transfer predictions back, then wait for copies and kernel to finish
        cuda.memcpy_dtoh_async(self.output, self.d_output, self.stream)
        self.stream.synchronize()

        return np.reshape(self.output, (5, 233, 297))
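
For what it’s worth, a hypothetical usage sketch (net stands for whatever object holds these two methods; the data is random but matches the shapes above):

net.prepare_inference(channel_size=1, height=448, width=576, batch_size=1)

a = np.random.rand(1, 448, 576).astype(np.float32)
b = np.random.rand(1, 448, 576).astype(np.float32)

out = net.do_infer(a, b)  # first call doubles as a warm-up; don't time it
print(out.shape)          # (5, 233, 297)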

Any clue?
Thanks

I’m not familiar with the Python interface; my experience is with Caffe through the C++ interface.

So I’m not sure about your case; I’ll just mention something based on my own mistakes:
I used to forget to call cudaEventSynchronize() after Caffe’s forward() to synchronize the inference call, which dramatically underestimated Caffe’s inference time (forward() returns before the GPU has finished). Of course, to measure TensorRT correctly you’d synchronize its context execution, too.

With that fixed, the comparison came out correct. Before the fix, I had measured 2 ms vs. 20+ ms, which would have meant TensorRT was 10+ times slower; that seemed impossible.
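
To make that concrete in the PyCUDA setting above, here is one way to time do_infer() with CUDA events (a sketch; net, a, b as in the usage snippet earlier, and the iteration counts are arbitrary):

start, end = cuda.Event(), cuda.Event()

for _ in range(10):   # warm-up so lazy initialization isn't timed
    net.do_infer(a, b)

start.record(net.stream)
for _ in range(100):
    net.do_infer(a, b)
end.record(net.stream)
end.synchronize()     # wait for all queued GPU work before reading the timer

print("per-inference: %.2f ms" % (start.time_till(end) / 100))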

Hi @michael4e2ca, how did you fix your performance issue? I think I have the same issue now.