Inference time of TensorRT is slower than TensorFlow!

After converting my TF model to TensorRT (not easy), I found that inference is much slower with the TensorRT engine: 80 ms instead of 20 ms. With that result, I’m not sure it is worth going embedded at all!

I’m running TensorRT 4 with cuDNN 7.1.3 and TensorFlow 1.8 (for comparison) on a GeForce GTX 1080 Ti with CUDA 9.0.

Proprietary net:
Input1: 1 x 448 x 576
Input2: 1 x 448 x 576
Output: 5 x 233 x 297

After converting to UFF, I run this function once:

import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda

def prepare_inference(self, channel_size, height, width, batch_size):
    # Allocate page-locked host memory for the output
    self.output = cuda.pagelocked_empty(5 * 233 * 297, dtype=np.float32)
    # Allocate device memory (float32 = 4 bytes per element)
    self.d_input1 = cuda.mem_alloc(1 * 448 * 576 * 4)
    self.d_input2 = cuda.mem_alloc(1 * 448 * 576 * 4)
    self.d_output = cuda.mem_alloc(1 * 5 * 233 * 297 * 4)
    # Stream used for the async copies and for execution
    self.stream = cuda.Stream()
    self.bindings = [int(self.d_input1), int(self.d_input2), int(self.d_output)]
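The `mem_alloc` calls above hardcode both the tensor shapes and the 4-byte width of float32, while the `channel_size`, `height`, `width` and `batch_size` parameters go unused. A minimal sketch of deriving the byte counts from shapes instead, assuming dense float32 tensors; `buffer_nbytes`, `INPUT_SHAPE` and `OUTPUT_SHAPE` are hypothetical names, not part of TensorRT or PyCUDA:

```python
import math

import numpy as np


def buffer_nbytes(shape, dtype=np.float32) -> int:
    """Bytes needed for a dense tensor of `shape` and `dtype` (hypothetical helper)."""
    return math.prod(shape) * np.dtype(dtype).itemsize


# Shapes from the post, batch size 1 (illustrative constants).
INPUT_SHAPE = (1, 448, 576)
OUTPUT_SHAPE = (5, 233, 297)

input_bytes = buffer_nbytes(INPUT_SHAPE)
output_bytes = buffer_nbytes(OUTPUT_SHAPE)
print(input_bytes, output_bytes)  # 1032192 1384020
```

These values could then be passed to `mem_alloc`; for a larger batch, prepend the batch size to each shape.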

Then I run inference with the following code:

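def do_infer(self, input1, input2):
    # TensorRT expects contiguous float32 buffers
    input1 = np.ascontiguousarray(input1, dtype=np.float32)
    input2 = np.ascontiguousarray(input2, dtype=np.float32)
    # Copy inputs to the device on the stream created in prepare_inference
    cuda.memcpy_htod_async(self.d_input1, input1, self.stream)
    cuda.memcpy_htod_async(self.d_input2, input2, self.stream)

    # execute model (batch size 1) on the same stream
    self.context.enqueue(1, self.bindings, self.stream.handle, None)
    # transfer predictions back, then wait for all queued work to finish
    cuda.memcpy_dtoh_async(self.output, self.d_output, self.stream)
    self.stream.synchronize()

    return np.reshape(self.output, (5, 233, 297))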

Any clue?

I’m not familiar with the Python interface; my experience is with Caffe and its C++ interface.

So I can’t speak to your case, but here is a reminder based on my own mistakes:
I used to forget to call cudaEventSynchronize() after Caffe’s forward(), so the timer stopped before the GPU work had finished, which dramatically underestimated Caffe’s inference time. Of course, to measure TensorRT correctly you need to synchronize after its context execution, too.

With these fixes, I get a correct comparison. Before fixing this, I had mistakenly measured 2 ms vs. 20+ ms, which meant TensorRT was 10+ times slower, and that seemed impossible.
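Because GPU launches return before the work finishes, a fair timer has to warm up first and then stop the clock only after synchronizing. A minimal sketch of that pattern, assuming a `run` callable that launches inference and a `sync` callable that waits for it. `time_inference` and `FakeAsyncDevice` are hypothetical; the fake device uses a sleeping thread as a stand-in for an asynchronous GPU stream so the effect can be shown without a GPU. With the PyCUDA code in the question, `sync` would be the stream’s `synchronize()` method.

```python
import threading
import time
from statistics import median
from typing import Callable, List


def time_inference(run: Callable[[], None], sync: Callable[[], None],
                   warmup: int = 5, iters: int = 20) -> float:
    """Median latency in ms of run() followed by sync() (hypothetical helper)."""
    for _ in range(warmup):  # first calls often include one-time setup costs
        run()
        sync()
    samples: List[float] = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        sync()  # wait for queued work before stopping the clock
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)


class FakeAsyncDevice:
    """Hypothetical stand-in for a GPU stream: launch() returns immediately,
    synchronize() blocks until every launched 'kernel' (a sleep) has finished."""

    def __init__(self, work_s: float):
        self.work_s = work_s
        self._pending: List[threading.Thread] = []

    def launch(self) -> None:
        t = threading.Thread(target=time.sleep, args=(self.work_s,))
        t.start()
        self._pending.append(t)

    def synchronize(self) -> None:
        for t in self._pending:
            t.join()
        self._pending.clear()


dev = FakeAsyncDevice(work_s=0.02)  # 20 ms of simulated work per launch
no_sync = time_inference(dev.launch, lambda: None, warmup=1, iters=5)
dev.synchronize()  # drain work left over from the unsynchronized run
with_sync = time_inference(dev.launch, dev.synchronize, warmup=1, iters=5)
print(f"without sync: {no_sync:.2f} ms, with sync: {with_sync:.2f} ms")
```

The unsynchronized figure only measures how long the launch call takes, which is the kind of error that produced the misleading 2 ms reading.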

Hi @michael4e2ca, how did you fix your performance issue? I think I have the same issue now.