TensorRT Inference Time

Hi,
I understand that my TensorFlow model should run faster on Jetson TX2 using TensorRT.
But after converting my TF model to TensorRT, I found that inference is actually slower with the TensorRT engine: 80 ms instead of 20 ms.

My net:
1 input (Input1) of shape 1x448x576
1 input (Input2) of shape 1x448x576
1 output of shape 5x233x297

After converting to UFF, I run this function once:

import numpy as np
import pycuda.autoinit  # creates and manages a CUDA context
import pycuda.driver as cuda

def prepare_inference(self, channel_size, height, width, batch_size):
    # Allocate pagelocked (pinned) host memory for the output
    self.output = cuda.pagelocked_empty(5 * 233 * 297, dtype=np.float32)
    # Allocate device memory (element count * 4 bytes for float32)
    self.d_input1 = cuda.mem_alloc(1 * 448 * 576 * 4)
    self.d_input2 = cuda.mem_alloc(1 * 448 * 576 * 4)
    self.d_output = cuda.mem_alloc(5 * 233 * 297 * 4)

    self.stream = cuda.Stream()
    self.bindings = [int(self.d_input1), int(self.d_input2), int(self.d_output)]
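For reference, the hard-coded byte counts above are just element count times 4 bytes (float32). A small sketch that derives them from the binding shapes; the shapes come from the net description above, while the helper itself is purely illustrative:

```python
import numpy as np

SHAPES = {
    "input1": (1, 448, 576),
    "input2": (1, 448, 576),
    "output": (5, 233, 297),
}

def nbytes(shape, dtype=np.float32):
    # Byte size of one binding: element count times dtype item size
    return int(np.prod(shape)) * np.dtype(dtype).itemsize

# nbytes(SHAPES["input1"]) == 1 * 448 * 576 * 4
```

Computing sizes from one shape table avoids the allocation and the reshape in do_infer silently drifting apart.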

and then run inference with the following code:

def do_infer(self, input1, input2):
    # Ensure contiguous float32 host arrays before copying to the device
    input1 = np.ascontiguousarray(input1, dtype=np.float32)
    input2 = np.ascontiguousarray(input2, dtype=np.float32)
    cuda.memcpy_htod_async(self.d_input1, input1, self.stream)
    cuda.memcpy_htod_async(self.d_input2, input2, self.stream)

    # Execute the model (batch size 1) on the stream
    self.context.enqueue(1, self.bindings, self.stream.handle, None)
    # Transfer predictions back on the same stream, then wait for it to finish
    cuda.memcpy_dtoh_async(self.output, self.d_output, self.stream)
    self.stream.synchronize()

    return np.reshape(self.output, (5, 233, 297))
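One common source of misleading numbers is timing without warm-up: the first few calls pay one-time costs (context creation, lazy allocation, kernel selection), so averaging over many calls after a warm-up phase gives a fairer figure. A minimal timing sketch; the helper name and iteration counts are illustrative, not from the original post:

```python
import time

def benchmark_ms(fn, warmup=10, iters=100):
    # Warm-up calls absorb one-time initialization costs
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    # Average wall-clock milliseconds per call
    return (time.perf_counter() - start) * 1000.0 / iters
```

Usage would be something like benchmark_ms(lambda: self.do_infer(x1, x2)); wall-clock timing is only meaningful here if do_infer blocks until results are ready (e.g. via a stream synchronize or a blocking device-to-host copy).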

Can you please help me understand how this is possible?
Thanks

Hello,

Can you share the UFF file with us? Which versions of TensorFlow and TensorRT are you using?

thanks