Hi,
I understand that my TensorFlow model should run faster on the Jetson TX2 when using TensorRT.
However, after converting my TF model I found that inference is actually slower with the TensorRT engine: 80 ms instead of 20 ms.
My network:
Input1: shape (1, 448, 576)
Input2: shape (1, 448, 576)
Output: shape (5, 233, 297)
After converting the model to UFF, I run this function once:
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # initializes the CUDA context

def prepare_inference(self, channel_size, height, width, batch_size):
    # Allocate page-locked host memory for the output
    self.output = cuda.pagelocked_empty(5 * 233 * 297, dtype=np.float32)
    # Allocate device memory (float32 = 4 bytes per element)
    self.d_input1 = cuda.mem_alloc(1 * 448 * 576 * 4)
    self.d_input2 = cuda.mem_alloc(1 * 448 * 576 * 4)
    self.d_output = cuda.mem_alloc(5 * 233 * 297 * 4)
    self.stream = cuda.Stream()
    # Bindings in the same order as the engine's inputs and output
    self.bindings = [int(self.d_input1), int(self.d_input2), int(self.d_output)]
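For context, the execution context used in do_infer below is created from the UFF file along these lines. This is only a minimal sketch assuming the legacy TensorRT 3.x Python API that ships with JetPack; the node names "Input1", "Input2", "output" and the model.uff path are placeholders for my actual graph:

import tensorrt as trt
from tensorrt.parsers import uffparser

G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)

# Register the two inputs and the single output (placeholder names)
parser = uffparser.create_uff_parser()
parser.register_input("Input1", (1, 448, 576), 0)  # 0 = CHW input order
parser.register_input("Input2", (1, 448, 576), 0)
parser.register_output("output")

# Build the engine with max batch size 1 and a 32 MB workspace,
# then create the execution context used as self.context in do_infer
uff_model = open("model.uff", "rb").read()
engine = trt.utils.uff_to_trt_engine(G_LOGGER, uff_model, parser, 1, 1 << 25)
context = engine.create_execution_context()
parser.destroy()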
Then I run inference with the following function:
def do_infer(self, input1, input2):
    # Cast the inputs to contiguous float32 arrays
    input1 = input1.astype(np.float32)
    input2 = input2.astype(np.float32)
    # Copy the inputs to the device asynchronously on the stream
    cuda.memcpy_htod_async(self.d_input1, input1, self.stream)
    cuda.memcpy_htod_async(self.d_input2, input2, self.stream)
    # Execute the model with batch size 1
    self.context.enqueue(1, self.bindings, self.stream.handle, None)
    # Transfer predictions back to the page-locked host buffer (blocking copy)
    cuda.memcpy_dtoh(self.output, self.d_output)
    return np.reshape(self.output, (5, 233, 297))
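For reference, the 80 ms figure comes from a timing loop roughly like the one below. It is only a sketch: the random dummy inputs and the name net for the wrapper instance are placeholders for my real data and object.

import time
import numpy as np

# Dummy inputs matching the network's input shape (1, 448, 576)
input1 = np.random.rand(1, 448, 576).astype(np.float32)
input2 = np.random.rand(1, 448, 576).astype(np.float32)

# Warm-up runs so one-time initialization cost is not counted
for _ in range(10):
    net.do_infer(input1, input2)

# Average wall-clock time over repeated runs; do_infer ends with a
# blocking memcpy_dtoh, so each call returns only after the GPU is done
runs = 100
start = time.time()
for _ in range(runs):
    net.do_infer(input1, input2)
elapsed_ms = (time.time() - start) / runs * 1000.0
print("average inference time: %.1f ms" % elapsed_ms)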
Can you please help me understand how this is possible?
Thanks