Hi,
After converting my TF model to TensorRT (not easy), I found that inference is much slower with the TensorRT engine: 80 ms per call instead of 20 ms. At this rate I'm not sure it's worth going embedded at all!!
I'm running TensorRT 4 with cuDNN 7.1.3 and TensorFlow 1.8 (for comparison) on a GeForce GTX 1080 Ti with CUDA 9.0.
Proprietary net:
1 input (Input1) of 1x448x576
1 input (Input2) of 1x448x576
1 output of 5x233x297
(NCHW)
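For context, the UFF-to-engine step looks roughly like this (a sketch using the TensorRT 4 legacy Python API; the node names, file path, and workspace size are placeholders, not my actual values):

import uff
import tensorrt as trt
from tensorrt.parsers import uffparser

G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)

# Convert the frozen TF graph to UFF and declare the two inputs and the output
uff_model = uff.from_tensorflow_frozen_model("frozen_model.pb", ["output"])
parser = uffparser.create_uff_parser()
parser.register_input("input1", (1, 448, 576), 0)  # 0 = NCHW input order
parser.register_input("input2", (1, 448, 576), 0)
parser.register_output("output")

# Build the engine (max batch size 1, 1 GB workspace) and create the context
engine = trt.utils.uff_to_trt_engine(G_LOGGER, uff_model, parser, 1, 1 << 30)
context = engine.create_execution_context()  # stored as self.context below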
After converting to UFF, I run this function once:
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates the CUDA context

def prepare_inference(self, channel_size, height, width, batch_size):
    # Allocate page-locked host memory for the output
    self.output = cuda.pagelocked_empty(5 * 233 * 297, dtype=np.float32)
    # Allocate device memory (float32 = 4 bytes per element)
    self.d_input1 = cuda.mem_alloc(1 * 448 * 576 * 4)
    self.d_input2 = cuda.mem_alloc(1 * 448 * 576 * 4)
    self.d_output = cuda.mem_alloc(1 * 5 * 233 * 297 * 4)
    self.stream = cuda.Stream()
    self.bindings = [int(self.d_input1), int(self.d_input2), int(self.d_output)]
and run inference with the following code:
def do_infer(self, input1, input2):
    # Make sure the inputs are float32 (contiguous copies)
    input1 = input1.astype(np.float32)
    input2 = input2.astype(np.float32)
    # Copy the inputs to the device
    cuda.memcpy_htod_async(self.d_input1, input1, self.stream)
    cuda.memcpy_htod_async(self.d_input2, input2, self.stream)
    # Execute the model (batch size 1)
    self.context.enqueue(1, self.bindings, self.stream.handle, None)
    # Transfer predictions back to the host (blocking copy)
    cuda.memcpy_dtoh(self.output, self.d_output)
    return np.reshape(self.output, (5, 233, 297))
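For reference, a minimal sketch of how the per-call time can be measured: one warm-up call, then an average over repeated calls (the Net wrapper name and run count are placeholders):

import time

net = Net()                              # hypothetical wrapper holding the engine/context
net.prepare_inference(1, 448, 576, 1)

input1 = np.random.rand(1, 448, 576).astype(np.float32)
input2 = np.random.rand(1, 448, 576).astype(np.float32)

net.do_infer(input1, input2)             # warm-up call, excluded from timing

runs = 100
start = time.perf_counter()
for _ in range(runs):
    net.do_infer(input1, input2)
elapsed_ms = (time.perf_counter() - start) / runs * 1000.0
print("average inference time: %.1f ms" % elapsed_ms)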
Any clue?
Thanks