Hi,
Please note that TensorRT requires GPU memory for inference.
In your do_inference function, have you copied the input buffer from CPU to GPU,
and copied the output back from GPU to CPU?
You can find an example below:
https://forums.developer.nvidia.com/t/custom-resnet-jetson-xavier/160448/3
...
cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)
context.execute_async(bindings=bindings, stream_handle=stream.handle)
cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)
...
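To show how those three calls fit together, here is a minimal sketch of a complete do_inference helper. It assumes the buffers were already allocated as in the linked example (pagelocked host arrays plus matching cuda.mem_alloc device buffers); the argument names host_inputs, cuda_inputs, bindings, and stream follow that example and are assumptions, not a fixed TensorRT API. Note that execute_async matches the older TensorRT Python API used in the snippet; newer releases use execute_async_v2.

```python
# Sketch of a TensorRT inference helper using PyCUDA (requires an NVIDIA GPU).
# Buffer allocation (cuda.pagelocked_empty / cuda.mem_alloc) is assumed to
# have been done beforehand, as in the linked forum example.
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401  -- initializes a CUDA context


def do_inference(context, bindings, host_inputs, cuda_inputs,
                 host_outputs, cuda_outputs, stream):
    # 1. Copy input from CPU (pagelocked host memory) to GPU.
    cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)
    # 2. Launch inference asynchronously on the same CUDA stream.
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    # 3. Copy the result back from GPU to CPU.
    cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)
    # 4. Block until all queued work finishes, so host_outputs is valid.
    stream.synchronize()
    return host_outputs[0]
```

The stream.synchronize() call matters: all three preceding calls are asynchronous, so without it you may read host_outputs before the device-to-host copy has completed.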
Thanks.