@SunilJB Thanks for the reply.
I found the following post: Questions about efficient memory management for TensorRT on TX2 - #6 by Beerend
In that thread, the OP suggested the following implementation:
```cpp
void ObjectDetector::runInference() {
    util::Logger log("ObjectDetector::runInference");
    trt_context->enqueue(batch_size, &trt_input_gpu, cuda_stream, nullptr);
    cudaStreamSynchronize(cuda_stream);
    cudaDeviceSynchronize();
}
```
Would this be the correct implementation? Also, how does the context know how many elements to read for the input tensor? Does it derive this from the binding dimensions stored in the engine, multiplied by the batch_size passed to enqueue()?
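For reference, here is a minimal sketch of how I currently picture the full call, assuming the implicit-batch enqueue() API and an engine with one input and one output binding. The names ("input", "output") and the float element type are my own placeholders, not taken from the linked post:

```cpp
// Minimal sketch, implicit-batch TensorRT API. Binding names, the float
// element type, and the single-input/single-output layout are assumptions.
#include <NvInfer.h>
#include <cuda_runtime_api.h>

void runInference(nvinfer1::ICudaEngine* engine,
                  nvinfer1::IExecutionContext* context,
                  cudaStream_t stream,
                  const float* host_input, float* host_output,
                  int batch_size)
{
    // enqueue() expects an array with one device pointer per binding,
    // ordered by binding index (inputs and outputs together).
    const int input_index  = engine->getBindingIndex("input");   // assumed name
    const int output_index = engine->getBindingIndex("output");  // assumed name

    // Per-sample element counts come from the binding dimensions baked into
    // the engine at build time; batch_size multiplies them at enqueue time.
    auto volume = [](const nvinfer1::Dims& d) {
        size_t v = 1;
        for (int i = 0; i < d.nbDims; ++i) v *= d.d[i];
        return v;
    };
    const size_t input_bytes =
        batch_size * volume(engine->getBindingDimensions(input_index)) * sizeof(float);
    const size_t output_bytes =
        batch_size * volume(engine->getBindingDimensions(output_index)) * sizeof(float);

    void* bindings[2];
    cudaMalloc(&bindings[input_index],  input_bytes);
    cudaMalloc(&bindings[output_index], output_bytes);

    cudaMemcpyAsync(bindings[input_index], host_input, input_bytes,
                    cudaMemcpyHostToDevice, stream);
    context->enqueue(batch_size, bindings, stream, nullptr);
    cudaMemcpyAsync(host_output, bindings[output_index], output_bytes,
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    cudaFree(bindings[input_index]);
    cudaFree(bindings[output_index]);
}
```

In particular, is building a bindings array covering all input and output buffers the right approach, rather than passing only &trt_input_gpu as in the snippet above?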