Direct GPU Inference

@SunilJB Thanks for the reply.

I found the following post: Questions about efficient memory management for TensorRT on TX2 - #6 by Beerend

Here the OP suggested:

void ObjectDetector::runInference() {
    util::Logger log("ObjectDetector::runInference");

    trt_context->enqueue(batch_size, &trt_input_gpu, cuda_stream, nullptr);

    cudaStreamSynchronize(cuda_stream);
    cudaDeviceSynchronize();
}

Would this be the correct implementation? How would the execution context know how many elements to consider as the input tensor, given that only a single device pointer is passed to enqueue?
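For context, my current understanding (which may well be wrong) is that enqueue() expects an array of device pointers covering all bindings, inputs and outputs, and that the element counts come from the engine's binding dimensions rather than from enqueue() itself. A minimal sketch of that idea, assuming an implicit-batch engine with FP32 bindings (the function and variable names here are just placeholders, not my actual code):

#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <vector>

// Element count of one binding for a single batch item.
static size_t elementCount(const nvinfer1::Dims& dims) {
    size_t count = 1;
    for (int i = 0; i < dims.nbDims; ++i)
        count *= dims.d[i];
    return count;
}

void setupAndRun(nvinfer1::ICudaEngine* trt_engine,
                 nvinfer1::IExecutionContext* trt_context,
                 int batch_size, cudaStream_t cuda_stream) {
    // One device pointer per binding (inputs and outputs), in binding-index order.
    std::vector<void*> bindings(trt_engine->getNbBindings(), nullptr);

    for (int i = 0; i < trt_engine->getNbBindings(); ++i) {
        // The engine reports each binding's dimensions, so the buffer size
        // is derived from the engine, not passed to enqueue().
        size_t bytes = batch_size
                     * elementCount(trt_engine->getBindingDimensions(i))
                     * sizeof(float);  // assuming FP32 bindings
        cudaMalloc(&bindings[i], bytes);
    }

    // enqueue() takes the array of *all* binding pointers, not just the input.
    trt_context->enqueue(batch_size, bindings.data(), cuda_stream, nullptr);
    cudaStreamSynchronize(cuda_stream);
}

Is that the right mental model, i.e. does passing &trt_input_gpu only work because the output pointer happens to follow it in memory, or is something else going on?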