Hello,
I created a plan where the first few layers run on the DLA and the remaining layers run on the GPU. I wanted to know how I can use this plan to process streaming data in parallel. What I mean is this:
The first image is processed by the DLA and then by the GPU. While the first image is on the GPU, the second image is on the DLA. When the second image reaches the GPU, the third image enters the DLA, and so on. Essentially, after the first image, I am using the DLA and the GPU at the same time, like in the image below:
[pipeline diagram: DLA and GPU stages overlapping across successive images]
Hello,
Thank you for the response.
I used the TensorRT Python API to set the device type to DLA for the first few layers and built the plan from that. What I would like to know is how to implement the streaming step so that I can execute the model in parallel. My inputs and outputs are images.
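For reference, this is roughly how I am building the plan (a minimal sketch assuming an ONNX model and the TensorRT 8.x Python API; the model path and the layer count are placeholders):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
NUM_DLA_LAYERS = 10  # placeholder: how many leading layers to pin to the DLA

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("model.onnx", "rb") as f:  # placeholder model file
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)          # DLA requires FP16 or INT8
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)  # fall back if a layer cannot run on DLA
config.default_device_type = trt.DeviceType.GPU
config.DLA_core = 0

# Pin the first few layers to the DLA; everything else stays on the GPU.
for i in range(min(NUM_DLA_LAYERS, network.num_layers)):
    layer = network.get_layer(i)
    if config.can_run_on_DLA(layer):
        config.set_device_type(layer, trt.DeviceType.DLA)

plan = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(plan)
```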
Please run multiple instances of the model within the same process, but on different threads and with different CUDA streams.
The GPU scheduler will then schedule the tasks based on the available resources.
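For example, a minimal sketch of that pattern (assuming the TensorRT 8.x Python API, PyCUDA, and a static-shape engine; the plan file name, shapes, and frame lists are placeholders):

```python
import threading

import numpy as np
import pycuda.driver as cuda
import tensorrt as trt

cuda.init()
ctx = cuda.Device(0).make_context()  # shared CUDA context, current on the main thread

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
with open("model.plan", "rb") as f:  # the DLA+GPU plan built earlier
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())

def worker(frames):
    ctx.push()  # make the shared CUDA context current on this thread
    try:
        # One execution context and one stream per thread: the engine is
        # thread-safe, but an IExecutionContext is not.
        exec_ctx = engine.create_execution_context()
        stream = cuda.Stream()
        h_in = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), np.float32)
        h_out = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), np.float32)
        d_in = cuda.mem_alloc(h_in.nbytes)
        d_out = cuda.mem_alloc(h_out.nbytes)
        for frame in frames:
            np.copyto(h_in, frame.ravel())
            cuda.memcpy_htod_async(d_in, h_in, stream)
            exec_ctx.execute_async_v2([int(d_in), int(d_out)], stream.handle)
            cuda.memcpy_dtoh_async(h_out, d_out, stream)
            stream.synchronize()
            # ... consume h_out here ...
    finally:
        ctx.pop()

# Two threads working on alternating frames: while one frame's GPU portion
# runs, the next frame's DLA portion can start.
frames_a, frames_b = [], []  # placeholders: lists of preprocessed numpy arrays
threads = [threading.Thread(target=worker, args=(f,)) for f in (frames_a, frames_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()
ctx.pop()
```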
Thank you for the sample. I followed the documentation to collect the profiling data along with the sample and got the following results. (I used a plan that had the first ten layers executed on the DLA and the remaining on the GPU.)
I can see an improvement in throughput, but I cannot see it executing in parallel the way I described above: Nsight Systems shows serial execution in both cases. Is there something I am missing?
Also, is it possible to use a folder containing all the images as input to the model while using trtexec? From what I understood from the sample, I can only pass a single file that has been converted to a binary format.
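For context, here is roughly how I am converting images to that binary format right now (a sketch; the folder paths, input size, and normalization are placeholders for my setup):

```python
import os

import numpy as np
from PIL import Image

SRC_DIR = "images"  # placeholder: folder of input images
DST_DIR = "bins"    # placeholder: output folder of raw binaries
H, W = 224, 224     # placeholder: network input height/width

os.makedirs(DST_DIR, exist_ok=True)
for name in sorted(os.listdir(SRC_DIR)):
    img = Image.open(os.path.join(SRC_DIR, name)).convert("RGB").resize((W, H))
    # HWC uint8 -> CHW float32 in [0, 1]; adjust to match the model's preprocessing
    arr = np.asarray(img, dtype=np.float32).transpose(2, 0, 1) / 255.0
    arr.tofile(os.path.join(DST_DIR, os.path.splitext(name)[0] + ".bin"))
```

I then pass one of these files per run, with something like trtexec --loadEngine=model.plan --loadInputs=input:bins/img0.bin (where input stands for the actual input binding name).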