Streaming with GPU and DLA

Hello,
I created a plan where the first few layers run on the DLA and the remaining layers run on the GPU. I wanted to know how I can use this plan to process streaming data in parallel. What I mean is this:
The first image is processed by the DLA, then by the GPU. While the first image is on the GPU, the second image is on the DLA. When the second image reaches the GPU, the third image enters the DLA, and so on. Essentially, after the first image, at any point I am using the DLA and the GPU at the same time. Like the image below:

How can I do this, and which application should I use to verify that it is running in parallel? (Note that both the input and output are images.)

I tried using nsys to profile the plan. Following the documentation (Developer Guide :: NVIDIA Deep Learning TensorRT Documentation), I do not see the other-accelerators API trace in the output. I have added a link to the files as well.
Link to files

Thank you.

Hi,

You will need to use the TensorRT API to specify that certain layers run on the DLA:

Or use the cuDLA library:
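For the first option, a minimal sketch of per-layer DLA assignment with the TensorRT Python builder API looks like the following (this assumes a network has already been populated, e.g. by the ONNX parser; the count of ten layers mirrors the setup described in this thread):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
# ... populate `network` here, e.g. with trt.OnnxParser ...

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)  # fall back to GPU for unsupported layers
config.DLA_core = 0

# Pin the first ten layers to the DLA; the rest stay on the GPU.
for i in range(min(10, network.num_layers)):
    layer = network.get_layer(i)
    if config.can_run_on_DLA(layer):
        config.set_device_type(layer, trt.DeviceType.DLA)

plan = builder.build_serialized_network(network, config)
```

The serialized `plan` can then be written to disk and loaded for inference as usual.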

Thanks.

Hello,
Thank you for the response.
I have used the TensorRT Python API to set the device type to DLA for the first few layers and built the plan from this. What I would like to know is how to achieve the streaming step, where I am able to execute the model in parallel. My input and output are images.

Hi,

Please run the model multiple times within the same process, but with different threads and CUDA streams.
The GPU scheduler will optimize the tasks based on the available resources.
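As a minimal sketch of the pipelining pattern itself (pure Python, with `time.sleep` standing in for the DLA and GPU portions of the engine; in a real application each worker would hold its own execution context and enqueue work on its own CUDA stream):

```python
import queue
import threading
import time

def dla_stage(image):
    time.sleep(0.01)          # stand-in for the DLA portion of the engine
    return f"{image}-dla"

def gpu_stage(feature):
    time.sleep(0.01)          # stand-in for the GPU portion of the engine
    return f"{feature}-gpu"

def pipeline(images):
    """Run the two stages concurrently: while image N is on the GPU,
    image N+1 is already being processed on the DLA."""
    handoff = queue.Queue(maxsize=1)   # single-slot buffer between stages
    results = []

    def dla_worker():
        for img in images:
            handoff.put(dla_stage(img))
        handoff.put(None)              # sentinel: no more work

    t = threading.Thread(target=dla_worker)
    t.start()
    while (item := handoff.get()) is not None:
        results.append(gpu_stage(item))
    t.join()
    return results

print(pipeline(["img0", "img1", "img2"]))
# ['img0-dla-gpu', 'img1-dla-gpu', 'img2-dla-gpu']
```

With N images the pipelined version takes roughly (N + 1) stage-times instead of 2N, which is the overlap you drew in your diagram; in Nsight Systems this shows up as DLA and GPU activity on overlapping time ranges.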

Thanks.

I am not sure I understood. Do you have a sample file that I could follow to do this?

Hi,

Please find the sample below:

Thanks.

Thank you for the sample. I followed the documentation to get the profiling data along with the sample and got the following results. (I used a plan with the first ten layers executed on the DLA and the remaining on the GPU.)

  1. Without streaming: no_streaming.txt (23.4 KB)

  2. With two streams: with_streaming.txt (28.5 KB)

I have also attached the logs from execution.

I can see an improvement in throughput, but I cannot see it executing in parallel as I described above. Nsight Systems shows serial execution in both cases. Is there something I am missing?

Also, is it possible to use a folder containing all the images as input to the model when using trtexec? From what I understood from the sample, I can only use a single file that has been converted to a binary format.
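Since trtexec takes a single raw binary blob per input tensor via --loadInputs rather than a folder, one workaround is a small driver script that writes one binary file per image and invokes trtexec once per file. A stdlib-only sketch of the conversion step (this assumes preprocessing has already produced a flat list of float32 values per image; real code would first decode and resize the images, e.g. with Pillow, and `convert_folder` is a hypothetical helper name):

```python
import pathlib
import struct

def write_trtexec_input(values, out_path):
    """Serialize a flat list of float32 values as raw little-endian
    bytes, the layout expected for --loadInputs=input_name:file.bin."""
    out_path = pathlib.Path(out_path)
    out_path.write_bytes(struct.pack(f"<{len(values)}f", *values))
    return out_path

def convert_folder(preprocessed, out_dir):
    """Write one .bin file per image; `preprocessed` maps image name
    to its flat list of float32 values."""
    out_dir = pathlib.Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return [
        write_trtexec_input(values, out_dir / f"{name}.bin")
        for name, values in preprocessed.items()
    ]
```

Each resulting file can then be passed to a separate trtexec run, e.g. --loadInputs=input:img0.bin, so the folder is covered by looping over the generated files in a shell script.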