Model timing impacted when using both DLA & GPU simultaneously

Hi,
I have created 3 TensorRT models on Xavier NX; let's call them M1, M2, and M3.
M1 is built for DLA 0 using --useDLACore=0.
M2 is built for DLA 1 using --useDLACore=1.
M3 is built for the GPU.
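For reference, the build commands look roughly like this (model paths and --fp16 are illustrative; DLA engines require --fp16 or --int8):

$ trtexec --onnx=m1.onnx --fp16 --useDLACore=0 --allowGPUFallback --saveEngine=m1_dla0.engine
$ trtexec --onnx=m2.onnx --fp16 --useDLACore=1 --allowGPUFallback --saveEngine=m2_dla1.engine
$ trtexec --onnx=m3.onnx --fp16 --saveEngine=m3_gpu.engine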

When I run only M1, inference takes an average of 56 ms per image.
When I run only M2, inference takes an average of 56 ms per image.
When I run only M3, inference takes an average of 31 ms per image.

When I run M1 & M2 simultaneously, there is only a small impact on timing: each inference takes about 60 ms.
When I run M1 & M3 simultaneously, there is also an impact: they take 62 ms and 49 ms per inference, respectively.
When I run M2 & M3 simultaneously, there is also an impact: they take 62 ms and 49 ms per inference, respectively.

The main issue is when I run M1, M2, and M3 all simultaneously: M1 takes 66 ms, M2 takes 66 ms, and M3 is affected the most, taking 77 to 80 ms per inference.
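For reproduction, the simultaneous runs can be approximated by launching the engines as concurrent trtexec processes (engine names are the placeholders from the build sketch above):

$ trtexec --loadEngine=m1_dla0.engine --useDLACore=0 &
$ trtexec --loadEngine=m2_dla1.engine --useDLACore=1 &
$ trtexec --loadEngine=m3_gpu.engine &
$ wait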

Since M1 & M2 both run on DLA, I can understand them affecting each other when run simultaneously. But when all three models run simultaneously, M3, which runs on an entirely different device, is affected the most.

Can you explain this behavior? Am I doing anything wrong? If so, can you please correct me? It's urgent for me to understand this.

Adding more information:
We are seeing low GPU occupancy when DLA inference runs alongside GPU inference. For the same model, GPU occupancy is high when the DLAs are not running.
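For context, GPU occupancy can be observed with something like tegrastats while inference runs; the GR3D_FREQ field in its output reflects GPU load:

$ sudo tegrastats --interval 500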

Adding more information:
It looks like GPU starvation is happening when both DLAs are running.

Hi,

Please increase the environment variable below to see if it helps.

$ export CUDA_DEVICE_MAX_CONNECTIONS=32
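It can also be set per process, for example (engine name is a placeholder):

$ CUDA_DEVICE_MAX_CONNECTIONS=32 trtexec --loadEngine=m3_gpu.engine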

Thanks.


Thanks AastaLLL.
We are seeing an improvement. We will do detailed profiling and update you.
