Running a pure conv2d node on DLA makes the GPU slower

Hi guys,

I’m testing performance numbers on Orin in MAXN mode. The pure conv2d model gives the results below:
a. Run GPU only: 405.3 qps
b. Run DLA only: 132.7 qps
c. Run GPU + DLA x2: GPU 229.2 qps + DLA 113.1 qps * 2 = 455.4 qps in total

It seems that the GPU slows down when we use the GPU and DLA together. I checked some related issues in the Xavier topics, but I am not sure whether they have the same cause.

The simple structure of the network is shown in the following picture:

The commands are as follows:

/usr/src/tensorrt/bin/trtexec --onnx=conv2d.onnx --avgRuns=200 --int8 --iterations=10000 --useSpinWait
/usr/src/tensorrt/bin/trtexec --onnx=conv2d.onnx --avgRuns=200 --int8 --iterations=10000 --useSpinWait --useDLACore=0 (or 1) --allowGPUFallback (tried both with and without --allowGPUFallback; the results seem the same)

When only one DLA is running, the Jetson power GUI shows that the GPU loading is around 30%. So I used Nsight Systems to get a profile of this model. The following picture shows the run with one DLA only:
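For reference, the trace was captured roughly like this (a sketch from memory; the output file name is arbitrary):

$ nsys profile -o dla_only_run /usr/src/tensorrt/bin/trtexec --onnx=conv2d.onnx --avgRuns=200 --int8 --iterations=10000 --useSpinWait --useDLACore=0 --allowGPUFallback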


Question#1. There is a D2H and an H2D memcpy in each loop. Are these memcpys, plus the permutation blocks in #2, the reason the GPU is about 30% occupied when only the DLA is running? I assume there should be no fallback layer.

Question#2. There are two blue blocks labelled “void …” on stream 18; they are actually permutationKernelPLC3. Why is there such a kernel on the GPU while the network runs on the DLA? What does this kernel do? And why are there two permutation blocks back to back?

Question#3. What is “Task 6.934ms” in DLA0? Is it the wall time the DLA hardware uses? Why is there about 0.5 ms of idle time between Tasks? Is the DLA waiting for the GPU permutationKernelPLC3?

Then I compared the GPU-only profile with the GPU+2xDLA profile. For GPU only:

and GPU+2xDLA:

Question#4. In GPU+2xDLA, the time for the blue “trt_ampere_fp32_icudnn_int8x4…” kernel becomes unstable. It can stretch to 4.383 ms, as highlighted in the GPU+2xDLA profile, and then drop back to around 2.4 ms (while in GPU-only mode it is normally around 2.4 ms, as highlighted). This happens every 2 loops. So why does the GPU computation slow down? Is it affected by the permutationKernelPLC3 from the DLA process? That kernel seems very short (less than 0.6 ms; for 2xDLA I assumed it could be 1.2 ms, and still 1.2 + 2.4 < 4.383).

Question#5. On stream 18, which seems to be the compute stream in GPU+2xDLA, there is a significant idle time between every 2 loops (about 2.28 ms between every 2 blue trt_ampere blocks). Why does this idle time exist, while in GPU-only mode the idle time is very short (about 0.05 ms)?

Question#6. The DLA “Task” time is 6.934 ms in DLA-only mode, but in GPU+2xDLA it is prolonged to around 7.2 ms. If the DLA hardware runs independently once the data has been copied into DLA memory, why does this time become longer in GPU+2xDLA mode?
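If per-layer timing would help with the comparison, I can also dump it directly from trtexec, roughly like below (a sketch; the output file name is arbitrary):

$ /usr/src/tensorrt/bin/trtexec --onnx=conv2d.onnx --avgRuns=200 --int8 --useSpinWait --dumpProfile --exportProfile=gpu_only_profile.json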

Should you require any further information, please do not hesitate to let me know.
I look forward to your reply.
Thanks.

Hi,

Could you share the conv2d.onnx model with us so we can reproduce the same in our environment?
Please note that you can see the reformat layer information by running trtexec with the --verbose flag.
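For example, based on your earlier command:

$ /usr/src/tensorrt/bin/trtexec --onnx=conv2d.onnx --int8 --useDLACore=0 --allowGPUFallback --verbose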

Also, have you fixed the clocks to the maximum before profiling?

$ sudo jetson_clocks

Since Jetson uses dynamic frequency scaling by default, the performance may vary unless the clocks are fixed.
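You should be able to confirm the current clock settings with:

$ sudo jetson_clocks --show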

Thanks.

Hi AastaLLL,

Thanks for the quick reply.
I have just set the Jetson clocks to maximum as you suggested.
The result is slightly better:
GPU 234.8 qps + DLA 112.4 qps * 2 = 459.6 qps in total
But the GPU is still slower than in the GPU-only case.

Here I attach the model.
conv2d_1024_1024.onnx (9.8 KB)

After adding the verbose flag to a quick run, some information about reformatting shows up during the engine-building stage:

[07/06/2022-16:20:44] [V] [TRT] *************** Autotuning Reformat: Int8(3145728,1048576,1024,1) -> Int8(1048576,1048576:32,1024,1) ***************
[07/06/2022-16:20:44] [V] [TRT] --------------- Timing Runner: Optimizer Reformat(input to nvm -> <out>) (Reformat)
[07/06/2022-16:20:44] [V] [TRT] Setting a default quantization params because quantization data is missing for 
[07/06/2022-16:20:44] [V] [TRT] Tactic: 1000 Time: 1.16272
[07/06/2022-16:20:44] [V] [TRT] Setting a default quantization params because quantization data is missing for 
[07/06/2022-16:20:44] [V] [TRT] Tactic: 1002 Time: 0.929723
[07/06/2022-16:20:44] [V] [TRT] Setting a default quantization params because quantization data is missing for 
[07/06/2022-16:20:44] [V] [TRT] Tactic: 0 Time: 0.971511
[07/06/2022-16:20:44] [V] [TRT] Fastest Tactic: 1002 Time: 0.929723
[07/06/2022-16:20:44] [V] [TRT] *************** Autotuning Reformat: Int8(1048576,1:4,1024,1) -> Int8(3145728,1048576,1024,1) ***************
... ...

But I am not sure what it means. Could you help to explain it?
The log file is attached here:
DLA_only_fix_verbose.log (35.4 KB)

Hi,

Reformatting converts the input data from float32 into int8, and it runs on the GPU.

[07/06/2022-16:20:47] [V] [TRT] Engine Layer Information:
Layer(Reformat): input to nvm, Tactic: 1002, input[Float(1,3,1024,1024)] -> input copy[Int8(1,3,1024,1024)]
Layer(DLA): {ForeignNode[Conv_1]}, Tactic: 3, input copy[Int8(1,3,1024,1024)] -> output copy[Int8(1,10,1016,1016)]
Layer(Reformat): output from nvm, Tactic: 1002, output copy[Int8(1,10,1016,1016)] -> output[Float(1,10,1016,1016)]
Layer(FinishNvmRegion): input copy finish, Tactic: 0, input copy[Int8(1,3,1024,1024)] -> 
Layer(FinishNvmRegion): output copy finish, Tactic: 0, output copy[Int8(1,10,1016,1016)] -> 

We are checking your model internally.
Will share more information with you later.

Thanks.

Hi AastaLLL,

Oh, I didn’t pay attention to that part of the log. Thanks.
Looking forward to your result.

Hi,

If we set the input and output data formats to int8 with the dla_linear layout, there is no reformatting layer and GPU utilization becomes much lower.
Would you mind also giving it a try?

$ /usr/src/tensorrt/bin/trtexec --onnx=conv2d_1024_1024.onnx --useDLACore=0 --inputIOFormats=int8:dla_linear --outputIOFormats=int8:dla_linear --int8

Thanks.

Hi AastaLLL,

Just tried it and got a much better result for GPU + 2xDLA: 307.9 + 114.2 * 2 = 536.3 qps. Thanks.

#1 The GPU perf is still not 405 qps. Just to confirm, is it the memcpy in the DLA process that makes the GPU slower?

#2 I observe that when only 1xDLA is running, the GPU loading is 7%~10%, but for 2xDLA the GPU loading becomes 40%~50% in the Jetson GUI. How should we interpret this?
(screenshot from the Jetson power GUI)
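The same loading can also be watched from the command line with tegrastats, if that helps cross-check the GUI numbers:

$ sudo tegrastats --interval 1000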

#3 In the GPU + 2xDLA case, the GPU waits 4~5 ms every 8 loops. Do you know why?

Thanks.

Hi,

We also found some performance issues when running 2x DLA and GPU concurrently.

The root cause is related to memory bandwidth and GPU scheduling.
However, we are not able to disclose the detail here.

A fix for this issue has been implemented internally.
It will be included in a future release (not 5.0 GA).

Thanks.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.