Please provide the following info (tick the boxes after creating this topic): Software Version
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
[*] DRIVE OS 6.0.4 SDK
other
Target Operating System
[*] Linux
QNX
other
Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure of its number)
[*] other
SDK Manager Version
1.9.1.10844
[*] other
Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
[*] other
It takes 45 ms to run the int8 model 0 on GPU0 alone.
It takes 17 ms to run the int8 model 1 on DLA0 alone (no fallback to GPU0 needed).
It takes 17 ms to run the int8 model 2 on DLA1 alone (no fallback to GPU0 needed).
I ran a couple of experiments:
Model 2 alone: DLA0 "latencyMs": 17.6779 ms; DLA1 "latencyMs": 18.1355 ms.
Both DLAs and the GPU running model 0, model 1, and model 2 at the same time: DLA0 "latencyMs": 21.8 ms; DLA1 "latencyMs": 21.8 ms; GPU0 "latencyMs": 59 ms.
One DLA and the GPU running model 0 and model 1 at the same time: DLA0 "latencyMs": 20.78 ms; GPU0 "latencyMs": 51.05 ms.
A single DLA (DLA0) running model 1 and model 2: "latencyMs": 33 ms.
Experiment 2: Why do the latencies of GPU0 and the DLAs affect each other, and how can this interference be avoided?
Experiment 3: Why does the latency drop when only one DLA runs alongside GPU0 (compared with running both DLAs)?
Experiment 4: Why does a DLA have no parallel processing ability at all? The time to run n models increases n-fold. How can a DLA be made to process multiple models at the same time?
Does the DLA share memory bandwidth with the GPU, or are other resources shared as well? As it stands, Orin cannot fully exploit the GPU's 167 INT8 TOPS and the DLAs' 87 INT8 TOPS. How do we get all the cores running in parallel without their latencies affecting each other?
Dear @haihua.wei,
Did you use the trtexec tool for this experiment? Just want to double-check that you have made sure GPU fallback is not happening.
Is it possible to share repro steps/models/code?
We can confirm that the models all run on the DLAs: the --allowGPUFallback option was not enabled when building the engines, and the runtime traces captured with nsys profile show all models executing on the DLAs. @SivaRamaKrishnaNV
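For reference, a build and timing run of this kind might look like the sketch below (file names are placeholders; flags are standard trtexec options). Omitting --allowGPUFallback makes the build fail if any layer cannot be placed on the DLA, which is how DLA-only execution is guaranteed:

```shell
# Build an int8 engine pinned to DLA core 0.
# --allowGPUFallback is deliberately NOT passed, so the build errors out
# if any layer would have to fall back to the GPU.
trtexec --onnx=model1.onnx --int8 --useDLACore=0 \
        --saveEngine=model1_dla0.engine

# Time the prebuilt engine on DLA core 0 alone.
trtexec --loadEngine=model1_dla0.engine --useDLACore=0 --iterations=100
```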
Dear @haihua.wei,
The iGPU and the DLAs share the same scheduler and memory resources. Although the tasks execute on separate hardware (iGPU and DLA), DLA work is tracked through a separate GPU context that registers the task-finish signal from the DLA, and the switches between the two GPU contexts induce extra latency in the pipeline. Even so, the overall execution time can be lower when running in parallel.
Using separate trtexec processes creates multiple GPU contexts. You can try launching the models from a single process so that multiple GPU contexts are avoided. Even with a single context, some delay is expected.
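A single-process setup along these lines could replace the separate trtexec invocations. This is only a sketch using the TensorRT Python API; the engine file names are placeholders for engines prebuilt for each DLA core, and it assumes pycuda is available on the target:

```python
import tensorrt as trt
import pycuda.autoinit          # creates ONE CUDA context for the whole process
import pycuda.driver as cuda

logger = trt.Logger(trt.Logger.WARNING)

def load_engine(path, dla_core):
    # Bind deserialization to a specific DLA core before loading the engine.
    runtime = trt.Runtime(logger)
    runtime.DLA_core = dla_core
    with open(path, "rb") as f:
        return runtime.deserialize_cuda_engine(f.read())

# Placeholder engine files, built with trtexec --useDLACore=0/1 and
# without --allowGPUFallback.
engines = [load_engine("model1_dla0.engine", 0),
           load_engine("model2_dla1.engine", 1)]
contexts = [e.create_execution_context() for e in engines]
streams = [cuda.Stream() for _ in engines]

# Allocate device buffers for every binding of every engine.
bindings = []
for e in engines:
    bufs = [cuda.mem_alloc(trt.volume(e.get_binding_shape(i)) *
                           e.get_binding_dtype(i).itemsize)
            for i in range(e.num_bindings)]
    bindings.append([int(b) for b in bufs])

# Enqueue both DLA workloads on separate streams, sharing one GPU context.
for ctx, binds, stream in zip(contexts, bindings, streams):
    ctx.execute_async_v2(binds, stream.handle)
for s in streams:
    s.synchronize()
```

Because both engines are deserialized and enqueued from the same process, the per-process context-switch overhead of running two trtexec instances side by side is avoided, though some scheduling delay remains even with a single context.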
Currently, a DLA cannot run models in parallel. We are working on making the GPU+DLA scenario more efficient. Using the cuDLA library directly is also an option for scheduling work on the DLA optimally, but cuDLA is not part of the DevZone release; please contact your NVIDIA representative if you need access.