Performance of iGPU and DLA

Please provide the following info (check/uncheck the boxes after creating this topic):
Software Version
DRIVE OS Linux 5.2.6
DRIVE OS Linux 5.2.0
[yes] DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
other DRIVE OS version
other

Target Operating System
[yes] Linux
QNX
other

Hardware Platform
[yes] NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)
other

SDK Manager Version
[yes] 1.6.0.8170
other

Host Machine Version
[yes] native Ubuntu 18.04
other

Recently, I tested the performance of Xavier. I converted a model to run on the iGPU, DLA0 and DLA1.
The elapsed times are as follows:
igpu*1: 105000ms
igpu*2: 183000ms
igpu*4: 350000ms
dla0: 245000ms
igpu*1+dla0: 324000ms
igpu*2+dla0+dla1: 355000ms

So, firstly, the DLA is much slower than the iGPU.
Secondly, when I use the iGPU and the DLA together in two threads, it takes more time than using the iGPU alone.

Not all layers of my model can run on the DLA; some need to fall back to the GPU. Does this fallback seriously affect iGPU performance?
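To compare the configurations on equal terms, the timings above can be turned into relative throughput. This is a minimal sketch that assumes each engine instance in a configuration processes the same workload once (so igpu*2 means two workloads finished in 183000 ms); the stream counts are that assumption, not something stated in the post.

```python
# Elapsed times from the post, in ms. The stream counts are an assumption:
# each engine instance is taken to process the same workload once.
runs = {
    "igpu*1": (1, 105000),
    "igpu*2": (2, 183000),
    "igpu*4": (4, 350000),
    "dla0": (1, 245000),
    "igpu*1+dla0": (2, 324000),
    "igpu*2+dla0+dla1": (4, 355000),
}


def relative_throughput(streams, elapsed_ms, base=runs["igpu*1"]):
    """Workloads completed per unit time, normalised to a single iGPU stream."""
    base_streams, base_ms = base
    return (streams / elapsed_ms) / (base_streams / base_ms)


for name, (streams, elapsed_ms) in runs.items():
    print(f"{name:>18}: {relative_throughput(streams, elapsed_ms):.2f}x")
```

Under that assumption, igpu*1+dla0 comes out at roughly 0.65x of a single iGPU stream, while igpu*2 reaches roughly 1.15x, which matches the observation that adding the DLA made things slower here.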

Dear @wang_chen2,
The iGPU has more DL TOPS (performance) than the DLA, so we generally see the iGPU take less time than the DLA.
When GPU fallback is enabled, the layers that can't run on the DLA are moved back to the iGPU. This involves an additional data transfer of the intermediate layer output, which increases the overall execution time.
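For reference, one common way to build a DLA engine with GPU fallback is TensorRT's `trtexec` tool. The sketch below only assembles the command line; it assumes the `trtexec` shipped with your TensorRT release accepts `--onnx`, `--useDLACore`, `--allowGPUFallback` and `--fp16`, and `model.onnx` is a placeholder file name, not the model from this thread.

```python
def trtexec_dla_command(onnx_path, dla_core=0, allow_gpu_fallback=True):
    """Build a trtexec command line that targets one DLA core."""
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--useDLACore={dla_core}",
        "--fp16",  # DLA runs in reduced precision (FP16 or INT8)
    ]
    if allow_gpu_fallback:
        # Let layers the DLA cannot run execute on the iGPU instead.
        cmd.append("--allowGPUFallback")
    return cmd


# "model.onnx" is a hypothetical path used only for illustration.
print(" ".join(trtexec_dla_command("model.onnx", dla_core=0)))
```

Building once with fallback and once without it can help: if the build without fallback fails, the error should point at the layers the DLA cannot handle.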

Hi, @SivaRamaKrishnaNV
So, are my measured times reasonable?
In the best case, will using the DLA leave iGPU performance unaffected?
Are there any suggestions for using the DLA to improve performance?

Hi,

The DLA is designed for low power rather than performance.
Usually, we recommend the DLA when users want to free up the GPU or increase throughput.

I am not sure I understand your benchmark result correctly.
When you run the GPU along with the DLA, you should get 2x throughput, although the latency increases.

Also, could you try setting the environment variable below to see if it helps?

$ export CUDA_DEVICE_MAX_CONNECTIONS=32
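If the benchmark is launched from a script rather than an interactive shell, the same variable can be passed to the child process. This is a minimal sketch; the helper name `env_with_max_connections` is made up for illustration, and it assumes the variable has to be in place before the benchmark process initialises CUDA.

```python
import os


def env_with_max_connections(connections=32, base_env=None):
    """Return a copy of the environment with CUDA_DEVICE_MAX_CONNECTIONS set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_MAX_CONNECTIONS"] = str(connections)
    return env


env = env_with_max_connections(32)
# Pass `env` to subprocess.run([...], env=env) when launching the benchmark,
# so the setting is present before the child process creates its CUDA context.
print(env["CUDA_DEVICE_MAX_CONNECTIONS"])
```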

You can find more details about this variable in the below topic:

Thanks.

Hi,
Yes, when I run the GPU along with the DLA, I get 2x throughput, but it takes more time than running only the GPU at the same throughput.

I exported the variable, but nothing changed.

Thank you very much.

Hi,

Could you check the GPU fallback ratio of your model?
You can run a model on DLA and monitor the GPU utilization.

If the model depends on GPU a lot, the data transfer between GPU and DLA may cause a performance issue.
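As a rough illustration of what a fallback ratio looks like, the sketch below counts GPU-placed layers and DLA/GPU handoffs from a per-layer placement list. The layer names and placement are hypothetical (in practice they would come from something like the TensorRT build log), and the model is assumed to be a simple chain of layers.

```python
def fallback_stats(placement):
    """placement: ordered list of (layer_name, device), device is "DLA" or "GPU"."""
    gpu_layers = [name for name, device in placement if device == "GPU"]
    ratio = len(gpu_layers) / len(placement) if placement else 0.0
    # Each change of device between consecutive layers implies a handoff
    # of the intermediate tensor between the DLA and the GPU.
    transitions = sum(
        1 for (_, a), (_, b) in zip(placement, placement[1:]) if a != b
    )
    return gpu_layers, ratio, transitions


# Hypothetical placement, not the model from this thread.
placement = [
    ("conv1", "DLA"), ("relu1", "DLA"), ("custom_op", "GPU"),
    ("conv2", "DLA"), ("softmax", "GPU"),
]
gpu_layers, ratio, transitions = fallback_stats(placement)
print(gpu_layers, f"{ratio:.0%}", transitions)
```

The number of transitions matters as much as the ratio: a few fallback layers scattered through the middle of the network cause more handoffs than the same number grouped at the end.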

Thanks.

Hi, AasttaLLL,
Yes, there are 5 layers that need to fall back to the GPU. I am trying to eliminate the fallback and will then test the performance again.

Hi,

Do you plan to replace those layers with DLA-compatible ones?
That would be a better way to remove the dependency between the DLA and the GPU.

Thanks.

Yes, I have updated the layers to be DLA compatible, and it works.
Thank you very much.