I am trying to migrate MobileNetV2 to AGX Xavier with DLA conversion.
I just removed the end of model because of the tensorrt conversion error about GlobalAveragePool and Gemm layer.
Then I successfully convert MobileNetV2 to rtr files just for DLA and GPU with fp16 format.
but the latency time seems to be weird
DLA 16fp : 6.25089 ms
GPU 16fp : 2.88255 ms
DLA only model must be faster than GPU only, isn’t it?
Q: Why does my network run slower when using DLA compared to without DLA?
A: DLA was designed to maximize energy efficiency. Depending on the features supported by DLA and the features supported by the GPU, either implementation can be more performant. Which implementation to use depends on your latency or throughput requirements and your power budget. Since all DLA engines are independent of the GPU and each other, you could also use both implementations at the same time to further increase the throughput of your network.
For the performance issue, would you mind sharing the log with -dumpProfile with us?
Is DLA mainly for the energy efficiency and finally harmful the latency?
But I have ever read some document that working on DLA is faster than GPU ( unfortunately I can’t search it OTL.)
Would you mind sharing the original ONNX model with us?
Since the TensorRT engine is not portable, this will help us test on different platforms and software versions.
Confirmed that we can reproduce the same performance issue internally.
We are checking this with our internal team. Will share more information with you later.
At the same time I also measured some values with following commands
$ time runTest.sh & runTest.sh & runTest.sh & runTest.sh &
It can be to run 4 processes simultaneously.
There are two DLAs hardware but only one GPU on Xavier.
So multithread will benefit DLA since two jobs can run on different DLAs without sharing resources.