Hello,
I am testing the performance of TensorRT on a Jetson Orin NX. To do so, I created a small program that runs inference with a neural network. Before running my program, I convert my ONNX model to a TensorRT engine.
Reading the documentation, I see that inference can run on the Deep Learning Accelerator (DLA) cores. I would expect inference to be faster on a DLA core than on the GPU, but that is not what I observe.
Is my assumption wrong? Could it be because of something related to my Neural Network? Am I missing something related to TensorRT?
In particular, when I run the inference on the GPU, the elapsed time is 13 ms, but on the DLA cores it is 30 ms.
Hi,
Please find the explanation in our document below:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1030/developer-guide/index.html#troubleshooting
Q: Why does my network run slower when using DLA than without DLA?
A: DLA was designed to maximize energy efficiency. Depending on the features supported by DLA and the features supported by the GPU, either implementation can be more performant. Your chosen implementation depends on your latency or throughput requirements and power budget. Since all DLA engines are independent of the GPU and each other, you could also use both implementations to increase the throughput of your network further.
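As a rough illustration of the last point, here is a small Python sketch that estimates the best-case combined throughput from the two latencies reported above (13 ms on GPU, 30 ms on DLA). It is only back-of-the-envelope arithmetic under several assumptions: each time is the latency of one inference, the GPU and DLA engines run fully in parallel, the whole network runs on DLA with no GPU fallback, and there is no contention for memory bandwidth. The function names and the number of DLA cores are placeholders for illustration, so please check how many DLA cores your module actually has. Real numbers will most likely be lower.

```python
def throughput_per_second(latency_ms: float) -> float:
    """Inferences per second for one engine that takes latency_ms per inference."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


def combined_throughput(gpu_latency_ms: float,
                        dla_latency_ms: float,
                        num_dla_cores: int) -> float:
    """Idealized upper bound when the GPU and every DLA core run concurrently.

    Assumption: the engines do not slow each other down, which is optimistic
    because they share the same memory on the module.
    """
    if num_dla_cores < 0:
        raise ValueError("num_dla_cores must not be negative")
    gpu = throughput_per_second(gpu_latency_ms)
    dla = num_dla_cores * throughput_per_second(dla_latency_ms)
    return gpu + dla


if __name__ == "__main__":
    GPU_MS = 13.0  # latency reported in the question for the GPU
    DLA_MS = 30.0  # latency reported in the question for a DLA core
    for cores in (0, 1, 2):  # hypothetical DLA core counts
        total = combined_throughput(GPU_MS, DLA_MS, cores)
        print(f"GPU + {cores} DLA core(s): {total:.1f} inferences/s (ideal)")
```

So even though a single inference is slower on DLA, running DLA engines next to the GPU engine can still raise the total number of inferences per second, and a DLA-only setup may be preferable when the power budget matters more than latency.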
Thanks.