We are currently downloading pretrained ResNet50 weights from Keras and converting them to ONNX with multiple batch sizes. After that conversion, we convert the ONNX model to TensorRT using the syntax below.
From the table above, the GPU results are quite acceptable, but the DLA results are very low. We have seen the same pattern with other TensorFlow models (MobileNet, SSD-MobileNet, VGG, etc.), so we would like to understand why the throughput is so low.
Can you please suggest why we are observing lower throughput with DLA?
Q: Why does my network run slower when using DLA compared to without DLA?
A: DLA was designed to maximize energy efficiency. Depending on the features supported by DLA and the features supported by the GPU, either implementation can be more performant. Which implementation to use depends on your latency or throughput requirements and your power budget. Since all DLA engines are independent of the GPU and each other, you could also use both implementations at the same time to further increase the throughput of your network.
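To make the last point of the answer concrete, here is a minimal Python sketch of how one might launch the same ONNX model on the GPU and on each DLA core as separate `trtexec` processes. The file name `resnet50.onnx`, the DLA core count of 2, and the helper names are assumptions for illustration, not taken from the original post. The flags used (`--onnx`, `--fp16`, `--useDLACore`, `--allowGPUFallback`) are standard `trtexec` options. The summed throughput is only an optimistic upper bound, since it assumes the engines do not contend for shared resources such as memory bandwidth.

```python
from typing import List, Optional, Sequence


def trtexec_args(onnx_path: str, dla_core: Optional[int] = None) -> List[str]:
    """Build a trtexec argument list for the GPU (dla_core=None) or one DLA core."""
    # FP16 is requested because DLA runs only FP16 or INT8 precision.
    args = ["trtexec", f"--onnx={onnx_path}", "--fp16"]
    if dla_core is not None:
        # Layers that DLA cannot run are allowed to fall back to the GPU.
        args += [f"--useDLACore={dla_core}", "--allowGPUFallback"]
    return args


def all_engine_commands(onnx_path: str, num_dla_cores: int = 2) -> List[List[str]]:
    """One command for the GPU plus one per DLA core (core count is an assumption)."""
    commands = [trtexec_args(onnx_path)]
    for core in range(num_dla_cores):
        commands.append(trtexec_args(onnx_path, dla_core=core))
    return commands


def combined_throughput_upper_bound(per_engine_qps: Sequence[float]) -> float:
    """Sum of per-engine throughputs, assuming no contention between engines."""
    return float(sum(per_engine_qps))


if __name__ == "__main__":
    # "resnet50.onnx" is a hypothetical path; substitute your own model file.
    for cmd in all_engine_commands("resnet50.onnx"):
        print(" ".join(cmd))
    # To actually run them concurrently, each list could be passed to
    # subprocess.Popen(cmd) on a device with TensorRT installed.
```

Comparing the per-engine numbers reported by each run against the number from a GPU-only run shows whether adding the DLA cores is worth it for a given power budget.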