We downloaded pretrained ResNet50 weights from Keras and converted the model to ONNX for multiple batch sizes. After the conversion, we built a TensorRT engine with the command below:
/usr/src/tensorrt/bin/trtexec --onnx=onnx_model.onnx --saveEngine=resnet50.trt --explicitBatch --inputIOFormats=int8:chw --outputIOFormats=int8:chw --int8 --useDLACore=0 --allowGPUFallback=True --sparsity=disable --verbose=True
We prepared two models: one for the GPU and one for the DLA.
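For reference, the difference between the two builds can be sketched as below. This is only an illustration, not the exact commands used for the results in this post: the helper name `trtexec_cmd` and the engine file names `resnet50_gpu.trt` and `resnet50_dla0.trt` are hypothetical, and the I/O format flags from the original command are left out for brevity. The script only prints the command lines and does not run trtexec.

```python
import shlex

TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def trtexec_cmd(onnx_path, engine_path, dla_core=None):
    """Return the trtexec argument list for a GPU-only or a DLA build.

    Hypothetical helper for illustration. With dla_core=None the engine
    targets the GPU; otherwise the DLA flags are appended.
    """
    cmd = [
        TRTEXEC,
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--int8",
    ]
    if dla_core is not None:
        # Layers that the DLA cannot run fall back to the GPU.
        cmd += [f"--useDLACore={dla_core}", "--allowGPUFallback"]
    return cmd


# Engine file names below are assumptions, not taken from the post.
gpu_cmd = trtexec_cmd("onnx_model.onnx", "resnet50_gpu.trt")
dla_cmd = trtexec_cmd("onnx_model.onnx", "resnet50_dla0.trt", dla_core=0)

print(shlex.join(gpu_cmd))
print(shlex.join(dla_cmd))
```

Keeping every other flag identical between the two builds makes it easier to attribute a throughput difference to the DLA placement itself.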
After running inference, we collected the results below.
From the above table, the GPU results are quite acceptable, but the DLA results are very low. Moreover, we have seen the same pattern with other TensorFlow models (MobileNet, SSD-MobileNet, VGG, etc.), and we would like to know why the DLA throughput is so low.
Can you please suggest why we are observing lower throughput with DLA?
We are moving this post to the Jetson AGX Orin forum to get better help.
I have this same issue with a different model. Same exact model, only difference is whether or not I added
In FP16, it is ~10X slower to use the DLA in my case.
Do you want to compare the performance between GPU and DLA?
Please find information in our document below:
Q: Why does my network run slower when using DLA compared to without DLA?
A: DLA was designed to maximize energy efficiency. Depending on the features supported by DLA and the features supported by the GPU, either implementation can be more performant. Which implementation to use depends on your latency or throughput requirements and your power budget. Since all DLA engines are independent of the GPU and each other, you could also use both implementations at the same time to further increase the throughput of your network.
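As a rough illustration of the last point in that answer, the combined throughput of independent engines can be estimated by adding their individual throughputs. The sketch below assumes ideal scaling with no contention for shared memory bandwidth and no GPU fallback layers in the DLA engines, so it is an upper bound rather than a prediction. The function name, the engine labels, and the numbers are placeholders for illustration, not measurements from this thread.

```python
def combined_throughput(per_engine_qps):
    """Upper-bound estimate: independent engines add their throughputs.

    per_engine_qps maps an engine label to its measured queries per second.
    Assumes the engines do not slow each other down, which may not hold
    when they share DRAM bandwidth or when DLA layers fall back to the GPU.
    """
    return sum(per_engine_qps.values())


# Placeholder values only; substitute your own trtexec measurements.
example_qps = {"gpu": 1000.0, "dla0": 100.0, "dla1": 100.0}

total_qps = combined_throughput(example_qps)
print(f"estimated combined throughput: {total_qps:.1f} qps")
```

Under these assumptions the DLA engines add to the GPU's throughput rather than replace it, which is why a DLA engine that is slower than the GPU can still be useful when it runs alongside the GPU.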