I’m working on optimizing a Yolox_x model to run on the AGX Orin, with a focus on reducing latency at little cost to accuracy. Initially, I assumed that the DLA would help “accelerate” our performance, including latency. However, for every model I have tried, using the DLA has increased latency compared to running purely on the GPU. For yolox_x in particular, our latency is 4.8 ms on the GPU alone and 23.9 ms with the DLA (see the command below). While I have read that the DLA is mostly focused on power efficiency, I wanted to confirm that this is the case:
- For the Orin, what is the TOPS figure for the DLA compared to the GPU?
- In what situations, or for which ML networks, does running on the DLA actually produce lower latency overall?
- The command I am running looks like:
trtexec --onnx=model.onnx --saveEngine=model.engine --exportProfile=model.json --int8 --fp16 --useDLACore=0 --allowGPUFallback --useSpinWait --separateProfileRun > model.log
- Are there any other options or techniques I can use to lower the latency of this model when using the DLA?
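
For reference, the GPU-only versus DLA comparison can be set up as a small wrapper script so that only the DLA-related flags differ between the two runs. This is a minimal sketch, not the exact procedure behind the numbers above: the `model_gpu`/`model_dla` output names and the `RUN` switch are hypothetical, and the GPU-only command is assumed to be the same command with `--useDLACore=0 --allowGPUFallback` removed.

```shell
#!/usr/bin/env bash
# Sketch: build matching trtexec command lines for a GPU-only run and a DLA run.
set -euo pipefail

MODEL="${MODEL:-model.onnx}"   # same ONNX file as in the command above

# Flags shared by both runs, so that only the DLA options differ.
COMMON=(--onnx="$MODEL" --int8 --fp16 --useSpinWait --separateProfileRun)

build_cmd() {
  # $1 is a label used only for output file names (hypothetical naming scheme);
  # any remaining arguments are extra trtexec flags for this run.
  local tag="$1"; shift
  echo trtexec "${COMMON[@]}" "$@" \
    --saveEngine="model_${tag}.engine" --exportProfile="model_${tag}.json"
}

GPU_CMD="$(build_cmd gpu)"
DLA_CMD="$(build_cmd dla --useDLACore=0 --allowGPUFallback)"

# Dry run by default: print both commands for inspection.
echo "$GPU_CMD"
echo "$DLA_CMD"

# Set RUN=1 to execute them on the device (requires trtexec on PATH).
# Unquoted expansion is intentional here; it assumes no spaces in file names.
if [ "${RUN:-0}" = "1" ]; then
  $GPU_CMD > model_gpu.log
  $DLA_CMD > model_dla.log
fi
```

Running it without `RUN=1` only prints the two command lines, which makes it easy to check that both runs use identical precision and timing flags before comparing the latencies reported in the two logs.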