I am currently working with TensorRT with DLA on a Jetson Orin Dev Kit, and I have some questions about which workflow to use. From what I understand, two possible workflows exist for using DLA for inference:
Compilation with the TRT Builder, runtime with the TRT Runtime (a minimal sketch of this path follows below)
Compilation with the TRT Builder, runtime with the cuDLA API
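For reference, this is roughly how I build and run for workflow 1 today. It is only a minimal sketch assuming the TensorRT 8.4 C++ API; model.onnx and the DLA core index 0 are placeholders:

```cpp
#include <cstdio>
#include <memory>
#include <vector>
#include "NvInfer.h"
#include "NvOnnxParser.h"

using namespace nvinfer1;

class Logger : public ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING) printf("%s\n", msg);
    }
} gLogger;

int main()
{
    // --- Compilation with the TRT Builder, targeting DLA ---
    auto builder = std::unique_ptr<IBuilder>(createInferBuilder(gLogger));
    auto network = std::unique_ptr<INetworkDefinition>(builder->createNetworkV2(
        1U << static_cast<int>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)));
    auto parser = std::unique_ptr<nvonnxparser::IParser>(
        nvonnxparser::createParser(*network, gLogger));
    parser->parseFromFile("model.onnx", static_cast<int>(ILogger::Severity::kWARNING));

    auto config = std::unique_ptr<IBuilderConfig>(builder->createBuilderConfig());
    config->setFlag(BuilderFlag::kFP16);              // DLA requires FP16 or INT8
    config->setDefaultDeviceType(DeviceType::kDLA);   // place layers on DLA
    config->setDLACore(0);                            // placeholder core index
    config->setFlag(BuilderFlag::kGPU_FALLBACK);      // GPU fallback for unsupported layers

    auto plan = std::unique_ptr<IHostMemory>(builder->buildSerializedNetwork(*network, *config));

    // --- Runtime with the TRT Runtime ---
    auto runtime = std::unique_ptr<IRuntime>(createInferRuntime(gLogger));
    runtime->setDLACore(0);
    auto engine = std::unique_ptr<ICudaEngine>(
        runtime->deserializeCudaEngine(plan->data(), plan->size()));
    auto context = std::unique_ptr<IExecutionContext>(engine->createExecutionContext());

    // bindings[] would hold device pointers sized from engine->getBindingDimensions(...)
    std::vector<void*> bindings(engine->getNbBindings(), nullptr);
    // context->enqueueV2(bindings.data(), stream, nullptr);  // actual inference call once buffers exist
    return 0;
}
```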
Here are my questions:
Is there another way to create DLA loadables without the TensorRT builder?
The cuDLA API exposes mechanisms to manage devices and memory and to submit DLA tasks. In terms of performance, is there a big difference between the strategy offered by TensorRT and one we could build ourselves with cuDLA?
Same question for hybrid versus standalone DLA inference (see the cuDLA sketch below).
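To make the last two questions concrete, this is the kind of cuDLA hybrid-mode path I have in mind, based on my reading of the cuDLA documentation. It is only a minimal sketch: loadable.bin (a DLA loadable produced by the TRT builder with EngineCapability::kDLA_STANDALONE), the tensor sizes, and the core index are placeholders, and error checking is omitted:

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>
#include <cuda_runtime.h>
#include <cudla.h>

int main()
{
    // Initialize the CUDA context before creating the cuDLA device in hybrid mode
    cudaFree(0);

    // DLA loadable produced offline by the TRT Builder
    std::ifstream f("loadable.bin", std::ios::binary);
    std::vector<uint8_t> loadable((std::istreambuf_iterator<char>(f)),
                                   std::istreambuf_iterator<char>());

    // --- Device and module management ---
    cudlaDevHandle dev = nullptr;
    cudlaCreateDevice(0 /* DLA core */, &dev, CUDLA_CUDA_DLA);   // hybrid mode
    cudlaModule module = nullptr;
    cudlaModuleLoadFromMemory(dev, loadable.data(), loadable.size(), &module, 0);

    // --- Memory management: allocate CUDA memory and register it with DLA ---
    const size_t inputSize  = 1 * 3 * 224 * 224 * sizeof(uint16_t);  // placeholder FP16 tensor
    const size_t outputSize = 1 * 1000 * sizeof(uint16_t);           // placeholder
    void* dIn = nullptr; void* dOut = nullptr;
    cudaMalloc(&dIn, inputSize);
    cudaMalloc(&dOut, outputSize);
    uint64_t* dlaIn = nullptr; uint64_t* dlaOut = nullptr;
    cudlaMemRegister(dev, reinterpret_cast<uint64_t*>(dIn), inputSize, &dlaIn, 0);
    cudlaMemRegister(dev, reinterpret_cast<uint64_t*>(dOut), outputSize, &dlaOut, 0);

    // --- Task submission on a CUDA stream (the "hybrid" part) ---
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudlaTask task = {};
    task.moduleHandle     = module;
    task.inputTensor      = &dlaIn;
    task.numInputTensors  = 1;
    task.outputTensor     = &dlaOut;
    task.numOutputTensors = 1;
    cudlaSubmitTask(dev, &task, 1, stream, 0);
    cudaStreamSynchronize(stream);

    // --- Cleanup ---
    cudlaMemUnregister(dev, dlaIn);
    cudlaMemUnregister(dev, dlaOut);
    cudaFree(dIn); cudaFree(dOut);
    cudlaModuleUnload(module, 0);
    cudlaDestroyDevice(dev);
    return 0;
}
```

As I understand it, standalone mode would be the same flow created with CUDLA_STANDALONE, with NvSciBuf memory and NvSciSync events in place of CUDA memory and streams.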
Thank you very much for your help,
Environment
TensorRT Version: 8.4.1
GPU Type: Embedded Jetson Orin
CUDA Version: 11.4