Please check the TensorRT output log.
When you convert the model with TensorRT, the log shows which layers are placed on the DLA and which fall back to the GPU.
The best case is to deploy a network that can run entirely on the DLA, so no extra data transfer is required.
If the model cannot run solely on the DLA, the data transfers between the GPU and the DLA add overhead and reduce performance.
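As a sketch, you can check the layer placement with trtexec by enabling a DLA core and GPU fallback (the model path `model.onnx` and core index `0` are placeholders for your setup):

```shell
# Build the engine on DLA core 0, allowing unsupported layers to fall back to the GPU.
# --verbose prints the per-layer device placement so you can see which layers
# run on the DLA and which were moved to the GPU.
trtexec --onnx=model.onnx \
        --useDLACore=0 \
        --allowGPUFallback \
        --verbose
```

If you omit `--allowGPUFallback` and the model contains a layer the DLA does not support, the build fails instead, which is a quick way to confirm whether the network can run on the DLA alone.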