Running the same network on both tensor cores and 2 DLAs when exported from ONNX

Does TRT automatically schedule work across multiple DLAs and the Tensor Cores to optimize end-to-end latency?
Also, is DLA memory bandwidth shared with the Tensor Cores?
Also, how do I enable the DLAs from ONNX? Is this something that can be controlled from Python with some kind of annotations (if TRT is unable to schedule, say, multiple convolutions over 2 DLAs and the Tensor Cores)?

What’s the recommended approach to extracting maximum performance from the combined DLA + Tensor Core resources for a single model?



1. Sorry, the answer is no.
Users need to decide which hardware to run on themselves.
The scheduler only handles tasks within the GPU.

2. No. DLAs and PVAs have separate SRAM.

3. You need to convert the ONNX model into a TensorRT engine first,
and set the device to DLA when building and executing the TensorRT engine.
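For example, this can be done with the trtexec tool that ships with TensorRT. A minimal sketch (the file names are placeholders):

```shell
# Sketch only; paths are placeholders. --useDLACore selects core 0 or 1,
# --allowGPUFallback lets layers the DLA cannot run fall back to the GPU,
# and --fp16 is needed because the DLA does not run FP32.
trtexec --onnx=model.onnx --saveEngine=model_dla.engine \
        --useDLACore=0 --allowGPUFallback --fp16
```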


Also, the TensorRT Python API doesn’t support DLA yet; this will be added in a future release.
So you will need to use the TensorRT C++ interface to put the model on the DLA.
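A minimal C++ sketch of building a DLA engine from ONNX is below. It uses the standard nvinfer1/nvonnxparser builder flow; error handling and resource cleanup are omitted, and "model.onnx" is a placeholder path. The exact build call may differ slightly between TensorRT versions (older releases use buildEngineWithConfig instead of buildSerializedNetwork).

```cpp
// Sketch only: build a DLA engine from an ONNX file with the TensorRT C++ API.
#include <cstdio>
#include "NvInfer.h"
#include "NvOnnxParser.h"

using namespace nvinfer1;

class Logger : public ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) printf("%s\n", msg);
    }
} gLogger;

int main() {
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
    auto* parser = nvonnxparser::createParser(*network, gLogger);
    parser->parseFromFile("model.onnx",
                          static_cast<int>(ILogger::Severity::kWARNING));

    IBuilderConfig* config = builder->createBuilderConfig();
    config->setFlag(BuilderFlag::kFP16);          // DLA requires FP16 or INT8
    config->setFlag(BuilderFlag::kGPU_FALLBACK);  // unsupported layers -> GPU
    config->setDefaultDeviceType(DeviceType::kDLA);
    config->setDLACore(0);                        // Xavier has DLA cores 0 and 1

    IHostMemory* serialized = builder->buildSerializedNetwork(*network, *config);
    // ...write serialized->data() to disk, then deserialize with IRuntime at run time.
    return 0;
}
```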

4. If your GPU is free, just run the model on the GPU, which will give you the best performance.


Thank you so much for these answers!

Another question: can I somehow manually run some layers of a network on the DLA in parallel with other layers running on the GPU? I see that there’s no per-layer DLA assignment, and setDLACore is a function of IRuntime, so it appears that I would have to build separate engines and orchestrate this manually. But is this even possible/practical?

For the shared-bandwidth question, by bandwidth I meant LPDDR bandwidth: if I’m running two models that are both mostly bandwidth-bound, I will not see a speedup from using the DLA in parallel, is that correct?

Is there a document where I can find more details about the internal architecture of how the DLAs are integrated with the GPU inside Xavier?

Thank you!!!


Sorry, we don’t support layer-level hardware assignment.
You will need to put the entire model on the same processor.
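That said, the manual orchestration described above (separate engines, run concurrently) can be sketched as follows. This assumes two engines have already been built and deserialized, one targeting the DLA and one the GPU, and elides all buffer allocation; enqueueV2 is asynchronous, so both devices process their engines in parallel. This is an illustrative sketch, not an officially recommended pattern.

```cpp
// Sketch: run a DLA-built engine and a GPU-built engine concurrently
// on separate CUDA streams. Binding buffers are assumed pre-allocated.
#include "NvInfer.h"
#include <cuda_runtime_api.h>

void runConcurrently(nvinfer1::IExecutionContext* dlaCtx,
                     nvinfer1::IExecutionContext* gpuCtx,
                     void** dlaBindings, void** gpuBindings) {
    cudaStream_t dlaStream, gpuStream;
    cudaStreamCreate(&dlaStream);
    cudaStreamCreate(&gpuStream);

    // Both calls return immediately; the DLA and the GPU SMs work in parallel.
    dlaCtx->enqueueV2(dlaBindings, dlaStream, nullptr);
    gpuCtx->enqueueV2(gpuBindings, gpuStream, nullptr);

    cudaStreamSynchronize(dlaStream);
    cudaStreamSynchronize(gpuStream);
    cudaStreamDestroy(dlaStream);
    cudaStreamDestroy(gpuStream);
}
```

Note that whether this actually helps depends on the bandwidth question above: two concurrently running, bandwidth-bound engines still share LPDDR bandwidth.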


Actually, we have plenty of documentation about the DLA, including the hardware architecture.
You can check this page to find more information: