Does TRT automatically schedule work across multiple DLAs and the Tensor Cores to optimize end-to-end latency?
Also, is DLA memory bandwidth shared with the Tensor Cores?
Also, how do you enable the DLAs when building an engine from ONNX? Can this be controlled from Python with some kind of per-layer annotation (e.g., if TRT is unable to automatically schedule multiple convolutions across the two DLAs and the Tensor Cores)?
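For context, here is an untested sketch of the kind of per-layer control I have in mind, based on my reading of the TensorRT Python API (the `model.onnx` filename and the pin-every-other-conv rule are just placeholders; my understanding is that DLA layers must run in FP16 or INT8, and this would need a Jetson with DLA hardware to actually build):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# Parse the ONNX model into a TensorRT network definition.
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:  # placeholder path
    parser.parse(f.read())

config = builder.create_builder_config()
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0  # target DLA0; a second engine could set DLA_core = 1
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)  # unsupported layers fall back to GPU
config.set_flag(trt.BuilderFlag.FP16)          # DLA requires FP16 or INT8

# Manually pin individual layers, e.g. send some convolutions to the
# Tensor Cores instead of the DLA (arbitrary rule, for illustration only):
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.type == trt.LayerType.CONVOLUTION and i % 2 == 0:
        config.set_device_type(layer, trt.DeviceType.GPU)

serialized_engine = builder.build_serialized_network(network, config)
```

If something like this is the intended workflow, does the builder then overlap the DLA and GPU partitions at runtime, or do they execute serially within one engine?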
What’s the recommended approach for extracting maximum performance from the combined DLA + Tensor Core resources for a single model?