I'm using JetPack 4.3 and TensorRT 6.0.
I'm trying to run three networks on the Xavier AGX: the largest on the GPU and the other two on DLA0 and DLA1. Inference runs in three threads, one per hardware unit.
However, the GPU and the DLAs seem to run serially rather than concurrently (see the attached timeline image).
I used trtexec to generate the engines; the DLA engines were built without --allowGPUFallback, and all layers of those networks are DLA-supported, so nothing should fall back to the GPU.
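The build commands looked roughly like this (model and engine file names are placeholders; FP16 is required for the DLA, and note that --allowGPUFallback is deliberately not set on the DLA builds):

```shell
# GPU engine for the largest network
trtexec --onnx=big_net.onnx --fp16 --saveEngine=big_net_gpu.engine

# DLA engines: no --allowGPUFallback, so the build only succeeds
# if every layer is supported on the DLA
trtexec --onnx=small_net.onnx --fp16 --useDLACore=0 --saveEngine=small_net_dla0.engine
trtexec --onnx=small_net.onnx --fp16 --useDLACore=1 --saveEngine=small_net_dla1.engine
```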
I used nvvp for profiling.
Thanks in advance!