DLA and GPU running at the same time - performance question

Hi,

Sorry for the late update.

The threading code can be found in this file:
/usr/src/tensorrt/samples/common/sampleInference.cpp

// One worker thread per stream when --threads is enabled, otherwise a single thread drives all streams.
int threadsNum = inference.threads ? inference.streams : 1;
int streamsPerThread = inference.streams / threadsNum;

std::vector<std::thread> threads;
for (int t = 0; t < threadsNum; ++t)
{
    threads.emplace_back(makeThread(inference, iEnv, sync, t, streamsPerThread, device, trace));
}
for (auto& th : threads)
{
    th.join();
}

....
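
A minimal sketch of the same idea outside the sample, assuming the two engines (one built for DLA with GPU fallback, one for GPU) are already deserialized and their binding buffers allocated; the engine and buffer names here are hypothetical:

#include <thread>
#include <cuda_runtime_api.h>
#include <NvInfer.h>

void runEngine(nvinfer1::ICudaEngine* engine, void** bindings, int iterations)
{
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    for (int i = 0; i < iterations; ++i)
    {
        // Asynchronous enqueue; the DLA-built engine runs on the DLA
        // (unsupported layers fall back to the GPU), the other engine on the GPU.
        context->enqueueV2(bindings, stream, nullptr);
    }
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    context->destroy();
}

// One thread per engine so both devices are fed independently:
// std::thread tDla(runEngine, dlaEngine, dlaBindings, 1000);
// std::thread tGpu(runEngine, gpuEngine, gpuBindings, 1000);
// tDla.join(); tGpu.join();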

Thanks.

Hi @AastaLLL,
I've used the threading code you supplied above (it's from TensorRT 7, right?).
I still don't see any better performance; it's as if they interfere with each other, especially with a network where not all layers can run on the DLA and which falls back to the GPU at the beginning and end of the network.

I see a ~30% performance penalty on both the GPU and the DLA compared to when each of them runs alone.

thanks
Eyal

@AastaLLL,
Thanks for the assistance. I'm still not seeing the DLA and GPU running at the same time. Could it be that I must use TensorRT 7 and JetPack 4.4? I'm currently using JetPack 4.3 and TensorRT 5.1.

And another question, please: if I recall correctly, DLA operations do NOT appear in NVVP, right? So if I run my network on the DLA but still see ~40-50% of the NVVP timeline showing (some?) GPU operations, does that mean the DLA is falling back to the GPU?

thanks
Eyal

Hi,

You can find the detailed support matrix here:

Not all TensorRT layers have a DLA implementation.
Unsupported layers fall back to the GPU and consume GPU resources.

So if your model has some fallback layers, lower performance is expected when a GPU pipeline is running at the same time.
Since Jetson has only one GPU, the fallback layers and the GPU engine have to take turns waiting for the GPU.
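
To see which layers will fall back before building the engine, here is a minimal sketch assuming the TensorRT 7 IBuilderConfig API; the helper name reportDlaFallback is hypothetical, and network/config are the usual builder objects from your setup:

#include <iostream>
#include <NvInfer.h>

void reportDlaFallback(nvinfer1::INetworkDefinition* network, nvinfer1::IBuilderConfig* config)
{
    // Request DLA for the whole network and allow GPU fallback.
    config->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    config->setDLACore(0);
    config->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);

    // List the layers the DLA cannot run; these will occupy the GPU at runtime.
    for (int i = 0; i < network->getNbLayers(); ++i)
    {
        nvinfer1::ILayer* layer = network->getLayer(i);
        if (!config->canRunOnDLA(layer))
        {
            std::cout << "Falls back to GPU: " << layer->getName() << std::endl;
        }
    }
}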

Thanks.