DLA and GPU running at the same time, performance degradation

Hi, all

I found performance degradation when running DLA and GPU cores at the same time.

When running the VGG16 model on the DLA core alone, DLA0 was active for 147.156 ms per task.

However, when running VGG16 on the DLA core and AlexNet on the GPU at the same time, DLA0 was active for 177.967 ms per task.

I want to know where this difference (about 30 ms) comes from. My guess was that if memory bandwidth were saturated by processing the two models (AlexNet and VGG16) simultaneously, DLA0 could be delayed, as the numbers above show (147.156 ms -> 177.967 ms).

However, when I checked EMC as a metric for memory-bandwidth utilization, EMC still had headroom even while processing both models.
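For reference, EMC utilization on Jetson is usually read from `sudo tegrastats` output. A minimal sketch of extracting the percentage from one output line; the `EMC_FREQ <pct>%@<freq>` field format is an assumption and varies between JetPack releases:

```python
import re

def emc_utilization(tegrastats_line: str) -> int:
    """Extract the EMC utilization percentage from one tegrastats line.

    Assumes a JetPack-style field "EMC_FREQ <pct>%@<freq>"; the exact
    format differs between releases, so adjust the pattern as needed.
    """
    match = re.search(r"EMC_FREQ (\d+)%", tegrastats_line)
    if match is None:
        raise ValueError("no EMC_FREQ field found")
    return int(match.group(1))

# Example line in the style printed by `sudo tegrastats` (values made up)
sample = "RAM 2048/3964MB (lfb 4x4MB) CPU [12%@1479,5%@1479] EMC_FREQ 45%@1600 GR3D_FREQ 30%@921"
print(emc_utilization(sample))  # -> 45
```

Sampling this over the whole run (rather than eyeballing a single line) gives a better picture of whether bandwidth is actually the bottleneck.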

So I am confused about why running the DLA and the GPU at the same time causes performance degradation.

Thank you in advance.



Please note that when using the DLA for inference, some TensorRT operations may fall back to the GPU.
It is recommended to check first how many operations are running on the GPU.
This can be found in the TensorRT log.

Please note that these fallback operations will share GPU resources with the AlexNet inference.
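One way to quantify the fallback is to count per-layer device placement in the verbose build log (e.g. from `trtexec --onnx=vgg16.onnx --useDLACore=0 --allowGPUFallback --verbose`). A small sketch, assuming the log tags each placed layer with `[DlaLayer]` / `[GpuLayer]` markers; the exact wording is version dependent, so adapt the matching to your TensorRT release:

```python
def count_device_placement(build_log: str) -> dict:
    """Count layers placed on DLA vs. GPU in a TensorRT build log.

    Assumes per-layer tags like "[DlaLayer]" and "[GpuLayer]" in the
    verbose output; older TensorRT versions word this differently.
    """
    counts = {"DLA": 0, "GPU": 0}
    for line in build_log.splitlines():
        if "[DlaLayer]" in line:
            counts["DLA"] += 1
        elif "[GpuLayer]" in line:
            counts["GPU"] += 1
    return counts

# Hypothetical log excerpt for illustration
log = (
    "[TRT] [DlaLayer] conv1 + relu1\n"
    "[TRT] [DlaLayer] pool1\n"
    "[TRT] [GpuLayer] softmax\n"
)
print(count_device_placement(log))  # -> {'DLA': 2, 'GPU': 1}
```

If the GPU count is nonzero, those layers compete with AlexNet for the GPU, which alone can explain the extra ~30 ms even when EMC has headroom.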