I observed a performance degradation when running the DLA and GPU cores at the same time.
When running the VGG16 model on the DLA core alone, DLA0 was active for 147.156 ms per task.
However, when running VGG16 on the DLA core and AlexNet on the GPU core concurrently, DLA0 was active for 177.967 ms per task.
I want to know where this difference (about 30 ms) comes from. My guess was that if memory bandwidth is saturated when processing the two models (AlexNet and VGG16) simultaneously, DLA0 could be delayed, as in the example above (147.156 ms -> 177.967 ms).
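
To put a number on the gap, here is a minimal Python sketch that computes the absolute and relative slowdown from the two timings above. The function name `slowdown` is just something I made up for illustration; only the two measurements come from my runs. It works out to roughly 30.8 ms, or about 21% longer per task.

```python
def slowdown(alone_ms, concurrent_ms):
    """Return (extra time in ms, extra time as a percentage of the standalone time)."""
    delta = concurrent_ms - alone_ms
    return delta, 100.0 * delta / alone_ms


# Measured DLA0 active time per task (from the runs described above).
dla_alone_ms = 147.156       # VGG16 on DLA0 only
dla_concurrent_ms = 177.967  # VGG16 on DLA0 while AlexNet runs on the GPU

delta_ms, percent = slowdown(dla_alone_ms, dla_concurrent_ms)
print(f"DLA0 slowdown: {delta_ms:.3f} ms ({percent:.1f}%)")
```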
However, when I looked at EMC utilization as a metric for memory bandwidth usage, the EMC still had headroom even when processing both models.
So I am confused about why running the DLA and GPU cores at the same time causes this performance degradation.
Thank you in advance.