Hi,
I have created 3 TensorRT models on Xavier NX; let's call them M1, M2, and M3.
M1 is generated for DLA 0 using --useDLACore=0
M2 is generated for DLA 1 using --useDLACore=1
M3 is generated for GPU.
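For reference, the engines were built with trtexec commands roughly like the following (the ONNX/engine file names are just placeholders, and the --fp16 / --allowGPUFallback flags are what I typically pass for DLA builds; only the --useDLACore values above are the exact ones):

```
# Build M1 for DLA core 0 (file names are examples)
trtexec --onnx=model.onnx --saveEngine=m1_dla0.engine \
        --useDLACore=0 --fp16 --allowGPUFallback

# Build M2 for DLA core 1
trtexec --onnx=model.onnx --saveEngine=m2_dla1.engine \
        --useDLACore=1 --fp16 --allowGPUFallback

# Build M3 for the GPU (no DLA flags)
trtexec --onnx=model.onnx --saveEngine=m3_gpu.engine --fp16
```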
When I run only M1, inference takes an average of 56 ms per image.
When I run only M2, inference takes an average of 56 ms per image.
When I run only M3, inference takes an average of 31 ms per image.
But when I run M1 & M2 simultaneously, there is little impact on timing: each inference takes about 60 ms.
When I run M1 & M3 simultaneously, there is also some impact: they take 62 ms & 49 ms per inference, respectively.
When I run M2 & M3 simultaneously, there is also some impact: they take 62 ms & 49 ms per inference, respectively.
But the main issue is when I run M1, M2, and M3 all simultaneously: there is a huge impact on M3's inference time. M1 takes 66 ms, M2 takes 66 ms, and M3 is hit the hardest, taking 77 to 80 ms per inference.
Since M1 & M2 both run on DLA, I can understand them affecting each other when run simultaneously, but when all 3 models run simultaneously, M3, which runs on an entirely different device (the GPU), is affected far more than I would expect.
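For completeness, this is roughly how I launch the simultaneous runs: each engine is timed in its own process, sketched here with trtexec (the tool and the engine file names are only illustrative placeholders for my actual setup):

```
# Run all three engines at the same time, each in its own process
trtexec --loadEngine=m1_dla0.engine --useDLACore=0 --iterations=100 &
trtexec --loadEngine=m2_dla1.engine --useDLACore=1 --iterations=100 &
trtexec --loadEngine=m3_gpu.engine  --iterations=100 &
wait
```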
Can you explain this behavior? Am I doing anything wrong? If so, can you please correct me? It's urgent for me to understand this.