Different slowdowns when executing models concurrently

Hi all,

I am a graduate student doing a research project on edge server inference. I noticed an interesting thing:

When I run ResNet50 alone, the p50 latency is 3559 microseconds and the throughput is 280.95. When I run two instances of ResNet50, the latency is 6048 and the throughput is 330.15. So far so good. I understand that under the hood my GPU uses a time-sliced scheduler, which causes the latency to nearly double, and that a single instance may not fully utilize the GPU, which would explain the slight throughput increase.

When I run ResNet50 simultaneously with, say, VGG16, the picture is different: the latency of ResNet50 becomes 11226 and its throughput becomes 89.2, which is far less than half of the throughput when running ResNet50 alone.
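As a quick sanity check on the figures above, here is a small sketch that compares the quoted throughputs with what the quoted latencies would imply. It assumes latency is in microseconds and throughput is in inferences per second (the post does not state the throughput unit), and the helper name tput_from_latency is made up for this illustration. Under those assumptions the numbers line up if 330.15 is the combined throughput of both ResNet50 instances while 89.2 is ResNet50 only.

```python
# Figures quoted in the post. Units are an assumption:
# latency in microseconds, throughput in inferences per second.
ALONE_LATENCY_US = 3559
TWO_RESNET_LATENCY_US = 6048
WITH_VGG_LATENCY_US = 11226
ALONE_TPUT, TWO_RESNET_TPUT, WITH_VGG_TPUT = 280.95, 330.15, 89.2


def tput_from_latency(latency_us, instances=1):
    """Throughput implied by back-to-back inferences of the given latency."""
    return instances * 1e6 / latency_us


# Compare each reported throughput with the value implied by its latency.
for label, latency, instances, reported in [
    ("ResNet50 alone", ALONE_LATENCY_US, 1, ALONE_TPUT),
    ("2x ResNet50 (combined)", TWO_RESNET_LATENCY_US, 2, TWO_RESNET_TPUT),
    ("ResNet50 next to VGG16", WITH_VGG_LATENCY_US, 1, WITH_VGG_TPUT),
]:
    implied = tput_from_latency(latency, instances)
    print(f"{label}: reported {reported}, implied {implied:.2f}")

# Slowdown of ResNet50 relative to running alone.
print(f"latency ratio, 2x ResNet50: {TWO_RESNET_LATENCY_US / ALONE_LATENCY_US:.2f}")
print(f"latency ratio, with VGG16:  {WITH_VGG_LATENCY_US / ALONE_LATENCY_US:.2f}")
```

Read this way, each ResNet50 instance in the two-instance run gets roughly 165 inferences per second, and ResNet50 next to VGG16 is roughly 3.2 times slower than when it runs alone.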

Why does running ResNet alongside VGG interfere with ResNet so much? Is it because VGG’s GPU kernels are larger, so that when the time-sliced scheduler lets both models run roughly the same number of kernels, ResNet gets less time on the GPU?
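To make the hypothesis in the question concrete, here is a toy simulation of a scheduler that strictly alternates one kernel from each model. This is only a sketch of the hypothesis, not a description of how the GPU is known to schedule work, and every number in it (kernel counts, kernel durations, time horizon) is hypothetical rather than measured.

```python
def round_robin(kernels_a, dur_a_us, kernels_b, dur_b_us, horizon_us):
    """Alternate one kernel from each model until the time horizon is reached.

    Each model runs inferences back to back; an inference is a fixed number
    of equal-length kernels. Returns completed inferences as [model_a, model_b].
    """
    specs = [(kernels_a, dur_a_us), (kernels_b, dur_b_us)]
    progress = [0, 0]  # kernels finished in the current inference
    done = [0, 0]      # inferences finished so far
    t, turn = 0, 0
    while t + specs[turn][1] <= horizon_us:
        n_kernels, dur = specs[turn]
        t += dur
        progress[turn] += 1
        if progress[turn] == n_kernels:
            done[turn] += 1
            progress[turn] = 0
        turn = 1 - turn  # hand the GPU to the other model
    return done


# Hypothetical workloads, NOT measured values:
#   "small" model: 100 kernels of 35 us each (3500 us per inference alone)
#   "large" model: 100 kernels of 75 us each (7500 us per inference alone)
HORIZON_US = 1_100_000
small_vs_small = round_robin(100, 35, 100, 35, HORIZON_US)
small_vs_large = round_robin(100, 35, 100, 75, HORIZON_US)
print(small_vs_small)  # [157, 157]
print(small_vs_large)  # [100, 100]
```

In this toy model the small network would complete about 314 inferences alone over the same horizon. It keeps half of that when paired with a copy of itself, but less than a third when paired with the network whose kernels are longer, because each of its kernels has to wait for one long kernel from the other model.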

I originally posted this question in the general forum and was advised to post it here.

Thank you very much!

Hi @milesyang,
Please allow us some time to check on this.

Hi AakankshaS,

Sure, thanks a lot for your help!

Hi @AakankshaS, any luck with this problem? I am writing a research article and any feedback would be greatly appreciated. Thanks!

Hi @milesyang,
Apologies for the delay.
This depends on how the timing is calculated. The host code can execute in parallel, but the kernels are launched serially.
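To illustrate why the way timing is calculated matters, here is a minimal sketch that models the GPU as running one job at a time in launch order, with both jobs submitted by the host at the same moment. The function name host_latencies and the 7000 microsecond figure for the second model are hypothetical; only the 3559 microsecond ResNet50 figure comes from the thread.

```python
def host_latencies(queue):
    """queue: list of (name, gpu_time_us) in launch order, all submitted at t=0.

    The GPU is modeled as executing one job at a time in launch order.
    Returns {name: (wait_us, host_latency_us)}, where host_latency_us is what
    a host-side wall-clock timer around submit-and-wait would report.
    """
    t = 0
    result = {}
    for name, gpu_time_us in queue:
        wait_us = t          # time spent queued behind earlier work
        t += gpu_time_us     # job runs to completion
        result[name] = (wait_us, t)
    return result


# 3559 us is the stand-alone ResNet50 latency quoted in the thread.
# 7000 us for the other model is a hypothetical placeholder, not a measurement.
timings = host_latencies([("vgg16", 7000), ("resnet50", 3559)])
print(timings["resnet50"])  # (7000, 10559)
```

In this model the ResNet50 work itself still takes 3559 microseconds, but a timer on the host sees 10559 because it also counts the time spent waiting behind the other model's work. A timer that measured only execution time would report a much smaller slowdown for the same run.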