I am a graduate student doing a research project on edge server inference. I noticed an interesting thing:
When I run ResNet50 alone, the p50 latency is 3559 microseconds and the throughput is 280.95. When I run two instances of ResNet50, the latency is 6048 microseconds and the throughput is 330.15. So far so good. As I understand it, my GPU uses a time-sliced scheduler underneath, which makes the latency nearly double, and a single instance probably doesn't fully utilize the GPU, which would explain the slight throughput increase.
When I run ResNet50 simultaneously with, say, VGG16, the picture is different: ResNet's latency becomes 11226 and its throughput drops to 89.2, which is far less than half of the throughput when running ResNet alone.
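As a sanity check on these numbers, here is a minimal sketch that converts each p50 latency into a throughput. It assumes (my assumption, not something I measured) that throughput is in inferences per second and that each instance keeps exactly one request in flight; the function name is only for illustration.

```python
def throughput_per_s(latency_us, instances=1):
    """Requests per second if each instance serves one request at a time."""
    return instances * 1_000_000 / latency_us

# Latencies in microseconds, taken from the measurements above.
resnet_alone = throughput_per_s(3559)              # roughly 281
resnet_pair = throughput_per_s(6048, instances=2)  # roughly 331, both instances combined
resnet_with_vgg = throughput_per_s(11226)          # roughly 89, ResNet only

print(f"alone: {resnet_alone:.2f}, pair: {resnet_pair:.2f}, with VGG: {resnet_with_vgg:.2f}")
```

Under those assumptions the computed values land close to the reported 280.95, 330.15, and 89.2, so the throughput drop is just the latency increase seen from the other side.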
Why does running ResNet alongside VGG interfere with ResNet so much? Is it because VGG's GPU kernels are larger, so that when the time-sliced scheduler lets both models run roughly the same number of kernels, ResNet ends up with a smaller share of GPU time?
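To make my hypothesis concrete, here is a toy model of it, not a description of how the GPU scheduler actually works. It assumes the scheduler strictly alternates one kernel from each model and that every kernel of a model takes the same time; `NUM_KERNELS` is a made-up value and the function names are hypothetical.

```python
def colocated_latency_us(solo_latency_us, num_kernels, corunner_kernel_us):
    """Latency of one request if each of its kernels waits for one co-runner kernel."""
    own_kernel_us = solo_latency_us / num_kernels
    return num_kernels * (own_kernel_us + corunner_kernel_us)

def implied_kernel_ratio(solo_latency_us, shared_latency_us):
    """Co-runner kernel time divided by own kernel time that would explain the slowdown."""
    return shared_latency_us / solo_latency_us - 1.0

RESNET_SOLO_US = 3559.0
NUM_KERNELS = 100  # hypothetical; the ratio below does not depend on it

# If the co-runner's kernels are as long as ResNet's, the model predicts 2x latency.
same_size = colocated_latency_us(RESNET_SOLO_US, NUM_KERNELS, RESNET_SOLO_US / NUM_KERNELS)

ratio_with_resnet = implied_kernel_ratio(RESNET_SOLO_US, 6048.0)
ratio_with_vgg = implied_kernel_ratio(RESNET_SOLO_US, 11226.0)

print(f"predicted latency with an identical co-runner: {same_size:.0f} us")
print(f"implied kernel ratio, ResNet co-runner: {ratio_with_resnet:.2f}")
print(f"implied kernel ratio, VGG co-runner: {ratio_with_vgg:.2f}")
```

Under this toy model, VGG's kernels would have to be a bit more than twice as long as ResNet's to explain the 11226 latency. The two-ResNet case comes out below 1.0, which suggests strict alternation overestimates the slowdown, so the model is only a rough way to state the question.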
Thank you very much!