Different latency when using MPS

huan_77 · June 20, 2021, 6:08am

When I use MPS and trtexec to run 3 identical inference workloads concurrently, 1 of them runs noticeably faster than the others. When I run 5 identical workloads concurrently, 2 of them run noticeably faster than the others. This does not happen with 2 or 4 processes.
Why does this happen? How can I tell in advance which process will run faster when I launch these workloads concurrently?

workload: resnet50
GPU: V100
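
For anyone who wants to reproduce the comparison, here is a minimal Python sketch that starts N copies of the same command at once and records how long each copy takes. It assumes the MPS control daemon is already running and that a serialized engine exists. The helper name `run_concurrently` and the engine file name `resnet50.engine` are placeholders for illustration, not taken from the setup above.

```python
import subprocess
import time


def run_concurrently(cmd, n):
    """Start n copies of cmd at once and return each copy's wall-clock time in seconds."""
    start = time.perf_counter()
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for _ in range(n)
    ]
    elapsed = [None] * n
    pending = set(range(n))
    # Poll all processes so each finish time is recorded when that process exits,
    # not when an earlier process in the list happens to be waited on.
    while pending:
        for i in list(pending):
            if procs[i].poll() is not None:
                elapsed[i] = time.perf_counter() - start
                pending.discard(i)
        time.sleep(0.01)
    return elapsed


# Hypothetical usage (the engine file name is a placeholder):
# times = run_concurrently(["trtexec", "--loadEngine=resnet50.engine"], 3)
# for i, t in enumerate(times):
#     print(f"process {i}: {t:.2f} s")
```

The wall-clock time measured here includes process startup and engine loading, so it is only a rough indicator. The latency statistics that trtexec prints for each process are the better numbers to compare.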