When I use MPS (Multi-Process Service) to run several processes concurrently, the PCIe RX throughput rises quickly as the number of concurrent processes increases.
I am using the vectorAdd sample from the CUDA samples, modified slightly so that the workload runs long enough to observe:
for (int i=0; i<=5000000; i++) vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, numElements);
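For anyone who wants to reproduce this, here is a minimal sketch of what the modified sample could look like. It is not my exact file: the helper name `runRepeatedVectorAdd`, the `CUDA_CHECK` macro, the input values, and the sizes (50000 elements, 256 threads per block) are assumptions for illustration. Note that the loop contains only kernel launches; the explicit host-to-device and device-to-host copies happen once, outside the loop.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails (hypothetical helper macro).
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s failed: %s\n", #call,                     \
                    cudaGetErrorString(err_));                            \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Element-wise add, modeled on the kernel in the vectorAdd sample.
__global__ void vectorAdd(const float *A, const float *B, float *C,
                          int numElements) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < numElements) C[i] = A[i] + B[i];
}

// Launches the kernel `iterations` times back to back and returns the
// number of mismatching output elements (0 means the result is correct).
int runRepeatedVectorAdd(int numElements, int iterations) {
    size_t size = static_cast<size_t>(numElements) * sizeof(float);
    std::vector<float> h_A(numElements, 1.0f), h_B(numElements, 2.0f);
    std::vector<float> h_C(numElements, 0.0f);

    float *d_A = nullptr, *d_B = nullptr, *d_C = nullptr;
    CUDA_CHECK(cudaMalloc(&d_A, size));
    CUDA_CHECK(cudaMalloc(&d_B, size));
    CUDA_CHECK(cudaMalloc(&d_C, size));

    // One-time host-to-device copies; nothing is copied inside the loop.
    CUDA_CHECK(cudaMemcpy(d_A, h_A.data(), size, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_B, h_B.data(), size, cudaMemcpyHostToDevice));

    int threadsPerBlock = 256;  // assumed value
    int blocksPerGrid = (numElements + threadsPerBlock - 1) / threadsPerBlock;

    // Repeated launches keep the GPU busy long enough to take measurements.
    for (int i = 0; i < iterations; i++) {
        vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C,
                                                      numElements);
    }
    CUDA_CHECK(cudaGetLastError());       // catches launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // waits for all queued kernels

    CUDA_CHECK(cudaMemcpy(h_C.data(), d_C, size, cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (int i = 0; i < numElements; i++) {
        if (std::fabs(h_A[i] + h_B[i] - h_C[i]) > 1e-5f) mismatches++;
    }

    CUDA_CHECK(cudaFree(d_A));
    CUDA_CHECK(cudaFree(d_B));
    CUDA_CHECK(cudaFree(d_C));
    return mismatches;
}

int main() {
    // 5000001 launches, matching the `i <= 5000000` bound in my loop.
    int mismatches = runRepeatedVectorAdd(50000, 5000001);
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Something like this should build with `nvcc vector_add_loop.cu -o vector_add_loop` (the file name is arbitrary); I then start several copies of the binary at the same time with the MPS control daemon running.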
Going from 2 to 3 processes, the increase in PCIe RX throughput looks abnormal. I see the same behavior with other workloads.
I am running these tests on a T4.
What is causing this, and is it expected behavior?