GPU: GeForce RTX 3090 Ti (2 SMs per TPC, 41 TPCs in total)
I want to execute one kernel per process with MPS, i.e., run multiple kernels concurrently.
Each kernel also uses a different TPC. Therefore, I expected that a single kernel running on 1 TPC without MPS would take the same time as 32 kernels (1 TPC per kernel) running in 32 processes under MPS.
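As a quick sanity check on that expectation, here is a minimal Python sketch of the resource arithmetic, using only the numbers stated above (41 TPCs, 2 SMs per TPC). The helper names are hypothetical, and the percentage helper assumes that MPS's `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` maps linearly onto TPCs, which may not match how the driver actually rounds the limit.

```python
SMS_PER_TPC = 2   # from the post
TOTAL_TPCS = 41   # from the post

def total_sms(total_tpcs=TOTAL_TPCS, sms_per_tpc=SMS_PER_TPC):
    """Number of SMs on the GPU as described in the post."""
    return total_tpcs * sms_per_tpc

def fits_without_sharing(n_processes, tpcs_per_process=1, total_tpcs=TOTAL_TPCS):
    """True if every process can own its own TPC(s) with none shared."""
    return n_processes * tpcs_per_process <= total_tpcs

def mps_percentage_for(tpcs_per_process, total_tpcs=TOTAL_TPCS):
    """Share of the GPU, in percent, that covers tpcs_per_process TPCs.
    ASSUMPTION: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE scales linearly with TPC count."""
    return 100.0 * tpcs_per_process / total_tpcs

# 82 SMs in total; 32 one-TPC processes fit, 42 would not.
print(total_sms(), fits_without_sharing(32), fits_without_sharing(42))
print(f"{mps_percentage_for(1):.2f}%")
```

On these numbers, 32 one-TPC processes fit on the GPU without sharing a TPC, at least on paper.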
Starting from one process and increasing the count one at a time, I want to check whether the execution time changes with the number of concurrently running processes. However, the execution time starts to increase unpredictably when going from 14 to 15 processes running at the same time.
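One way to script the sweep described above is sketched below in Python. It assumes a hypothetical benchmark binary (called `./tpc_kernel` here) that launches the kernel once and exits, with the MPS control daemon already running. The timing is wall-clock around the whole process, so it includes process start-up and CUDA context creation, and the copies are started one after another rather than at exactly the same instant. For kernel-only numbers, each process would also need to time its own launch, for example with `cudaEventRecord` and `cudaEventElapsedTime`.

```python
import subprocess
import time

def run_concurrently(cmd, n):
    """Start n copies of cmd back to back, wait for all of them,
    and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(n)]
    codes = [p.wait() for p in procs]  # wait for every copy before stopping the clock
    elapsed = time.perf_counter() - start
    if any(codes):
        raise RuntimeError(f"{cmd} failed with exit codes {codes}")
    return elapsed

def sweep(cmd, max_procs):
    """Return a list of (process_count, seconds) for 1..max_procs."""
    return [(n, run_concurrently(cmd, n)) for n in range(1, max_procs + 1)]
```

For example, `for n, secs in sweep(["./tpc_kernel"], 32): print(n, secs)` would print one timing per process count, which makes the jump between 14 and 15 processes easy to spot.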
I don't know the reason, so I have some questions:
- Is there a factor that makes the execution time depend on which SM (TPC) number a kernel runs on?
- Could the bus involved in SM allocation cause any change?
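To check the premise that each kernel really lands on its own TPC, each block can record the SM it ran on. In CUDA the SM ID can be read from the PTX special register `%smid` (for example `asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));`), although the PTX documentation notes that SM ID numbering is not guaranteed to be contiguous. Given those IDs per process, a hypothetical Python helper like the one below can flag TPCs used by more than one process. It assumes that SMs 2k and 2k+1 form TPC k, which is not documented and may not hold on this GPU.

```python
from collections import defaultdict

SMS_PER_TPC = 2  # from the post

def tpc_of(sm_id, sms_per_tpc=SMS_PER_TPC):
    """ASSUMPTION: consecutive SM IDs pair up into one TPC."""
    return sm_id // sms_per_tpc

def shared_tpcs(sm_ids_by_process):
    """Given {process_id: [sm_id, ...]}, return {tpc: [process_ids]}
    for every TPC that blocks from more than one process ran on."""
    users = defaultdict(set)
    for pid, sm_ids in sm_ids_by_process.items():
        for sm in sm_ids:
            users[tpc_of(sm)].add(pid)
    return {tpc: sorted(pids) for tpc, pids in users.items() if len(pids) > 1}
```

With SM logs `{1: [0, 1], 2: [2, 3], 3: [3, 10]}`, `shared_tpcs` returns `{1: [2, 3]}`, meaning processes 2 and 3 both ran on TPC 1 under the pairing assumption.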
Any insight into this issue would be appreciated.
Thank you for your answers.