Hello, I’m currently using Nsight Compute version 2025.02. I recently learned that Nsight Compute now supports profiling MPS (Multi-Process Service) processes. However, when I profiled the densenet121 forward computation with batch size 16 and an MPS percentage of 40, I got abnormal values such as
L2 cache hit ratio: 119.261538
L2 cache hit ratio: 114.020619
L2 cache hit ratio: 139.68073
L2 cache hit ratio: 105.643496
which are not supposed to exceed 100. I ran only a single process while profiling, so there shouldn’t be any interference from other processes. What I’m curious about is:
- I got abnormal values for 6 kernels out of 500. Can I at least trust the profiled output of the other kernels?
- Where do these abnormal values come from?
Thank you in advance!
Also, I’m curious whether Nsight Compute is really “profiling the kernel”. I first profiled a single MPS process with an MPS percentage of 40. After that, I profiled another MPS process with an MPS percentage of 40, this time with a second MPS process with an MPS percentage of 60 running in the background.
I assumed that the MPS process running alone would show higher bandwidth throughput, since there is no contention with other processes. However, according to Nsight Compute, the process profiled ‘with’ another process running shows higher bandwidth throughput. It seems that Nsight Compute profiles the entire GPU during the kernel’s runtime, not just the kernel itself. I wonder if my assumption is wrong.
Thank you in advance!
Metrics collected over multiple passes can show some variance and out-of-range values if the passes aren’t fully deterministic. From the numbers, this looks to be the case here.
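To illustrate the effect, here is a minimal toy model in Python. It is not how Nsight Compute computes its metrics internally. It only assumes that the hit counter and the total-access counter are gathered in two separate passes, and that the replayed kernel does slightly different work in each pass. All names and parameters (`simulate_two_pass`, `jitter`, `true_hit_fraction`) are made up for this sketch.

```python
import random


def hit_rate_pct(hits, accesses):
    """Hit rate in percent, computed from two raw counters."""
    return 100.0 * hits / accesses


def simulate_two_pass(rng, base_accesses=10_000, true_hit_fraction=0.9, jitter=0.2):
    """Toy model (assumption): hits come from pass 1, accesses from pass 2.

    Each pass replays the kernel, and the amount of work differs by up to
    +/- jitter between passes when execution is not fully deterministic.
    """
    accesses_pass1 = base_accesses * (1 + rng.uniform(-jitter, jitter))
    accesses_pass2 = base_accesses * (1 + rng.uniform(-jitter, jitter))
    hits_from_pass1 = true_hit_fraction * accesses_pass1
    # Numerator and denominator describe two different executions.
    return hit_rate_pct(hits_from_pass1, accesses_pass2)


def flag_out_of_range(values, lo=0.0, hi=100.0):
    """Return the values that cannot be valid percentages."""
    return [v for v in values if not lo <= v <= hi]


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [simulate_two_pass(rng) for _ in range(500)]
    bad = flag_out_of_range(samples)
    print(f"{len(bad)} of {len(samples)} simulated kernels exceed 100%")
```

With `jitter=0` the two passes agree and the ratio stays at the true value. With any jitter, a kernel whose true hit rate is already high can land above 100 simply because the numerator came from a "larger" execution than the denominator. A helper like `flag_out_of_range` can be used to pick out such kernels from exported values for a closer look.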
I think the documentation is fairly clear on profiling MPS applications:
Nsight Compute can be used to profile how the GPU is utilized while executing the work from all MPS clients concurrently. It does generally not support isolating the performance of individual clients.
You can make some adjustments via the observation window, but in general, the entire MPS server context is profiled, not each MPS client’s subcontext by itself.