Abnormal values of nsight compute

namch0101 · June 23, 2025, 5:30am

Hello, I’m currently using Nsight Compute version 2025.02. I recently learned that Nsight Compute now supports profiling MPS (Multi-Process Service) processes. However, when I profiled a densenet121 forward pass with batch size 16 and an MPS percentage of 40, I got abnormal values such as:
L2 cache hit ratio: 119.261538
L2 cache hit ratio: 114.020619
L2 cache hit ratio: 139.68073
L2 cache hit ratio: 105.643496

These values should never exceed 100. I only ran a single process while profiling, so there should not have been any interference from other processes. I have two questions:

I got abnormal values for 6 kernels out of 500. Can I at least trust the profiled output of the other kernels?
Where do these abnormal values come from?

Thank you in advance!

namch0101 · June 23, 2025, 5:42am

Also, I’m curious whether Nsight Compute is really “profiling the kernel”. I first profiled a single MPS process with an MPS percentage of 40. I then profiled another MPS process with an MPS percentage of 40 while a second MPS process with an MPS percentage of 60 was running in the background.
I expected the MPS process running alone to show higher bandwidth throughput, since it has no contention with other processes. However, according to Nsight Compute, the process profiled ‘with’ another process running has higher bandwidth throughput. It seems that Nsight Compute is profiling the entire GPU during the kernel’s runtime, not just the kernel itself. I wonder if my assumption is wrong.

Thank you in advance!

felix_dt · June 24, 2025, 6:06am

Metrics collected over multiple passes can show some variance and out-of-range values if the passes aren’t fully deterministic. From the numbers, this looks to be the case here.
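
As a rough illustration of how a ratio can end up above 100, here is a minimal Python sketch. It assumes a hit ratio is derived from two counters (hits and total requests) that are read in different replay passes of the same kernel. The function name and all numbers are made up for illustration and are not taken from Nsight Compute.

```python
def hit_ratio_percent(hits, requests):
    """Derived metric: hits as a percentage of total requests."""
    return 100.0 * hits / requests


# Hypothetical counter values, not real profiler output.
# Pass 1 replays the kernel and reads only the hit counter.
pass1_requests = 1000
pass1_hits = 900  # 90% of this pass's requests hit

# Pass 2 replays the kernel again and reads only the request counter.
# The replay was not deterministic, so fewer requests were issued.
pass2_requests = 800

# Both counters from the same pass give a value in range.
consistent = hit_ratio_percent(pass1_hits, pass1_requests)

# Counters from different passes can give a value above 100.
mixed = hit_ratio_percent(pass1_hits, pass2_requests)

print(f"same pass:        {consistent:.1f}%")
print(f"different passes: {mixed:.1f}%")
```

In this toy model, each counter is individually plausible; only the derived ratio goes out of range because its numerator and denominator describe slightly different executions.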

I think the documentation is fairly clear on profiling MPS applications:

Nsight Compute can be used to profile how the GPU is utilized while executing the work from all MPS clients concurrently. It does generally not support isolating the performance of individual clients.

You can make some adjustments via the observation window, but in general, the entire MPS server context is profiled, not each MPS client’s subcontext by itself.
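
To picture why the run with a background client can report higher bandwidth throughput, here is a toy Python model. It assumes the throughput counter covers all traffic on the device during the measurement window, regardless of which MPS client issued it. The client names and byte counts are hypothetical and chosen only to show the effect.

```python
def observed_throughput_gb_per_s(bytes_by_client, window_seconds):
    """Device-wide throughput: traffic from every client is summed."""
    total_bytes = sum(bytes_by_client.values())
    return total_bytes / window_seconds / 1e9


# Hypothetical traffic during a 1-second measurement window.
# Case A: the 40% client runs alone.
alone = {"client_40": 200e9}

# Case B: the 40% client moves less data because of contention,
# but the 60% background client adds its own traffic.
shared = {"client_40": 150e9, "client_60": 250e9}

print("alone: ", observed_throughput_gb_per_s(alone, 1.0), "GB/s")
print("shared:", observed_throughput_gb_per_s(shared, 1.0), "GB/s")
```

Under this assumption, the profiled client can be slower in the shared case while the reported throughput is still higher, which matches the observation in the question.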
