Nvprof SM number usage and metrics profiling

zhiqi.0 · December 23, 2020, 3:21am

Hi,

I’m trying to profile a simple program by using nvprof. I have two questions:

1. How can I see which SMs a kernel runs on? For example, I have a matmul kernel launched with a block grid of [7x2x1]. Which metrics should I use to find the SMs that the thread blocks are mapped to (or how many SMs are in use while the kernel executes)?
2. I’m using multiple streams to check concurrent execution of kernels. I can observe concurrent execution when profiling without metrics, i.e., nvprof ./program --stream 2 --times 1. But I cannot observe concurrency when profiling with metrics, i.e., nvprof --metrics sm_efficiency ./program --stream 2 --times 1. Is there a way to also observe concurrency during metrics profiling?

Each CPU thread manages a single stream and keeps launching small kernels on it (block grid [7x2x1], with 1024 threads per block).
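
For illustration, the launch pattern described above could be sketched roughly as follows. This is a sketch under assumptions, not the actual program: the kernel body, the matrix sizes, and the names matmul, run_one_stream, and launch_on_streams are all hypothetical, and it assumes one CUDA stream per host thread. Only the grid [7x2x1] and the 1024 threads per block come from the description.

```cuda
#include <cuda_runtime.h>
#include <thread>
#include <vector>

// Hypothetical stand-in kernel: C (m x n) = A (m x k) * B (k x n), row-major.
__global__ void matmul(const float* a, const float* b, float* c,
                       int m, int n, int k) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < m && col < n) {
        float acc = 0.0f;
        for (int kk = 0; kk < k; ++kk) acc += a[row * k + kk] * b[kk * n + col];
        c[row * n + col] = acc;
    }
}

// Work done by one host thread: create its own stream, launch `times` kernels.
cudaError_t run_one_stream(int times) {
    // Assumed sizes, chosen so a 7x2 grid of 32x32 blocks covers C exactly.
    const int M = 64, N = 224, K = 64;
    float *a = nullptr, *b = nullptr, *c = nullptr;
    cudaStream_t stream;
    cudaError_t err = cudaStreamCreate(&stream);
    if (err != cudaSuccess) return err;

    err = cudaMalloc(&a, M * K * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&b, K * N * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&c, M * N * sizeof(float));

    if (err == cudaSuccess) {
        // Zero-filled inputs are enough for a profiling sketch.
        cudaMemsetAsync(a, 0, M * K * sizeof(float), stream);
        cudaMemsetAsync(b, 0, K * N * sizeof(float), stream);
        dim3 block(32, 32, 1);  // 1024 threads per block
        dim3 grid(7, 2, 1);     // block grid [7x2x1]
        for (int i = 0; i < times; ++i) {
            matmul<<<grid, block, 0, stream>>>(a, b, c, M, N, K);
        }
        err = cudaGetLastError();  // catches launch configuration errors
    }
    if (err == cudaSuccess) err = cudaStreamSynchronize(stream);

    cudaFree(a);  // cudaFree(nullptr) is a no-op, so this is safe on failure
    cudaFree(b);
    cudaFree(c);
    cudaStreamDestroy(stream);
    return err;
}

// One host thread per stream; returns the first error seen, if any.
cudaError_t launch_on_streams(int num_streams, int times) {
    std::vector<cudaError_t> results(num_streams, cudaSuccess);
    std::vector<std::thread> workers;
    for (int i = 0; i < num_streams; ++i) {
        workers.emplace_back([&results, i, times] {
            results[i] = run_one_stream(times);
        });
    }
    for (auto& w : workers) w.join();
    for (cudaError_t r : results) {
        if (r != cudaSuccess) return r;
    }
    return cudaSuccess;
}
```

Calling launch_on_streams(2, 1) from main would correspond to the --stream 2 --times 1 arguments used in the commands in this post. Whether kernels this small actually overlap depends on the device and on launch timing.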

Some results for question 2:

nvprof with --metrics profiling:
cmd: /usr/local/cuda/bin/nvprof --metrics sm_efficiency --concurrent-kernels on -f -o prof.nvvp ./program --stream 2 --times 1

nvprof without --metrics profiling:
cmd: /usr/local/cuda/bin/nvprof --concurrent-kernels on -f -o prof.nvvp ./program --stream 2 --times 1

As the results show, concurrency is observed without metrics profiling but not with metrics profiling.

System settings:
nvcc version 10.0
nvprof version 10.0.130
CUDA version 10.2
