Hello everybody!
I’m currently profiling an OptiX 7.0 application using Nsight Compute 2019.5. The application runs on a GeForce RTX 2080 Ti board.
I am encountering a large number of stalls. According to the “Details” page, section “Warp State Statistics”, roughly a third of the “warp cycles per issued instruction” is caused, on average, by stalls of type “misc”. Unfortunately, the description of such stalls is rather vague (“[…] warp […] being stalled on a miscellaneous hardware reason.”). Is there a more detailed listing of reasons for encountering such stalls, similar to [1]? [1] lists reasons for stalls of type “other” (which I think is equivalent to “misc”) for compute capabilities up to 6.*, but the board above implements compute capability 7.5.
Thanks for your help!
David
[1] “What are ‘Other’ Issue Stall Reasons displayed by the Nsight profiler?” - Stack Overflow
Hi David,
On Volta and Turing, the misc stall reason covers waiting on hardware resources that can only be accessed from library code, e.g. through OptiX, or when profiling debug code. When executing a kernel that mixes library code with your own device code, the kernel-level metrics on the Details page show the aggregated values for the whole kernel execution. That makes it more challenging to determine which parts are caused by the library and which parts are under your own control.
To aid the performance analysis in those scenarios, the Source View can be used to isolate your own code and get the stall reasons per user function. For that to work, make sure you compile your device code with -lineinfo, capture a report with Nsight Compute with the Source Counters section enabled, and switch to the Source page. The Sampling Data column shows the stall reasons per source line for your own code. You can quickly jump to the lines with the highest number of stalls using the navigation buttons at the top. If you have the latest version of Nsight Compute and your source code has multiple device functions, you can also use the [-] button to the left of the search field to aggregate the metrics per function. In many cases this can help you understand whether there is further potential to reduce stalls in the parts of the kernel you have direct source control over. A minimal sketch of the first two steps is below.
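For reference, a rough sketch of those two steps (the file, path, and application names are placeholders, and the exact section identifier can vary between versions; in Nsight Compute 2019.5 the command-line profiler is called nv-nsight-cu-cli):

# compile the OptiX device programs to PTX with line information
nvcc -ptx -lineinfo -I ${OPTIX_SDK}/include -o my_programs.ptx my_programs.cu

# capture a report that includes the Source Counters section, then open it in the UI
nv-nsight-cu-cli --section SourceCounters -o my_report ./my_app

Running nv-nsight-cu-cli --list-sections prints the section identifiers available in your version.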
Hi mstrengert,
I wasn’t aware of the fact that the Details page shows aggregated values for both my own device code and library code. This also explains why the figures reported on the Details page and the Source page deviate from one another. Thanks a lot for pointing this out to me, this really helped me!