What do stall_not_selected and stall_memory_throttle mean in NVPROF?

Neo21c · October 13, 2014, 7:16pm

stall_memory_throttle: Percentage of stalls occurring because of memory throttle

stall_not_selected: Percentage of stalls occurring because warp was not selected

stall_not_selected and stall_memory_throttle are two of the many metrics available on my CC 3.5 device. I am wondering what exactly these counters mean.

What exactly is memory throttle? I observe that it tends to be high in highly memory-divergent and bandwidth-intensive code, but it sometimes has a high value at relatively low DRAM bandwidth usage, too.

I have no idea about stall_not_selected. It seems to have a higher value when the eligible_warps count is high. But to me, that doesn’t make sense.

Greg · October 15, 2014, 1:51am

The stall counters update every cycle by the number of active warps that are stalled by the specific reason.

A warp increments stall_not_selected if the warp is eligible to issue but the warp scheduler selected a different eligible warp. This is not a bad stall reason. If it is really high you may be able to reduce occupancy.
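To see why stall_not_selected goes up together with the eligible warp count, here is a rough toy model in Python. It is only a sketch and not how the hardware or nvprof is implemented: it assumes one warp scheduler that issues at most one warp per cycle with round-robin selection, and the `ready_every` latency parameter and the "waiting" bucket (standing in for all other stall reasons) are made up for illustration.

```python
from collections import Counter


def simulate(num_warps, cycles, ready_every):
    """Toy model: one scheduler, at most one warp issued per cycle.

    After a warp issues it is not eligible again for `ready_every` cycles
    (a hypothetical stand-in for instruction latency).
    Every cycle, each stalled warp adds 1 to the counter for its reason.
    """
    next_ready = [0] * num_warps  # first cycle at which each warp is eligible
    stalls = Counter()
    last = -1  # index of the most recently issued warp
    for cycle in range(cycles):
        eligible = [w for w in range(num_warps) if next_ready[w] <= cycle]
        if eligible:
            # Round-robin: pick the first eligible warp after the last issued one.
            chosen = min(eligible, key=lambda w: (w - last - 1) % num_warps)
            next_ready[chosen] = cycle + ready_every
            last = chosen
            # Eligible warps that lost the selection are "not selected".
            stalls["not_selected"] += len(eligible) - 1
        # Warps that are not eligible are stalled for some other reason.
        stalls["waiting"] += num_warps - len(eligible)
    return stalls


def not_selected_share(stalls):
    """Fraction of all counted stalls attributed to 'not_selected'."""
    total = sum(stalls.values())
    return stalls["not_selected"] / total if total else 0.0


if __name__ == "__main__":
    for warps in (1, 2, 8, 16):
        share = not_selected_share(simulate(warps, 100, 4))
        print(f"{warps:2d} warps: not_selected share = {share:.2f}")
```

In this model a single warp never records a not_selected stall, and once there are more warps than the scheduler needs to cover the latency, most of the stalls land in not_selected. That matches the observation above: the scheduler already has work every cycle, so the extra eligible warps just wait their turn.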

A warp increments stall_memory_throttle if the warp cannot issue because the LSU pipe is not available. On cc3.x devices a warp scheduler can only issue L1/SHM instructions every 4 cycles. If this reason is high, check whether L1 accesses are highly divergent or whether shared memory (SHM) accesses have many bank conflicts.
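As a quick way to reason about the bank conflict part, the following Python sketch estimates the conflict degree of one warp's shared memory access pattern. It assumes 32 banks that are each one 4-byte word wide and a warp size of 32; cc3.x devices can also be configured for 8-byte banks, which this simplified model ignores. The function names are made up for illustration.

```python
from collections import defaultdict


def bank_conflict_degree(word_indices, num_banks=32):
    """Worst-case number of distinct words that fall into the same bank.

    `word_indices` holds the 4-byte word index each thread of a warp accesses.
    1 means conflict-free; N means the access is replayed roughly N times.
    Threads reading the same word count once (broadcast).
    """
    words_per_bank = defaultdict(set)
    for idx in word_indices:
        words_per_bank[idx % num_banks].add(idx)
    return max(len(words) for words in words_per_bank.values())


def strided_access(stride, warp_size=32):
    """Word indices for a pattern like shared[threadIdx.x * stride]."""
    return [lane * stride for lane in range(warp_size)]


if __name__ == "__main__":
    for stride in (1, 2, 32, 33):
        degree = bank_conflict_degree(strided_access(stride))
        print(f"stride {stride:2d}: conflict degree {degree}")
```

Under these assumptions a stride of 1 is conflict-free, a stride of 2 gives a 2-way conflict, a stride of 32 serializes all 32 threads on one bank, and padding the stride to 33 removes the conflict again. Each extra replay keeps the LSU pipe busy longer, which is one way this stall reason can be high even when DRAM bandwidth usage is low.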

Neo21c · October 15, 2014, 5:01pm

Thank you so much. This was really helpful.

I have one more question. What is “stall_memory_dependency”? The description says “a memory operation cannot be performed due to the required resources not being available or fully utilized, or because too many requests of a given type are outstanding”.

The description sounds a little different from the counter name (memory_dependency). If it is a memory dependency, shouldn’t it count stalls caused by a memory load result not yet being available? The description sounds more like the counter counts stalls caused by a busy LD/ST unit or a busy MSHR-like structure. Which one is correct?

Thank you
