stall_memory_throttle: Percentage of stalls occurring because of memory throttle
stall_not_selected: Percentage of stalls occurring because warp was not selected
stall_not_selected and stall_memory_throttle are two of the many metrics available on my compute capability 3.5 device. I am wondering what exactly these counters mean.
What exactly is memory throttle? I observe that it tends to be high in bandwidth-intensive code with highly divergent memory accesses, but it sometimes has a high value even when DRAM bandwidth usage is relatively low.
I have no idea what stall_not_selected means. It seems to have a higher value when the eligible_warps count is high, but that doesn't make sense to me.
The stall counters are incremented every cycle by the number of active warps stalled for that specific reason.
A warp increments stall_not_selected if the warp is eligible to issue but the warp scheduler selected a different eligible warp. This is not a bad stall reason. If it is really high, you may be able to reduce occupancy, since there are more eligible warps than the scheduler can issue from.
A warp increments stall_memory_throttle if the warp cannot issue because the LSU (load/store unit) pipe is not available. On CC 3.x devices, a warp scheduler can issue an L1/shared memory instruction only once every 4 cycles. If this reason is high, check whether L1 accesses are highly divergent or shared memory accesses have many bank conflicts.
Thank you so much. This was really helpful.
I have one more question. What is "stall_memory_dependency"? The description says "a memory operation cannot be performed due to the required resources not being available or fully utilized, or because too many requests of a given type are outstanding".
The description sounds a little different from the counter name (memory_dependency). If it is a memory dependency, shouldn't it count stalls caused by a memory load result not yet being available? The description sounds more like the counter counts stalls caused by a busy LD/ST unit or a busy MSHR-like structure. Which one is correct?