Throughput drops after saturation with more threads

LongY · July 23, 2014, 2:01pm

In theory, as the number of threads on an SM increases, throughput rises until it reaches its peak and then saturates: adding more threads yields no further gain, so the throughput curve should be flat.
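This expectation can be written down with Little's law, which the linked slides also use (needed parallelism = latency × throughput). What follows is a rough sketch, assuming an idealized SM with a fixed arithmetic latency and one independent operation in flight per thread. The symbols are introduced here for illustration and do not appear in the slides in this form.

```latex
% Assumed symbols (illustrative only):
%   n       = independent operations in flight (here: number of threads)
%   L       = arithmetic latency in cycles
%   T_peak  = peak throughput in operations per cycle
T(n) = \min\!\left(\frac{n}{L},\; T_{\text{peak}}\right),
\qquad
n_{\text{sat}} = L \cdot T_{\text{peak}}
```

Under this model the curve rises linearly up to n_sat and is exactly flat afterwards, so any dip has to come from something the model leaves out.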

In this figure, throughput rises linearly at first, but just as it is about to flatten out, it dips, forming a concave region. The figure is from page 23 of “Better Performance at Lower Occupancy”: http://www.cs.berkeley.edu/~volkov/volkov10-GTC.pdf.

My question is why the curve has this dip instead of staying flat.

P.S. The kernel is as follows:

    #pragma unroll UNROLL
    for( int i = 0; i < N_ITERATIONS; i++ )
    {
        a = a * b + c;
    }

I would be grateful for any comments.
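For anyone who wants to reproduce the curve, here is a minimal sketch of a complete harness around that loop. It is not the code behind the slides: the kernel name `fma_chain`, the helper `gflops`, the values of `N_ITERATIONS` and `UNROLL`, and the constants passed for `b` and `c` are all assumptions made for illustration. It launches a single block, sweeps the thread count in steps of one warp, and times each launch with CUDA events.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed values; the thread does not say what was actually used.
constexpr int N_ITERATIONS = 1 << 20;
constexpr int UNROLL = 16;

// One dependent multiply-add chain per thread; a, b and c are scalars
// that the compiler is expected to keep in registers.
__global__ void fma_chain(float *out, float b, float c)
{
    float a = threadIdx.x;
#pragma unroll UNROLL
    for (int i = 0; i < N_ITERATIONS; i++)
    {
        a = a * b + c;
    }
    out[threadIdx.x] = a; // store the result so the loop is not eliminated
}

// Hypothetical helper: counts each multiply-add as 2 floating-point operations.
double gflops(float ms, int threads)
{
    return 2.0 * N_ITERATIONS * threads / (ms * 1e6);
}

int main()
{
    const int max_threads = 1024; // assumes a device that allows 1024 threads per block
    float *out = nullptr;
    if (cudaMalloc(&out, max_threads * sizeof(float)) != cudaSuccess)
    {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    fma_chain<<<1, 32>>>(out, 0.999f, 0.001f); // warm-up launch
    cudaDeviceSynchronize();

    // Single block, thread count swept one warp at a time.
    for (int threads = 32; threads <= max_threads; threads += 32)
    {
        cudaEventRecord(start);
        fma_chain<<<1, threads>>>(out, 0.999f, 0.001f);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("%4d threads: %8.3f ms, %8.2f Gflop/s\n", threads, ms, gflops(ms, threads));
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(out);
    return 0;
}
```

Plotting the last column against the thread count should give a curve comparable in shape to the one in the slides, though the exact position of any dip will depend on the GPU and compiler version.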

little_jimmy · July 23, 2014, 2:17pm

The kernel hardly accesses global memory; it uses mostly (or only) local memory, if I am not mistaken.

How many thread blocks are used?

I would think the curve may equally be affected by the change in the number of thread blocks running concurrently as the number of threads increases, which in turn is affected by local memory requirements and how spilling is optimized.

LongY · July 23, 2014, 2:37pm

a, b and c in the kernel are in registers. It runs only 1 block.
Yes, I think some optimization or other effect might be shaping the curve.
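One way to check the registers-versus-local-memory point is to ask the runtime what the compiled kernel actually uses. The sketch below assumes the loop is wrapped in a kernel called `fma_chain` (a hypothetical name) with assumed values for `N_ITERATIONS` and `UNROLL`. The helper `uses_no_local_memory` is also made up for illustration, while `cudaFuncGetAttributes` and the `numRegs` and `localSizeBytes` fields are part of the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed values; the thread does not say what was actually used.
constexpr int N_ITERATIONS = 1 << 20;
constexpr int UNROLL = 16;

// Hypothetical wrapper around the loop from the first post.
__global__ void fma_chain(float *out, float b, float c)
{
    float a = threadIdx.x;
#pragma unroll UNROLL
    for (int i = 0; i < N_ITERATIONS; i++)
    {
        a = a * b + c;
    }
    out[threadIdx.x] = a; // store the result so the loop is not eliminated
}

// Hypothetical helper: true when the kernel needs no per-thread local memory,
// i.e. nothing was spilled out of registers.
bool uses_no_local_memory(const cudaFuncAttributes &attr)
{
    return attr.localSizeBytes == 0;
}

int main()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, fma_chain);
    if (err != cudaSuccess)
    {
        fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("registers per thread:    %d\n", attr.numRegs);
    printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    printf("spill-free:              %s\n", uses_no_local_memory(attr) ? "yes" : "no");
    return 0;
}
```

If the local memory size is reported as zero, spilling can be ruled out as a cause for this particular build.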

LongY · July 23, 2014, 2:53pm

Thanks for your comment, jimmy.
I wonder whether anyone has run into the same problem before. Any other answers that explain the dip are welcome.

little_jimmy · July 23, 2014, 3:04pm

The number of warps increases too, and that may also affect execution in terms of what gets completed when; for one, it should affect how the schedulers issue work.