global memory causes branching

Steve_Jeffrey · September 23, 2010, 10:53am

Hi,

I have a for-loop which is causing branching:
```cuda
__global__ void my_kernel(my_struct *info)
{
    int n = info->loop_count[blockIdx.x];  // trip count read from global memory

    for (int i = 0; i < n; i++)
        …
}
```
The argument “info” passed into the kernel is simply a structure holding the addresses of some arrays in global memory on the GPU. When the loop count is read from an array in global memory, branching occurs.
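For reference, here is a minimal self-contained version of the setup described above. It assumes `my_struct` holds a single device pointer `loop_count`; the field name comes from the post, but the struct layout, `count_kernel`, `out` and `run_loop_kernel` are hypothetical. Because the kernel dereferences a `my_struct *` on the device, the struct itself has to be copied into device memory, not just the arrays it points to.

```cuda
#include <cuda_runtime.h>

// Assumed layout: the post only says the struct stores addresses of arrays in
// global memory; loop_count is the one field this sketch reads.
struct my_struct {
    int *loop_count;  // device pointer, one trip count per block
};

// Hypothetical kernel: every thread in a block reads the same trip count
// from global memory, then loops that many times.
__global__ void count_kernel(const my_struct *info, int *out)
{
    int n = info->loop_count[blockIdx.x];  // identical for all threads in the block
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc += 1;                          // placeholder for the real loop body
    if (threadIdx.x == 0)
        out[blockIdx.x] = acc;
}

// Host helper (hypothetical): copies the counts to the GPU, places the struct
// itself in device memory (the kernel dereferences info on the device),
// launches one block per count and copies back each block's result.
// Returns 0 on success, -1 if any CUDA runtime call reported an error.
int run_loop_kernel(const int *counts, int num_blocks, int *results)
{
    size_t bytes = num_blocks * sizeof(int);
    int *d_counts = nullptr, *d_out = nullptr;
    my_struct *d_info = nullptr;
    cudaMalloc(&d_counts, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMalloc(&d_info, sizeof(my_struct));
    cudaMemcpy(d_counts, counts, bytes, cudaMemcpyHostToDevice);

    my_struct h_info = { d_counts };  // host-side copy holding a device address
    cudaMemcpy(d_info, &h_info, sizeof(my_struct), cudaMemcpyHostToDevice);

    count_kernel<<<num_blocks, 128>>>(d_info, d_out);
    cudaMemcpy(results, d_out, bytes, cudaMemcpyDeviceToHost);  // also synchronizes

    cudaFree(d_info);
    cudaFree(d_out);
    cudaFree(d_counts);
    return cudaGetLastError() == cudaSuccess ? 0 : -1;
}
```

Each block's result equals its trip count, so `counts = {0, 5, 17}` should come back unchanged.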

Does anyone know how to prevent the branching?

cheers
steve

avidday · September 23, 2010, 11:05am

I am sure the global memory read doesn’t cause branching. The for loop will compile to include branch instructions, but the global memory access certainly won’t. You can confirm this by looking at the PTX code nvcc produces. The only way to get rid of the loop’s branching would be to unroll it fully, which you can’t do if you want the iteration count to be determined at runtime.
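To illustrate the point about unrolling: when the trip count is a compile-time constant (here a template parameter `N`), `#pragma unroll` lets the compiler unroll the loop completely, so no loop branch has to remain. When the count is a runtime argument, a compare-and-branch must stay. The kernel and helper names are hypothetical, and the exact code generated depends on the compiler version.

```cuda
#include <cuda_runtime.h>

// Trip count known at compile time: the compiler can fully unroll this loop,
// leaving straight-line code with no loop branch.
template <int N>
__global__ void fixed_trip(const int *in, int *out)
{
    int acc = 0;
#pragma unroll
    for (int i = 0; i < N; i++)
        acc += in[i];
    out[threadIdx.x] = acc;
}

// Trip count only known at runtime: the compiler must keep a compare-and-branch
// (it may still partially unroll, but a remainder check stays).
__global__ void runtime_trip(const int *in, int n, int *out)
{
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc += in[i];
    out[threadIdx.x] = acc;
}

// Host helper (hypothetical): runs both kernels over in = {1..8} with one block
// of 32 threads and returns what thread 0 computed in each.
// Returns 0 on success, -1 if any CUDA runtime call reported an error.
int compare_kernels(int *fixed_result, int *runtime_result)
{
    const int N = 8;
    int h_in[N];
    for (int i = 0; i < N; i++) h_in[i] = i + 1;  // 1..8, sum = 36
    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, N * sizeof(int));
    cudaMalloc(&d_out, 32 * sizeof(int));
    cudaMemcpy(d_in, h_in, N * sizeof(int), cudaMemcpyHostToDevice);

    fixed_trip<N><<<1, 32>>>(d_in, d_out);
    cudaMemcpy(fixed_result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    runtime_trip<<<1, 32>>>(d_in, N, d_out);
    cudaMemcpy(runtime_result, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return cudaGetLastError() == cudaSuccess ? 0 : -1;
}
```

Both kernels return 36. The difference is only in the generated code, which you can compare by compiling the file with `nvcc -ptx` and reading the two kernels’ PTX.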

Steve_Jeffrey · September 23, 2010, 8:47pm

I thought that a for-loop would not cause branching if the CUDA compiler could determine that the loop count was the same for all threads in the block:
```cuda
n = info->loop_count[blockIdx.x];
```
Is that incorrect?

avidday · September 24, 2010, 7:53am

Yes, it is incorrect. And it makes me wonder whether you are really asking about branch divergence rather than branching. To the best of my (admittedly very limited) knowledge, it is impossible to construct a loop with a variable trip count without some sort of conditional branch instruction. Branch divergence occurs when not every thread in the active warp follows the same code path, because a branch evaluates differently for different threads. The two things are not the same.
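To make the distinction concrete, the sketch below checks whether a trip count is uniform across each warp, using the warp intrinsics `__shfl_sync` and `__all_sync` (CUDA 9 or later). The helper and kernel names are hypothetical. When the count is indexed by `blockIdx.x`, as in the original kernel, every lane in a warp sees the same value. The loop still contains a branch, but all lanes take it the same way, so the warp does not diverge. Indexing by `threadIdx.x` gives each lane a different count, and the warp diverges as lanes exit the loop at different iterations. Since divergence is a per-warp property, a count that is uniform across the whole block is sufficient.

```cuda
#include <cuda_runtime.h>

// Writes 1 to uniform[warp] if every lane in the warp has the same trip count n
// (the loop branch then resolves identically for all lanes: branching without
// divergence), or 0 if lanes disagree (the warp diverges as short loops exit).
// Assumes the block size is a multiple of 32 so every warp is full.
__device__ void record_uniformity(int n, int *uniform)
{
    int lane0_n = __shfl_sync(0xffffffffu, n, 0);             // lane 0's trip count
    int same = __all_sync(0xffffffffu, n == lane0_n) ? 1 : 0; // 1 only if all lanes match
    if ((threadIdx.x & 31) == 0)
        uniform[threadIdx.x / 32] = same;
}

// Trip count indexed by blockIdx.x, as in the original kernel: uniform per block.
__global__ void per_block_count(const int *loop_count, int *uniform)
{
    int n = loop_count[blockIdx.x];
    record_uniformity(n, uniform);
}

// Contrasting case: trip count indexed by threadIdx.x, different for each lane.
__global__ void per_thread_count(const int *loop_count, int *uniform)
{
    int n = loop_count[threadIdx.x];
    record_uniformity(n, uniform);
}

// Host helper (hypothetical): one block of 64 threads (two full warps) with
// counts 0..63; copies each warp's flag into block_flags[2] and thread_flags[2].
// Returns 0 on success, -1 if any CUDA runtime call reported an error.
int check_uniformity(int *block_flags, int *thread_flags)
{
    int h_counts[64];
    for (int i = 0; i < 64; i++) h_counts[i] = i;
    int *d_counts = nullptr, *d_flags = nullptr;
    cudaMalloc(&d_counts, sizeof(h_counts));
    cudaMalloc(&d_flags, 2 * sizeof(int));
    cudaMemcpy(d_counts, h_counts, sizeof(h_counts), cudaMemcpyHostToDevice);

    per_block_count<<<1, 64>>>(d_counts, d_flags);
    cudaMemcpy(block_flags, d_flags, 2 * sizeof(int), cudaMemcpyDeviceToHost);
    per_thread_count<<<1, 64>>>(d_counts, d_flags);
    cudaMemcpy(thread_flags, d_flags, 2 * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_counts);
    cudaFree(d_flags);
    return cudaGetLastError() == cudaSuccess ? 0 : -1;
}
```

With this setup the per-block kernel should report both warps as uniform, and the per-thread kernel should report neither.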
