global memory causes branching

Hi,

I have a for-loop which is causing branching:
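[indent]// launched as my_kernel<<<…>>>(info)
__global__ void my_kernel(my_struct *info)
{
    // loop count for this block, read from an array in global memory
    int n = info->loop_count[blockIdx.x];

    for (int i = 0; i < n; i++) {
        // loop body
    }
}[/indent]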
The argument “info” passed into the kernel is simply a structure that stores the addresses of some arrays stored in global memory on the GPU. When the loop count is read from an array in global memory, branching occurs.

Does anyone know how to prevent the branching?

cheers
steve

I am sure the global memory read doesn’t cause branching. The for loop itself will compile to a conditional branch, but the global memory access certainly won’t. You can confirm this by looking at the PTX code nvcc produces. The only way to get rid of the branching from the loop would be to unroll it completely, which you can’t do if you want the iteration count to be determined at runtime.
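To make the comparison concrete, here is a minimal sketch that puts a runtime trip count next to a compile-time one. The layout of `my_struct`, the kernel names, the helper `compare_kernels` and the use of managed memory are all assumptions made for illustration, not taken from the original code. Compiling it with something like `nvcc -ptx file.cu` (the file name is a placeholder) should let you check that the load from `loop_count` is an ordinary global load, and that any loop branch appears only in the runtime version.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in for the struct in the question.
struct my_struct {
    int   *loop_count;  // one trip count per block
    float *out;         // one result per thread
};

// Trip count known only at run time: the loop needs a conditional
// branch no matter where n is loaded from.
__global__ void runtime_count(my_struct *info)
{
    int n = info->loop_count[blockIdx.x];  // an ordinary global load
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += 1.0f;
    info->out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Trip count fixed at compile time: the compiler is free to unroll
// the loop completely, leaving no loop branch.
template <int N>
__global__ void fixed_count(float *out)
{
    float acc = 0.0f;
#pragma unroll
    for (int i = 0; i < N; i++)
        acc += 1.0f;
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Runs both kernels with the same trip count and returns the number of
// elements on which they disagree (0 means identical results).
int compare_kernels()
{
    const int blocks = 4, threads = 64, total = blocks * threads;
    my_struct *info;
    float *fixed_out;
    // Managed memory is an assumption here, used to keep the host code short.
    cudaMallocManaged(&info, sizeof(my_struct));
    cudaMallocManaged(&info->loop_count, blocks * sizeof(int));
    cudaMallocManaged(&info->out, total * sizeof(float));
    cudaMallocManaged(&fixed_out, total * sizeof(float));
    for (int b = 0; b < blocks; b++)
        info->loop_count[b] = 8;

    runtime_count<<<blocks, threads>>>(info);
    fixed_count<8><<<blocks, threads>>>(fixed_out);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);

    int mismatches = 0;
    for (int i = 0; i < total; i++)
        if (info->out[i] != fixed_out[i]) mismatches++;

    cudaFree(fixed_out); cudaFree(info->out);
    cudaFree(info->loop_count); cudaFree(info);
    return mismatches;
}
```

Calling `compare_kernels()` from `main` is enough to run both versions. The results are the same either way; only the generated control flow should differ.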

I thought that a for-loop would not cause branching if the CUDA compiler could determine that the loop count was the same for all threads in the block:
[indent]n = info->loop_count[blockIdx.x][/indent]
Is that incorrect?

Yes, it is incorrect. It also makes me wonder whether you are really asking about branch divergence rather than branching. To the best of my (admittedly very limited) knowledge, it is impossible to construct a loop with a variable trip count without some sort of conditional branch instruction. Branch divergence occurs when the threads of an active warp do not all follow the same code path because a branch evaluates differently for different threads. The two things are not the same.
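The distinction can be sketched with two kernels that contain the same kind of loop. Both need a conditional branch, but only the second can diverge, because there the trip count differs between threads of the same warp. The kernel names, the helper `check_trip_kernels` and the use of managed memory are illustrative assumptions. The first kernel mirrors the indexing by `blockIdx.x` from the question, where every thread in a block reads the same count.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Every thread in a block reads the same trip count, so all threads of a
// warp take the loop branch the same way: branching, but no divergence.
__global__ void uniform_trips(const int *loop_count, float *out)
{
    int n = loop_count[blockIdx.x];
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += 1.0f;
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Each thread reads its own trip count, so threads of one warp can leave
// the loop at different iterations: the same branch, now divergent.
__global__ void per_thread_trips(const int *loop_count, float *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int n = loop_count[tid];
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += 1.0f;
    out[tid] = acc;
}

// Launches both kernels and returns the number of wrong results.
int check_trip_kernels()
{
    const int blocks = 2, threads = 64, total = blocks * threads;
    int *block_count, *thread_count;
    float *out_a, *out_b;
    // Managed memory is an assumption here, used to keep the host code short.
    cudaMallocManaged(&block_count, blocks * sizeof(int));
    cudaMallocManaged(&thread_count, total * sizeof(int));
    cudaMallocManaged(&out_a, total * sizeof(float));
    cudaMallocManaged(&out_b, total * sizeof(float));
    for (int b = 0; b < blocks; b++) block_count[b] = 4 + b;   // same for a whole block
    for (int t = 0; t < total; t++) thread_count[t] = t % 7;   // varies inside a warp

    uniform_trips<<<blocks, threads>>>(block_count, out_a);
    per_thread_trips<<<blocks, threads>>>(thread_count, out_b);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);

    int wrong = 0;
    for (int t = 0; t < total; t++) {
        if (out_a[t] != (float)block_count[t / threads]) wrong++;
        if (out_b[t] != (float)thread_count[t]) wrong++;
    }
    cudaFree(block_count); cudaFree(thread_count);
    cudaFree(out_a); cudaFree(out_b);
    return wrong;
}
```

Both kernels give correct results. Divergence affects only how efficiently a warp executes the loop, so it is something to look for in a profiler rather than in the output.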
