Are loop incrementations performed by all threads?

BeachHut · July 31, 2008, 10:56am

I may have missed this in the Manual (or in previous discussion here)… it seems like an important issue.

If I have a for loop on the device, e.g.:

__global__ void kernel(){
    int k;

    for(k=0;k<5;k++){
        // loop body
    }
}

I would imagine that this would cause a problem, as all of the threads within the block would try to increment k. Or does the compiler do something clever when it sees something like this?

Related to this is something else that I have often done in my code, but I would just like to confirm that it is necessary (and that it is not a silly way to do it):

If I have something in the kernel that doesn’t need to be done in parallel by all of the threads in the block, e.g. if I am changing what some pointer points to, then I usually say:

if(threadIdx.x==0){
p1=p2;
}

Thanks

frea · July 31, 2008, 11:49am

You missed that in the manual: variables that are not declared __shared__, __constant__, etc. are not shared between threads. So every thread would do exactly 5 iterations.
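
A minimal sketch of this, assuming a CUDA-capable device and a single block of 64 threads (an arbitrary choice). Each thread runs the loop from the question with its own private k and records how many iterations it performed. The kernel and helper names are hypothetical, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical demo kernel: k is an ordinary automatic variable, so every
// thread gets its own private copy (normally held in a register).
__global__ void countIterations(int *iterations)
{
    int count = 0;
    for (int k = 0; k < 5; k++) {    // each thread increments only its own k
        count++;
    }
    iterations[threadIdx.x] = count; // one output slot per thread
}

// Launches one block of n threads and returns how many of them ran exactly 5 iterations.
int threadsWithFiveIterations(int n)
{
    int *d_iter = nullptr;
    int *h_iter = new int[n];
    cudaMalloc(&d_iter, n * sizeof(int));
    countIterations<<<1, n>>>(d_iter);
    cudaMemcpy(h_iter, d_iter, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_iter);

    int ok = 0;
    for (int i = 0; i < n; i++) {
        if (h_iter[i] == 5) ok++;
    }
    delete[] h_iter;
    return ok;
}

int main()
{
    const int n = 64;
    printf("%d of %d threads ran 5 iterations\n", threadsWithFiveIterations(n), n);
    return 0;
}
```

On a working device every thread should report 5 iterations, since no thread can see another thread's k.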

As for the second question, doing that in all threads will probably be more efficient, as long as you don’t get a large number of bank conflicts. First, there is no if; second, the instructions inside the if are still executed for all threads of the warp, but some of them just discard the result (in the example, threads with threadIdx.x != 0 would throw their results away).

MisterAnderson42 · July 31, 2008, 12:05pm

Your variable int k is local to each thread and lives in a register. There will be no race conditions, as each thread increments its own register. This is true of any variable declared normally in the kernel (i.e. without a qualifier such as __shared__).

Your worries about race conditions only apply if you declare the variable __shared__, as it will then be one and the same variable for the entire block.
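
For contrast, a minimal sketch of the __shared__ case, assuming a device that supports atomicAdd on shared memory (compute capability 1.2 or later, which covers all current GPUs). The counter is a single variable for the whole block, so a plain counter++ from many threads could lose increments; atomicAdd avoids that. The loop variable k is still private to each thread. Kernel and helper names are hypothetical; error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical demo: a __shared__ variable is ONE variable for the whole block,
// so concurrent updates to it must be made atomic.
__global__ void sharedCounter(int *result)
{
    __shared__ int counter;          // single copy per block

    if (threadIdx.x == 0) counter = 0;
    __syncthreads();                 // make the initialisation visible to all threads

    for (int k = 0; k < 5; k++) {    // k itself is private to each thread
        // A plain counter++ here would be a race: increments could be lost.
        atomicAdd(&counter, 1);
    }
    __syncthreads();                 // wait until every thread has finished adding

    if (threadIdx.x == 0) *result = counter;
}

// Launches one block of n threads and returns the final value of the shared counter.
int sharedCount(int n)
{
    int *d_result = nullptr;
    int h_result = 0;
    cudaMalloc(&d_result, sizeof(int));
    sharedCounter<<<1, n>>>(d_result);
    cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_result);
    return h_result;
}

int main()
{
    const int n = 64;
    printf("shared counter = %d (expected %d)\n", sharedCount(n), 5 * n);
    return 0;
}
```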

> If I have something in the kernel that doesn't need to be done in parallel by all of the threads in the block, e.g. if I am changing what some pointer points to, then I usually say:
>
> if(threadIdx.x==0){
>     p1=p2;
> }

That’s OK if p1 is a __shared__ variable, but don’t forget the __syncthreads()!!! Otherwise some threads might continue past this point and use p1 before thread 0 has made the assignment.

In simple situations like updating pointers, it is usually simpler, and can be faster, to have every thread do it in its own register and avoid the __syncthreads(). That way, all threads run independently and aren’t stalled waiting for the others on every pointer update.
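
The two approaches can be put side by side in a minimal sketch, assuming a single block of 64 threads and two hypothetical input buffers a and b, with a flag choosing which one to read (standing in for the "p1 = p2" update). Variant A keeps the pointer in shared memory and needs a barrier; variant B lets every thread compute the same pointer in its own register. All names are illustrative; error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Variant A: one pointer for the whole block, written by thread 0 only.
__global__ void pickSharedPointer(const int *a, const int *b, bool useB, int *out)
{
    __shared__ const int *src;       // single copy per block
    if (threadIdx.x == 0) {
        src = useB ? b : a;          // only thread 0 updates it
    }
    __syncthreads();                 // others must not read src before this point
    out[threadIdx.x] = src[threadIdx.x];
}

// Variant B: every thread makes the same assignment in its own register.
__global__ void pickRegisterPointer(const int *a, const int *b, bool useB, int *out)
{
    const int *src = useB ? b : a;   // private copy, no barrier needed
    out[threadIdx.x] = src[threadIdx.x];
}

// Hypothetical helper: runs both variants, returns true if both copied the chosen buffer.
bool bothVariantsCorrect(int n, bool useB)
{
    size_t bytes = n * sizeof(int);
    int *h_a = new int[n], *h_b = new int[n], *h_x = new int[n], *h_y = new int[n];
    for (int i = 0; i < n; i++) { h_a[i] = i; h_b[i] = 100 + i; }

    int *d_a, *d_b, *d_x, *d_y;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_x, bytes); cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    pickSharedPointer<<<1, n>>>(d_a, d_b, useB, d_x);
    pickRegisterPointer<<<1, n>>>(d_a, d_b, useB, d_y);
    cudaMemcpy(h_x, d_x, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);

    const int *expected = useB ? h_b : h_a;
    bool ok = true;
    for (int i = 0; i < n; i++) {
        if (h_x[i] != expected[i] || h_y[i] != expected[i]) ok = false;
    }
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_x); cudaFree(d_y);
    delete[] h_a; delete[] h_b; delete[] h_x; delete[] h_y;
    return ok;
}

int main()
{
    printf("both variants correct: %s\n", bothVariantsCorrect(64, true) ? "yes" : "no");
    return 0;
}
```

Both kernels produce the same result; variant B simply avoids the shared variable and the barrier, which is the trade-off described above.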

One situation where it is good to use the if (threadIdx.x == 0) construction is when you need to read a single value from global memory that the entire block will use. Having each thread perform that read will slow things down immensely.
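
A minimal sketch of that broadcast pattern, assuming a hypothetical scale factor stored in global memory that every thread of the block needs. Thread 0 performs the single read into a __shared__ variable, all threads wait at __syncthreads(), and then everyone reads the value from shared memory. The bounds check comes after the barrier so that every thread in the block reaches it. Names and sizes are illustrative; error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Thread 0 fetches one value from global memory; the block then reads it from shared memory.
__global__ void scaleByBlockFactor(const float *factor, const float *in, float *out, int n)
{
    __shared__ float s_factor;               // one copy for the whole block
    if (threadIdx.x == 0) {
        s_factor = *factor;                  // single global-memory read per block
    }
    __syncthreads();                         // everyone waits until s_factor is valid

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = s_factor * in[i];
    }
}

// Hypothetical helper: scales 0..n-1 by f on the device and checks the result on the host.
bool scaleWorks(int n, float f)
{
    size_t bytes = n * sizeof(float);
    float *h_in = new float[n], *h_out = new float[n];
    for (int i = 0; i < n; i++) h_in[i] = (float)i;

    float *d_factor, *d_in, *d_out;
    cudaMalloc(&d_factor, sizeof(float));
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_factor, &f, sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scaleByBlockFactor<<<blocks, threads>>>(d_factor, d_in, d_out, n);
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    bool ok = true;
    for (int i = 0; i < n; i++) {
        if (h_out[i] != f * h_in[i]) ok = false;
    }
    cudaFree(d_factor); cudaFree(d_in); cudaFree(d_out);
    delete[] h_in; delete[] h_out;
    return ok;
}

int main()
{
    printf("scale ok: %s\n", scaleWorks(1000, 2.0f) ? "yes" : "no");
    return 0;
}
```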

BeachHut · July 31, 2008, 12:30pm

Ah, I see. Until now I thought register memory was pretty similar to shared memory… Whoops!

So I suppose that, unless I am worried about running out of registers, it is faster, as you said, to give every thread its own copy of variables like k and the pointers p1 and p2 from my example above, rather than putting them in shared memory.

Many thanks
