SINGLE THREADS

I would like to have just one thread execute a function while the rest wait for that thread. For this I have written the following expecting only thread 0 to execute function().

=================
i = threadIdx.x;

if(i==0) function();
__syncthreads();

=================

  1. Is there always a thread 0?
  2. Is this correct?
  3. If not, what should I write?

On another topic, I have a kernel executed by 400 threads, and the kernel contains a loop of 200 iterations. I have only 1 block, so I expect that putting __syncthreads() at the beginning of the loop body will synchronise the threads. I also expect that if the loop body consists of several functions, writing __syncthreads() after each function will make all threads wait until every thread has executed that function. The code looks like

==========================

__global__ void SPH()
{
    int i = threadIdx.x;

    for(int time=0 ; time<maxtime ; time++)
    {
        if(i==0) function1();
        __syncthreads();

        function2();
        __syncthreads();

        function3();
        __syncthreads();

        function4();
        __syncthreads();

        x[i] = function5();
    }

}

==================

What will each thread do for this code structure, for thread 0 and thread >0?

In case you’re interested function1 calculates particle interactions, while the rest do some fluid dynamics calculations, and the particle positions are updated in function5.

However, this does not work. It runs correctly in parallel on an NEC SX-8, but when the code is modified for the C870 I get rubbish out, and I’m wondering whether the thread synchronisation is wrong.


  1. Is there always a thread 0? Yes.
  2. Is this correct? Yes.
  3. If not, what should I write?
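
To make the "yes" concrete, here is a minimal sketch of the pattern, not the poster's actual code. The names function, single_thread_demo and run_single_thread_demo are hypothetical, and the shared variable is only there so the other threads have something to read after the barrier. The point it shows is that __syncthreads() sits outside the if, so every thread in the block reaches it.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for function(): thread 0 publishes one value.
__device__ void function(int *value) { *value = 42; }

__global__ void single_thread_demo(int *out)
{
    __shared__ int value;           // visible to every thread in the block
    int i = threadIdx.x;

    if (i == 0) function(&value);   // only thread 0 runs this
    __syncthreads();                // outside the if, so all threads reach it

    out[i] = value;                 // thread 0's write is visible after the barrier
}

// Hypothetical host helper: returns how many threads read 42, or -1 on a CUDA error.
int run_single_thread_demo(int nthreads)
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, nthreads * sizeof(int)) != cudaSuccess) return -1;

    single_thread_demo<<<1, nthreads>>>(d_out);

    std::vector<int> h_out(nthreads, 0);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, nthreads * sizeof(int),
                                 cudaMemcpyDeviceToHost);   // also waits for the kernel
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int count = 0;
    for (int v : h_out) count += (v == 42);
    return count;
}
```

On a working device, run_single_thread_demo(400) is expected to return 400. If the barrier were moved inside the if, only thread 0 would reach it, which is not a valid use of __syncthreads().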

What will each thread do for this code structure, for thread 0 and thread >0?

__global__ void SPH()
{
    int i = threadIdx.x;

    for(int time=0 ; time<maxtime ; time++)
    {   // note: there is no barrier between x[i] = function5() and the next iteration,
        // so threads are not synchronised at this point; thread 0 can start function1()
        // while other threads are still finishing function5()

        if(i==0) function1(); // t0 executes function1(); the other threads in its warp (t1-t31) are masked off and idle meanwhile, and the other warps go straight to the barrier
        __syncthreads(); // all threads in the block sync; none continues until all have arrived

        function2(); // all threads do this
        __syncthreads(); // all threads in the block sync

        function3();
        __syncthreads();

        function4();
        __syncthreads();

        x[i] = function5();
    }

}
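
One thing worth checking in the loop above: there is no barrier after x[i] = function5(), so thread 0 may enter function1() for the next time step while other threads are still writing x. Whether that matters depends on whether function1() reads x, which the post does not say. The sketch below assumes it does. The names loop_demo and run_loop_demo, the summation standing in for function1() and the update standing in for function5() are all hypothetical.

```cuda
#include <vector>
#include <cuda_runtime.h>

__global__ void loop_demo(int *x, int maxtime, int *last_total)
{
    __shared__ int total;
    int i = threadIdx.x;
    int n = (int)blockDim.x;

    for (int time = 0; time < maxtime; time++) {
        if (i == 0) {                    // stand-in for function1(): reads every x[j]
            total = 0;
            for (int j = 0; j < n; j++) total += x[j];
        }
        __syncthreads();                 // all threads wait for thread 0's result

        x[i] = total / n + 1;            // stand-in for x[i] = function5()
        __syncthreads();                 // extra barrier: thread 0 must not start the
                                         // next step until every x[i] has been written
    }
    if (i == 0) *last_total = total;
}

// Hypothetical host helper: returns the total from the last step, or -1 on a CUDA error.
int run_loop_demo(int nthreads, int maxtime)
{
    std::vector<int> h_x(nthreads, 1);   // every element starts at 1
    int *d_x = nullptr, *d_total = nullptr, h_total = -1;
    if (cudaMalloc(&d_x, nthreads * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_total, sizeof(int)) != cudaSuccess) { cudaFree(d_x); return -1; }
    cudaMemcpy(d_x, h_x.data(), nthreads * sizeof(int), cudaMemcpyHostToDevice);

    loop_demo<<<1, nthreads>>>(d_x, maxtime, d_total);

    cudaError_t err = cudaMemcpy(&h_total, d_total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_total);
    return (err == cudaSuccess) ? h_total : -1;
}
```

With the final barrier in place every element advances by exactly one per step, so run_loop_demo(400, 200) should return 400 * 200 = 80000. Without it the result depends on thread timing.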

this does not work.
This is a debugging problem. You can comment out everything except function1, then uncomment function2, and so on, to locate the bug.
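
A complementary check while bisecting, offered as a suggestion rather than as part of the reply above, is to confirm that the kernel launched and finished without a CUDA error, since a failed launch also produces rubbish output. This is a minimal sketch using the current runtime API. noop_kernel and launch_checked are hypothetical names, and the empty kernel stands in for SPH().

```cuda
#include <cuda_runtime.h>

// Empty kernel standing in for SPH().
__global__ void noop_kernel() {}

// Hypothetical helper: launch one block of nthreads and report the first error seen.
cudaError_t launch_checked(int nthreads)
{
    noop_kernel<<<1, nthreads>>>();
    cudaError_t err = cudaGetLastError();   // launch-time errors, e.g. a bad configuration
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // errors raised while the kernel was running
}
```

launch_checked(400) should return cudaSuccess on a working device, while an oversized block such as launch_checked(1 << 20) should fail at launch. Passing the result to cudaGetErrorString() gives a readable message.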