Loading data into shared memory inside an "if" statement

Hi, I am a new CUDA user, and I have a question about loading data into shared memory inside an “if” statement.

What I want to do is this: if every thread in the block gets result == 0, the block stops loading data into shared memory;
if any thread in the block gets result == 1, the block continues to load data into shared memory and do the computation.

I have written the kernel code like this, but it is very slow. Is it correct?

    for (i = 0; i < n; i++) {
        if (result != 0) {       // if all the threads get result == 0, it won't continue, right?

            /* load data into shared memory */
            __syncthreads();

            /* do computation */
            // Will threads with result == 0 also do the computation, since all
            // threads synchronize here to load data into shared memory again?

            if (result == 0)
                break;
        }
    }

Thanks a lot!!

I suggest looking for another approach. Consider atomics in shared memory, or flags. You may also want to rethink the whole problem.
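A minimal sketch of the shared-memory flag idea, assuming the per-thread result can be recomputed at the top of each iteration. The names `thread_result`, `stop_iter`, `flag_vote_loop` and `run_flag_vote` are hypothetical, and the computation is a placeholder. The point is that every `__syncthreads()` sits outside any data-dependent branch, and the loop exit depends only on a shared flag that all threads read after the same barrier, so the whole block leaves the loop together.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical per-thread test standing in for the real "result":
// thread gid stays active (result == 1) while i < stop_iter[gid].
__device__ int thread_result(const int* stop_iter, int gid, int i) {
    return i < stop_iter[gid] ? 1 : 0;
}

// data is assumed to hold n rows of (gridDim.x * blockDim.x) floats.
__global__ void flag_vote_loop(const float* data, const int* stop_iter,
                               float* out, int* iters_run, int n) {
    extern __shared__ float tile[];   // blockDim.x floats, size set at launch
    __shared__ int any_active;        // block-wide flag: 1 if any thread still has work

    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;
    int total = gridDim.x * blockDim.x;
    float acc = 0.0f;
    int i = 0;

    for (; i < n; ++i) {
        int result = thread_result(stop_iter, gid, i);

        if (tid == 0) any_active = 0;
        __syncthreads();                          // reset is visible before anyone votes
        if (result != 0) atomicOr(&any_active, 1);
        __syncthreads();                          // every vote is in
        if (any_active == 0) break;               // same value in every thread: uniform exit

        // Every thread helps load, active or not, so all of them reach the barrier.
        tile[tid] = data[(size_t)i * total + gid];
        __syncthreads();

        if (result != 0)                          // only active threads compute
            acc += tile[(tid + 1) % blockDim.x];  // placeholder computation
        // No trailing barrier: the two barriers at the top of the next iteration
        // keep the next tile write and flag reset after everyone's reads.
    }
    out[gid] = acc;
    if (tid == 0) iters_run[blockIdx.x] = i;
}

// Hypothetical host helper: one thread per entry of stop; returns iterations run per block.
std::vector<int> run_flag_vote(const std::vector<int>& stop, int block, int n) {
    int total = (int)stop.size();                 // assumed to be a multiple of block
    int blocks = total / block;
    std::vector<float> h_data((size_t)n * total, 1.0f);
    float *d_data, *d_out;
    int *d_stop, *d_iters;
    cudaMalloc(&d_data, h_data.size() * sizeof(float));
    cudaMalloc(&d_out, total * sizeof(float));
    cudaMalloc(&d_stop, total * sizeof(int));
    cudaMalloc(&d_iters, blocks * sizeof(int));
    cudaMemcpy(d_data, h_data.data(), h_data.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_stop, stop.data(), total * sizeof(int), cudaMemcpyHostToDevice);
    flag_vote_loop<<<blocks, block, block * sizeof(float)>>>(d_data, d_stop, d_out, d_iters, n);
    std::vector<int> iters(blocks);
    cudaMemcpy(iters.data(), d_iters, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data); cudaFree(d_out); cudaFree(d_stop); cudaFree(d_iters);
    return iters;
}
```

For example, with `stop_iter = {3, 0, 5, 1}` in one block of 4 threads and `n = 10`, the block runs 5 iterations (thread 2 is the last one still active) and then every thread exits at once.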

Hi Lev, thanks a lot for your reply.

The problem itself has severe divergence. Within a thread block, some threads stop quickly when they get result == 0, while others continue loading data and computing when they get result == 1. Do you have any good suggestions for this kind of problem?

At first I used only global memory, and it was very slow, because in my application the threads can’t load data from global memory in a coalesced way.

Now I am trying to use shared memory to get better access patterns and data reuse.
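When each thread's own global accesses can't be coalesced, one common workaround is to have the block load a contiguous chunk cooperatively (coalesced), then let each thread read from shared memory in whatever irregular order it needs. This sketch assumes the irregular indices stay inside the block's own chunk of `TILE` elements; `gather_via_shared`, `local_idx`, `run_gather` and `TILE` are hypothetical names for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TILE 256   // assumed launch configuration: blockDim.x == TILE

// Hypothetical gather: out[gid] = in[base + local_idx[gid]], where local_idx is
// assumed to lie in [0, TILE). Tile slots past n read as 0.
__global__ void gather_via_shared(const float* in, const int* local_idx,
                                  float* out, int n) {
    __shared__ float tile[TILE];
    int base = blockIdx.x * TILE;
    int gid = base + threadIdx.x;

    // Coalesced load: consecutive threads read consecutive addresses.
    tile[threadIdx.x] = (gid < n) ? in[gid] : 0.0f;
    __syncthreads();   // unconditional, so every thread in the block reaches it

    // The irregular access pattern now hits shared memory, not global memory.
    if (gid < n) out[gid] = tile[local_idx[gid]];
}

// Hypothetical host helper: runs the kernel and copies the result back.
std::vector<float> run_gather(const std::vector<float>& in, const std::vector<int>& idx) {
    int n = (int)in.size();
    int blocks = (n + TILE - 1) / TILE;
    float *d_in, *d_out;
    int *d_idx;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_idx, n * sizeof(int));
    cudaMemcpy(d_in, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    gather_via_shared<<<blocks, TILE>>>(d_in, d_idx, d_out, n);
    std::vector<float> out(n);
    cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in); cudaFree(d_out); cudaFree(d_idx);
    return out;
}
```

The global reads of `in` and `local_idx` are both coalesced here; only the reads from `tile` are scattered, and shared memory tolerates scattered access far better than global memory does.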

What I hope is that if all the threads in a block get result == 0, the block will stop and return. But it seems that “__syncthreads()” doesn’t let the thread block stop, even when all the threads in the block have got result == 0.

For example, in this code:

    for (i = 0; i < n; i++) {
        if (result != 0) {       // if all the threads get result == 0, will it continue?

            /* load data into shared memory */
            __syncthreads();

            /* do computation */

            if (result == 0)
                break;
        }
    }

See this thread for the solution to your problem: Block-wide voting using shared memory.
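Independently of the linked thread, devices of compute capability 2.0 and later also provide a built-in block-wide vote, `__syncthreads_or(int predicate)`: it acts like `__syncthreads()` and returns nonzero to every thread if the predicate is nonzero for any thread in the block. A minimal sketch of the original loop using it, with a hypothetical `stop_iter` array standing in for the real result test and a placeholder computation:

```cuda
#include <cuda_runtime.h>
#include <vector>

// data is assumed to hold n rows of (gridDim.x * blockDim.x) floats.
__global__ void syncthreads_or_loop(const float* data, const int* stop_iter,
                                    float* out, int* iters_run, int n) {
    extern __shared__ float tile[];   // blockDim.x floats, size set at launch
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;
    int total = gridDim.x * blockDim.x;
    float acc = 0.0f;
    int i = 0;

    for (; i < n; ++i) {
        int result = i < stop_iter[gid] ? 1 : 0;     // hypothetical per-thread test

        // Barrier + vote in one call: nonzero in every thread iff any thread has result != 0.
        if (!__syncthreads_or(result)) break;        // uniform: the whole block leaves together

        tile[tid] = data[(size_t)i * total + gid];   // all threads load, active or not
        __syncthreads();

        if (result != 0)
            acc += tile[(tid + 1) % blockDim.x];     // placeholder computation
        // The __syncthreads_or at the top of the next iteration keeps the next
        // tile write from racing with these reads.
    }
    out[gid] = acc;
    if (tid == 0) iters_run[blockIdx.x] = i;
}

// Hypothetical host helper: one thread per entry of stop; returns iterations run per block.
std::vector<int> run_vote(const std::vector<int>& stop, int block, int n) {
    int total = (int)stop.size();                    // assumed to be a multiple of block
    int blocks = total / block;
    std::vector<float> h_data((size_t)n * total, 1.0f);
    float *d_data, *d_out;
    int *d_stop, *d_iters;
    cudaMalloc(&d_data, h_data.size() * sizeof(float));
    cudaMalloc(&d_out, total * sizeof(float));
    cudaMalloc(&d_stop, total * sizeof(int));
    cudaMalloc(&d_iters, blocks * sizeof(int));
    cudaMemcpy(d_data, h_data.data(), h_data.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_stop, stop.data(), total * sizeof(int), cudaMemcpyHostToDevice);
    syncthreads_or_loop<<<blocks, block, block * sizeof(float)>>>(d_data, d_stop, d_out, d_iters, n);
    std::vector<int> iters(blocks);
    cudaMemcpy(iters.data(), d_iters, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data); cudaFree(d_out); cudaFree(d_stop); cudaFree(d_iters);
    return iters;
}
```

Compared with a hand-rolled shared-memory flag, this needs no extra shared variable or atomics, and the vote doubles as the barrier that protects the shared tile between iterations.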