Hi, I am a new CUDA user and I have a question about loading data into shared memory inside an "if" statement.
What I want to do is this: if every thread in the block has result == 0, the block should stop loading data into shared memory;
if any thread in the block has result == 1, the block should keep loading data into shared memory and doing the computation.
I wrote the kernel code below, but it is very slow. Is it correct?
for (i = 0; i < n; i++) {
    if (result != 0) {   // if all the threads have result == 0, will the block stop here?
        load data into shared memory;
        __syncthreads();
        do computation;  // will a thread with result == 0 also do the computation, since all the threads are synchronized to load data into shared memory again?
        if (result == 0)
            break;
    }
}
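For reference, here is a minimal sketch of the pattern I think I actually need. As far as I understand, __syncthreads() may only sit inside a branch if every thread in the block takes that branch, so the loop above can hang or misbehave once some threads have result == 0 and others do not. The sketch uses the built-in __syncthreads_or() (compute capability 2.0 and later) as a block-wide vote, so all threads leave the loop together. The kernel name tiledKernel, the host helper runDemo, the TILE size, the threshold test and the synthetic input are all my own placeholders, not real application code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 128  // hypothetical block size and tile width

// Hypothetical kernel: each block walks over n tiles of TILE floats.
__global__ void tiledKernel(const float *in, float *out, int *tilesLoaded,
                            int n, float threshold)
{
    __shared__ float tile[TILE];
    int result = 1;     // per-thread flag: 1 = still needs data, 0 = done
    int loaded = 0;     // number of tiles this block actually loaded
    float acc = 0.0f;

    for (int i = 0; i < n; ++i) {
        // Block-wide vote reached by every thread: nonzero if any thread
        // still has result != 0. It also serves as the barrier that keeps
        // the previous tile from being overwritten while it is still read.
        if (!__syncthreads_or(result))
            break;      // uniform decision: the whole block leaves together

        // All threads help load, even those whose own result is already 0.
        tile[threadIdx.x] = in[((size_t)blockIdx.x * n + i) * TILE + threadIdx.x];
        __syncthreads();    // outside any divergent branch
        ++loaded;

        if (result != 0) {  // only threads that are still active compute
            acc += tile[(threadIdx.x + 1) % TILE];  // placeholder computation
            if (acc > threshold)
                result = 0; // this thread is done
        }
    }

    out[blockIdx.x * TILE + threadIdx.x] = acc;
    if (threadIdx.x == 0)
        tilesLoaded[blockIdx.x] = loaded;
}

// Hypothetical host helper: runs a single block on synthetic data
// (even columns hold 1.0, odd columns hold 0.5). Returns 0 on success.
int runDemo(int n, float threshold, int *tilesLoaded, float *firstOut)
{
    const size_t count = (size_t)n * TILE;
    float *hIn = new float[count];
    for (size_t k = 0; k < count; ++k)
        hIn[k] = (k % 2 == 0) ? 1.0f : 0.5f;

    float *dIn = nullptr, *dOut = nullptr;
    int *dLoaded = nullptr;
    cudaMalloc(&dIn, count * sizeof(float));
    cudaMalloc(&dOut, TILE * sizeof(float));
    cudaMalloc(&dLoaded, sizeof(int));
    cudaMemcpy(dIn, hIn, count * sizeof(float), cudaMemcpyHostToDevice);

    tiledKernel<<<1, TILE>>>(dIn, dOut, dLoaded, n, threshold);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(tilesLoaded, dLoaded, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(firstOut, dOut, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dLoaded);
    delete[] hIn;
    return err == cudaSuccess ? 0 : 1;
}
```

Calling runDemo(8, 2.5f, &tiles, &first) from main should report 6 loaded tiles instead of 8: the threads reading 1.0 finish after 3 tiles, the threads reading 0.5 finish after 6, and the vote at the top of the seventh iteration ends the loop for the whole block. Threads with result == 0 still take part in the load and the barrier, but skip the computation.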
Thanks a lot!!