Is this global memory access coalesced? Access in a for loop followed by a divergent while loop

I have a kernel that accesses global memory in a for loop, followed by a data-dependent, divergent while loop.

[codebox]__global__ void my_kernel(int *g_data,int rows,int cols,int *g_results,…)
{
    int i=0,j;
    extern __shared__ int s_data[];   // dynamically sized shared array
    int word;
    int res=0;
    const int tid=blockDim.x*blockIdx.x+threadIdx.x;

    /* some code to assign the s_data */
    __syncthreads();

    for(j=0;j<rows;++j)
    {
        //__syncthreads();
        // Consecutive threads read consecutive ints of row j.
        word=g_data[j*cols+tid];

        // Check the bound before reading s_data[i].
        while(i<cols&&s_data[i]<word)
        {
            ++i;
        }

        if(i<cols&&s_data[i]==word)
            res+=s_data[i];
        //__syncthreads();
    }

    g_results[tid]=res;
}[/codebox]

Is this access coalesced? I have tried adding __syncthreads() inside the for loop, but the performance doesn’t improve.

I am new to CUDA programming, so any suggestions for my code are greatly appreciated.

Thanks

Yes, it is coalesced access… The read from g_data is coalesced: for a fixed j, consecutive threads read consecutive elements g_data[j*cols+tid]…
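To make the reasoning concrete, here is a minimal host-side sketch (plain C++, which should also compile inside a .cu file) that lists the element indices one warp would touch. The helper names `warp_indices_row_major`, `warp_indices_strided` and `is_contiguous` are hypothetical, and a warp size of 32 is assumed. It only checks that the indices are contiguous; exact alignment requirements depend on the device's compute capability and are not modelled here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: element indices touched by one warp for the pattern
// g_data[j * cols + tid] used in the kernel above. `first_tid` is the global
// thread index of lane 0; a warp size of 32 is assumed by default.
std::vector<std::size_t> warp_indices_row_major(std::size_t j, std::size_t cols,
                                                std::size_t first_tid,
                                                std::size_t warp_size = 32) {
    std::vector<std::size_t> idx;
    for (std::size_t lane = 0; lane < warp_size; ++lane)
        idx.push_back(j * cols + (first_tid + lane));
    return idx;
}

// Counter-example for comparison: g_data[tid * cols + j], where neighbouring
// threads are `cols` elements apart.
std::vector<std::size_t> warp_indices_strided(std::size_t j, std::size_t cols,
                                              std::size_t first_tid,
                                              std::size_t warp_size = 32) {
    std::vector<std::size_t> idx;
    for (std::size_t lane = 0; lane < warp_size; ++lane)
        idx.push_back((first_tid + lane) * cols + j);
    return idx;
}

// True when every index is exactly one past the previous one, i.e. the warp
// reads one contiguous run of elements.
bool is_contiguous(const std::vector<std::size_t>& idx) {
    for (std::size_t k = 1; k < idx.size(); ++k)
        if (idx[k] != idx[k - 1] + 1) return false;
    return true;
}
```

For example, `is_contiguous(warp_indices_row_major(3, 1024, 64))` is true, while `is_contiguous(warp_indices_strided(3, 1024, 64))` is false. Note that __syncthreads() does not change which addresses a warp reads, so it would not be expected to affect coalescing.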