Order of execution in a divergent warp

for (int i = threadIdx.x; i < SHARED_MEM_CAPACITY; i += blockDim.x)
{
  sharedArray[i] = INIT_VALUE;
}

__syncthreads();

Won't this initialize a shared array even with fewer than 32 threads per block?

I was replying to the initial post… but not to your post on re-convergence.

Yes, but that still requires reconvergence before the __syncthreads() if SHARED_MEM_CAPACITY is not a multiple of the warp size.
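
For anyone who wants to check this on their own hardware and toolkit version, here is a minimal test sketch. The capacity of 100 (deliberately not a multiple of 32), the INIT_VALUE of 7, and the names initShared and countUninitialized are made up for illustration; none of them come from the original code. It runs the loop with several block sizes, including one smaller than a warp, and counts elements that did not get initialized.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical values: the thread does not say what these actually are.
constexpr int SHARED_MEM_CAPACITY = 100;  // deliberately not a multiple of 32
constexpr int INIT_VALUE = 7;

__global__ void initShared(int *out)
{
    __shared__ int sharedArray[SHARED_MEM_CAPACITY];

    // Block-stride loop: thread t writes t, t + blockDim.x, t + 2*blockDim.x, ...
    for (int i = threadIdx.x; i < SHARED_MEM_CAPACITY; i += blockDim.x) {
        sharedArray[i] = INIT_VALUE;
    }

    // The barrier sits outside the loop, so every thread of the block reaches it,
    // including threads that ran fewer iterations than their neighbours.
    __syncthreads();

    // Copy the shared array to global memory so the host can inspect it.
    for (int i = threadIdx.x; i < SHARED_MEM_CAPACITY; i += blockDim.x) {
        out[i] = sharedArray[i];
    }
}

// Returns the number of elements not equal to INIT_VALUE, or -1 on a CUDA error.
int countUninitialized(int threadsPerBlock)
{
    int *dOut = nullptr;
    int hOut[SHARED_MEM_CAPACITY];
    if (cudaMalloc(&dOut, sizeof(hOut)) != cudaSuccess) return -1;
    cudaMemset(dOut, 0, sizeof(hOut));  // 0 differs from INIT_VALUE, so misses show up

    initShared<<<1, threadsPerBlock>>>(dOut);

    // cudaMemcpy waits for the kernel and reports any launch or execution error.
    cudaError_t err = cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < SHARED_MEM_CAPACITY; ++i) bad += (hOut[i] != INIT_VALUE);
    return bad;
}

int main()
{
    // 20 threads is less than one warp; 48 leaves the second warp half full;
    // 128 leaves threads 100..127 with no loop iterations at all.
    const int sizes[] = {20, 32, 48, 128};
    for (int n : sizes)
        std::printf("%3d threads per block: %d bad elements\n", n, countUninitialized(n));
    return 0;
}
```

If the loop and the barrier behave as discussed, every line should report 0 bad elements. A nonzero count or -1 would point at the kind of problem this thread is about.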

I would be surprised if the code above no longer works. Maybe their compiler has become too smart.

Well, we were surprised too :D
But the code you posted is probably safer, since it is harder for the compiler to optimize and it includes a __syncthreads().