for (int i = threadIdx.x; i < SHARED_MEM_CAPACITY; i += blockDim.x)
{
    sharedArray[i] = INIT_VALUE;
}
__syncthreads();
Won't this initialize a shared array even with fewer than 32 threads per block?
I was replying to the initial post… but not to your post on re-convergence.
Yes, but that still requires reconvergence before the __syncthreads() if SHARED_MEM_CAPACITY is not a multiple of the warp size.
I would be surprised if the code above does not work anymore. Maybe their compiler has become too smart.
Well, we were surprised too :D
But the code you posted is probably safer, as it is harder for the compiler to optimize and it has an explicit __syncthreads().
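For anyone who wants to try this, here is a minimal sketch of the loop from the first post wrapped in a complete program. The values of SHARED_MEM_CAPACITY and INIT_VALUE, the kernel name initShared, and the launch with 8 threads are my own assumptions for illustration; none of them come from the original posts. The capacity is deliberately not a multiple of the warp size, so some threads run one more iteration than others before reaching the barrier.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical values, chosen only for this sketch.
#define SHARED_MEM_CAPACITY 100   // deliberately not a multiple of 32
#define INIT_VALUE 7

__global__ void initShared(int *mismatches)
{
    __shared__ int sharedArray[SHARED_MEM_CAPACITY];

    // Block-stride loop: thread t writes elements t, t + blockDim.x, ...
    for (int i = threadIdx.x; i < SHARED_MEM_CAPACITY; i += blockDim.x)
    {
        sharedArray[i] = INIT_VALUE;
    }
    // All threads reach the barrier, including those that ran fewer iterations.
    __syncthreads();

    // Thread 0 checks every element, including those written by other threads.
    if (threadIdx.x == 0)
    {
        int bad = 0;
        for (int i = 0; i < SHARED_MEM_CAPACITY; ++i)
        {
            if (sharedArray[i] != INIT_VALUE) ++bad;
        }
        *mismatches = bad;
    }
}

int main()
{
    int *dMismatches = nullptr;
    int hMismatches = -1;

    if (cudaMalloc(&dMismatches, sizeof(int)) != cudaSuccess)
    {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // One block of 8 threads: fewer than one warp.
    initShared<<<1, 8>>>(dMismatches);

    // The blocking copy also surfaces errors from the kernel launch.
    cudaError_t err = cudaMemcpy(&hMismatches, dMismatches, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess)
    {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(dMismatches);
        return 1;
    }

    printf("mismatches: %d\n", hMismatches);
    cudaFree(dMismatches);
    return hMismatches == 0 ? 0 : 1;
}
```

With 8 threads and 100 elements, threads 0 to 3 run 13 iterations and threads 4 to 7 run 12, so the block diverges on the last pass and has to reconverge before the barrier. If the loop and __syncthreads() behave as discussed above, the program should report zero mismatches; changing the launch configuration or the capacity is an easy way to probe other cases.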