Will this code write coalesced data?

The pseudo-code is as follows:

__global__ void Kernel(float* pOut, int nStride)
{
    // Each iteration writes one row of output;
    // consecutive threads write consecutive floats within that row.
    for (int i = 0; i < MaxI; i++)
    {
        pOut[i * nStride + threadIdx.x] = DoSomething();
    }
}

The idea is to write MaxI sets of outputs, each containing nStride floats. Will it fill pOut in a coalesced manner?

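Yes, as long as nStride is a multiple of 16 (and pOut itself is suitably aligned, which cudaMalloc guarantees). Within each iteration, consecutive threads write consecutive floats, and a stride that is a multiple of 16 makes every row start on a 64-byte boundary, so the 16 writes of each half-warp fall in a single aligned segment.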
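To see why the multiple-of-16 condition matters, here is a rough host-side sketch in plain C++ that checks where each half-warp's first write lands. The function name rowsStartAligned and its parameters are made up for illustration. It assumes pOut starts on a 64-byte boundary, a 1-D block, 4-byte floats, and coalescing per half-warp of 16 threads (newer hardware works per full warp, so the numbers there may differ).

```cpp
#include <cstddef>

// Hypothetical helper, not part of any CUDA API.
// Returns true if, for every iteration i and every half-warp in the block,
// the first float written by that half-warp sits on a 64-byte boundary
// (relative to pOut, which is assumed to be 64-byte aligned itself).
bool rowsStartAligned(int nStride, int maxI, int blockDimX)
{
    const std::size_t kHalfWarp = 16;                          // assumed coalescing unit
    const std::size_t kSegmentBytes = kHalfWarp * sizeof(float); // 64 bytes for 4-byte floats

    for (int i = 0; i < maxI; ++i)
    {
        for (int firstThread = 0; firstThread < blockDimX; firstThread += 16)
        {
            // Same index expression as in the kernel: i * nStride + threadIdx.x
            std::size_t index = static_cast<std::size_t>(i) * nStride + firstThread;
            std::size_t byteOffset = index * sizeof(float);
            if (byteOffset % kSegmentBytes != 0)
            {
                return false; // this half-warp would straddle two segments
            }
        }
    }
    return true;
}
```

For example, rowsStartAligned(64, 8, 64) returns true, while rowsStartAligned(60, 8, 64) returns false, because the second row then starts at byte offset 240, which is not a multiple of 64.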