On page 87 of the CUDA Programming Guide version 2.3.1 there is the following explanation:
"Center: misaligned float memory access resulting in one transaction", for devices of compute capability 1.2 or higher.
But in my case the following code gives results that suggest a misaligned float memory access takes 16 transactions.
__global__ void offsetcopy(float *odata, float *idata, int offset) {
    int xid = blockIdx.x * blockDim.x + threadIdx.x + offset;
    odata[xid] = idata[xid];  // assumed body: copy one float per thread
}
For offset = 1, 2, …, 15, an 8x performance degradation arises.
I take this to mean that 16 transactions are issued per half-warp.
What happens with this misaligned memory access?
Is the Programming Guide wrong?