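You can only read a float4 if the source address is aligned to sizeof(float4) = 16 bytes. Pointers returned by cudaMalloc* are aligned to at least 256 bytes, so a float4 load is aligned only when it starts at a float index that is a multiple of 4. That explains the observations you made, i.e. you can read elements 0,1,2,3 ; 4,5,6,7 ; 8,9,10,11 ; … but not 5,6,7,8.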
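If it helps, here is a minimal sketch of that alignment check. It is plain host-side C++ (it should also compile as CUDA host code), and `is_float4_aligned` is a hypothetical helper name of mine, not part of the CUDA API. It assumes `sizeof(float)` is 4, so a float4 occupies 16 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes occupied by a float4 (four floats); assumed to be 16.
constexpr std::size_t kFloat4Bytes = 4 * sizeof(float);

// Hypothetical helper: returns true if `p` could be reinterpreted as a
// float4 pointer and dereferenced, i.e. its address is a multiple of 16.
inline bool is_float4_aligned(const float* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kFloat4Bytes == 0;
}
```

With a 16-byte aligned base pointer `buf`, `is_float4_aligned(buf + 4)` is true while `is_float4_aligned(buf + 5)` is false, which matches the pattern above. For a start index like 5 you would have to fall back to four scalar float loads instead.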