Hi,
I am looking for a feature in CUDA Fortran, similar to the one in CUDA C, for aligning data structures so that accesses to them satisfy the requirements for GPU global memory coalescing. As an example, the CUDA C Programming Guide gives:
struct __align__(16) {
    float x;
    float y;
    float z;
};
According to the documentation, any access to data residing in global memory compiles to a single global memory instruction if and only if the size of the data type is 1, 2, 4, 8, or 16 bytes and the data is naturally aligned (i.e. its address is a multiple of that size).
I have a 3D data structure similar to the one above. How do I align it in CUDA Fortran to satisfy the above criteria? Does the align attribute of the “allocate” API serve this purpose?