I’m trying to decide whether or not to use cudaMallocPitch(), so could you please clarify a few points?
As I understand it, the reason for using cudaMallocPitch() is memory coalescing. In CUDA C, the elements of a 2D array are referenced using 2 pointers, so the start of each row has to be 64-byte aligned.
- Does CUDA Fortran have this restriction?
- If yes, is it correct that as long as each column is 64-byte aligned, we don’t need cudaMallocPitch() and a plain allocate() is good enough?
So, in CUDA Fortran, if my data is a 2D array whose elements are 8 bytes each (e.g. double precision) and whose row dimension is a multiple of 8, then there is no need to use cudaMallocPitch(). Is that right?
If the row dimension is not a multiple of 8, should I use cudaMallocPitch(), or is there another option in Fortran that can be used with CUDA Fortran?
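To make the arithmetic concrete, here is a minimal sketch in plain C (host side only, no CUDA calls) of the manual-padding alternative I have in mind. It rounds the leading dimension up so that every column starts on an aligned boundary. The function name padded_leading_dim is hypothetical, and the 64-byte figure is just the alignment I assumed above, not something I have confirmed for CUDA Fortran.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest leading dimension >= n such that one
 * column (leading dimension * elem_size bytes) is a multiple of
 * `align` bytes. Assumes `align` is a multiple of `elem_size`. */
static size_t padded_leading_dim(size_t n, size_t elem_size, size_t align)
{
    assert(elem_size > 0 && align % elem_size == 0);

    size_t col_bytes = n * elem_size;
    /* Round the column size in bytes up to the next multiple of align. */
    size_t padded_bytes = (col_bytes + align - 1) / align * align;
    return padded_bytes / elem_size;
}
```

With 8-byte elements and the assumed 64-byte alignment, padded_leading_dim(100, 8, 64) gives 104, while a dimension that is already a multiple of 8, such as 64, is returned unchanged. The array would then be allocated with the padded size and only the first n entries of each column used.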
Thanks,
Tuan