Using cudaMallocPitch

I’m trying to decide whether to use cudaMallocPitch(). Could you please clarify the following?

As I understand it, the reason for using cudaMallocPitch is to get coalesced memory accesses. In CUDA C, the elements of a 2D array are referenced using two indices (row and column), so this requires the starting address of each row to be 64-byte aligned.

  1. In CUDA Fortran, do we have this restriction?
  2. If yes, is it correct that as long as each column is 64-byte aligned, we don’t need cudaMallocPitch() and a plain allocate() is good enough?

    In other words, in CUDA Fortran, if my data is a 2D array of 8-byte elements (e.g. double precision) and the leading (row) dimension is a multiple of 8, so that every column spans a multiple of 64 bytes, then there is no need to use cudaMallocPitch()? Is that right?

If the leading dimension is not a multiple of 8, should I use cudaMallocPitch(), or is there another option in Fortran that can be used with CUDA Fortran?



  1. No, we don’t have that restriction
  2. No, you don’t need cudaMallocPitch()

Fortran programmers have been manually padding arrays for as long as there have been arrays, so you can do that yourself if you want. Whether you get good coalesced memory behavior depends on how you access the data within the array; since Fortran arrays are column-major, that generally means having consecutive threads access consecutive values of the first index.