cudaMallocPitch compile error

I’m trying to use cudaMallocPitch():

r = cudaMallocPitch(udev, pitch, nx, ny*nz)

(where udev is an array of floats, and r, pitch, nx, ny, and nz are integers)

When I try to compile, I get the following error:

“PGF90-S-0155-Could not resolve generic procedure cudamallocpitch”

Apparently TheMatt got cudaMallocPitch() to work (see post from 2009: https://forums.developer.nvidia.com/t/cuda-fortran-and-cuda-api-3d-arrays/131782/1 ) and as far as I can tell, I’m using the function the same way he did. Any ideas?

I’m using pgfortran 10.2-0. Could that be the problem?

Thanks!

Hi JDS7,

You’re using a very early version of CUDA Fortran, which did not have the built-in interface for cudaMallocPitch. You will need to either update your compiler version or add your own explicit interface.

  • Mat

Okay. Thanks!

Hi Mat,

I’m getting the same error using pgfortran 11.2-1. Is that still not new enough? What version do I need?

Thanks

Hi JDS7,

11.2-1. Is that still not new enough?

No, this should be fine. My best guess is that either one of your arguments has the wrong data type or your return value isn’t an integer.

How is udev declared? Is “r” an integer?

  • Mat

That was the problem. It wasn’t happy about the fact that udev was not a 2-D array. It compiles when I use a 2-D array, even with pgfortran v. 10.2-0. Is there a way to allocate a 4-D array with padding to facilitate coalescing?

integer :: nx,ny,nz
integer :: r,pitch
real*4,allocatable,device :: udev(:,:,:,:)
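
For comparison, here is a minimal sketch of the 2-D form that does resolve against the generic interface. The program name and the sizes are made up for illustration, and the integer kinds follow the interface quoted in this thread; a newer compiler may expect different kinds for pitch, width, and height.

```fortran
program pitch2d
  use cudafor
  implicit none
  ! Illustrative sizes only; not taken from the original post.
  integer, parameter :: nx = 64, ny = 32, nz = 16
  integer :: r, pitch
  ! A rank-2 allocatable device array matches the generic interface.
  real*4, allocatable, device :: udev(:,:)

  ! Request nx elements per row and ny*nz rows; the padded row
  ! width chosen by the runtime comes back in pitch.
  r = cudaMallocPitch(udev, pitch, nx, ny*nz)
  if (r /= cudaSuccess) then
    print *, 'cudaMallocPitch failed: ', cudaGetErrorString(r)
    stop
  end if
  print *, 'pitch returned by cudaMallocPitch: ', pitch

  r = cudaFree(udev)
end program pitch2d
```

Here the y and z dimensions are folded into the second index, so element (i,j,k) of the logical 3-D array lives at udev(i, j + (k-1)*ny).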

The interfaces for all the cudaMallocPitch routines are like this, so they require a 2d array:

INTERFACE CUDAMALLOCPITCH
INTEGER FUNCTION CUDAMALLOCPITCHI1(devptr, pitch, width, height)
INTEGER*1, ALLOCATABLE, DEVICE, DIMENSION(:,:) :: devptr
INTEGER pitch, width, height
END FUNCTION

Except for this one:

INTEGER FUNCTION CUDAMALLOCPITCHCD(devptr, pitch, width, height)
IMPORT C_DEVPTR
TYPE(C_DEVPTR) :: devptr
INTEGER pitch, width, height
END FUNCTION

So, you can declare a variable of type(c_devptr) and call cudaMallocPitch() with that variable and the appropriate pitch, width, and height. Then “cast” the c_devptr into an allocatable device array of your choosing using our overloaded CUDA Fortran c_f_pointer function.

type(c_devptr) :: a_p
real, device, allocatable :: a_d(:,:,:,:)

call c_f_pointer(a_p, a_d, (/ 5, 10, 15, 20 /) )

pitch, width, and height have units of bytes when used with c_devptr, FYI.
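
Putting the pieces above together, a rough end-to-end sketch might look like the following. It is not a tested recipe: the program name, sizes, and the helper variables elem_bytes and ldx are hypothetical, and it assumes (as in the CUDA C API) that width and pitch are byte counts while height is a row count. It also assumes the returned pitch is a multiple of the element size, so the padded row can be used as the leading dimension.

```fortran
program pitch4d
  use cudafor
  use iso_c_binding
  implicit none
  ! Illustrative logical extents of the 4-D array.
  integer, parameter :: nx = 5, ny = 10, nz = 15, nt = 20
  integer, parameter :: elem_bytes = 4   ! assumed size of a default real
  integer :: r, pitch, width, height, ldx
  type(c_devptr) :: a_p
  real, device, allocatable :: a_d(:,:,:,:)

  width  = nx * elem_bytes     ! bytes per logical row (assumption)
  height = ny * nz * nt        ! number of rows (assumption)

  r = cudaMallocPitch(a_p, pitch, width, height)
  if (r /= cudaSuccess) stop 'cudaMallocPitch failed'

  ! Padded leading dimension in elements. Only a_d(1:nx,:,:,:) holds
  ! real data; a_d(nx+1:ldx,:,:,:) is padding.
  ldx = pitch / elem_bytes
  call c_f_pointer(a_p, a_d, (/ ldx, ny, nz, nt /))

  print *, 'pitch (bytes) = ', pitch, '  leading dimension = ', ldx

  ! Release the memory through the same handle that allocated it.
  r = cudaFree(a_p)
end program pitch4d
```

The point of using ldx rather than nx as the first extent is that the Fortran array's stride then matches the padded row, so each (j,k,l) row starts on the aligned boundary that cudaMallocPitch chose.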