Does cudaMalloc use CPU memory?

Hi,

I have CUDA Fortran code inside a large Fortran program that uses a significant amount of CPU memory. When the array size increases, the CUDA malloc call returns this error message:

1040 bytes requested; not enough memory 2(out of memory)

With a simpler driver that uses less CPU memory, the same CUDA Fortran code runs with a much larger array size. So I am curious whether the memory on cpu side causes the problem. Any suggestions?

Thanks,

sjz

So I am curious whether the memory on cpu side causes the problem.

I would not think so. The two memories are separate. If you have the pinned attribute on the host array then that memory is managed by the CUDA driver, but that still shouldn’t cause an out of memory error on the device.

Is the larger host array the only change? Is the host array larger than 2GB? If so, are you compiling with “-Mlarge_arrays”?
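One way to check whether the device itself is short of memory at the point of failure is to query the runtime just before the allocation. The following is only a minimal sketch, assuming the `cudafor` module and a standalone test program (the program and variable names are made up for illustration); in the real code the query would go right before the failing call.

```fortran
program check_device_memory
  use cudafor
  implicit none
  integer :: istat
  ! Hypothetical variable names; cuda_count_kind is the integer kind
  ! that cudafor provides for byte counts.
  integer(kind=cuda_count_kind) :: free_bytes, total_bytes

  ! Ask the CUDA runtime how much device memory is currently free
  ! and how much the device has in total.
  istat = cudaMemGetInfo(free_bytes, total_bytes)
  if (istat /= cudaSuccess) then
    print *, 'cudaMemGetInfo failed: ', trim(cudaGetErrorString(istat))
    stop 1
  end if

  print *, 'device memory free (bytes):  ', free_bytes
  print *, 'device memory total (bytes): ', total_bytes
end program check_device_memory
```

If the free amount is far larger than the request, the out of memory message is more likely a symptom of something else, such as an earlier error or corrupted state, than a real shortage.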

- Mat

The original error message:

in so0: cudaMallocPitch: 1040 bytes requested; not enough memory 2(out of memory)


The relevant code:

cudaMallocPitch(dev_asytob, dev_pitch, m, np)

where

real, device, allocatable, dimension(:,:) :: dev_asytob
dev_pitch = 64
m = 130
np=43
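For comparison, here is a minimal sketch of the same call with its return status checked, using the sizes quoted above. Note that in the CUDA API the pitch argument is an output: the runtime fills in the padded width, so a value assigned to it beforehand is not used. The kind of the pitch variable is an assumption here (`cuda_count_kind`); older compiler releases may expect a default integer instead. The program name is made up for illustration.

```fortran
program pitch_sketch
  use cudafor
  implicit none
  real, device, allocatable, dimension(:,:) :: dev_asytob
  ! Assumption: pitch is declared with cuda_count_kind; some older
  ! compiler versions may want a default integer here.
  integer(kind=cuda_count_kind) :: dev_pitch
  integer :: m, np, istat

  m  = 130   ! width, in elements
  np = 43    ! height, in rows

  ! dev_pitch is written by the call; it does not need to be set first.
  istat = cudaMallocPitch(dev_asytob, dev_pitch, m, np)
  if (istat /= cudaSuccess) then
    print *, 'cudaMallocPitch failed: ', trim(cudaGetErrorString(istat))
    stop 1
  end if

  print *, 'padded width returned in pitch: ', dev_pitch

  ! Release the device allocation.
  istat = cudaFree(dev_asytob)
end program pitch_sketch
```

If a small standalone program like this succeeds on the same machine, the failure in the full application is more likely caused by something earlier in the run than by the pitch or the requested size.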


I do not believe that array is larger than 2GB.

m is the number of columns to be processed in the driver as well as in
this GPU code.

Do you think the pitch is causing the problem?

Thanks,


Shujia

The problem is solved. It was due to a bug in memory allocation. SJZ