I have CUDA Fortran code inside a large Fortran application that uses a significant amount of CPU memory. When the array size increases, the CUDA malloc fails with this error message:
1040 bytes requested; not enough memory 2(out of memory)
With a simpler driver that uses less CPU memory, the same CUDA Fortran code runs with a much larger array size. So I am curious whether the memory on the CPU side causes the problem. Any suggestions?
So I am curious whether the memory on the CPU side causes the problem.
I would not think so. Host and device memory are separate. If you have the pinned attribute on the host array, then that memory is managed by the CUDA driver, but that still shouldn’t cause an out-of-memory error on the device.
Is the larger host array the only change? Is the host array larger than 2GB? If so, are you compiling with “-Mlarge_arrays”?
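If it helps narrow this down, one thing you could try is printing how much device memory is free right before the failing allocation, and checking the allocate status yourself. The following is only a minimal sketch, not your actual code: the program name, the array name a_d, and the size n are hypothetical, and it assumes the cudafor module's cudaMemGetInfo and cudaGetErrorString interfaces.

```fortran
program check_device_memory
  use cudafor
  implicit none

  ! cudaMemGetInfo reports byte counts in this integer kind.
  integer(kind=cuda_count_kind) :: free_bytes, total_bytes
  integer :: istat

  ! Hypothetical size for illustration: 130 * 8 bytes = 1040 bytes.
  integer, parameter :: n = 130
  real(8), device, allocatable :: a_d(:)

  ! Ask the CUDA runtime how much device memory is currently free.
  istat = cudaMemGetInfo(free_bytes, total_bytes)
  if (istat /= cudaSuccess) then
    print *, 'cudaMemGetInfo failed: ', trim(cudaGetErrorString(istat))
    stop 1
  end if
  print *, 'device memory free / total (bytes): ', free_bytes, total_bytes

  ! Allocate on the device and check the status instead of letting
  ! the runtime abort the program.
  allocate(a_d(n), stat=istat)
  if (istat /= 0) then
    print *, 'device allocate failed, stat = ', istat
    stop 1
  end if

  deallocate(a_d)
end program check_device_memory
```

If the free value is already close to zero when it is printed inside the large application, then something earlier in the run is holding device memory, and the host-side memory use is probably not the cause.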