cudaMemGetInfo problem

Dear all,
I am learning the function cudaMemGetInfo and I have a
strange problem. I call cudaMemGetInfo to check the
free and total memory. The returned free and total memory are as follows:


The free spaces = 4947013632 Bytes
The total spaces = 5032968192 Bytes

Before I copy the data from the CPU to the GPU, I call cudaMemGetInfo
again and find that the returned free and total memory have become 0.


The free spaces = 0 Bytes
The total spaces = 0 Bytes

I cannot understand this problem. Has anybody had a similar
problem? For reference, the GPU is a Kepler K20.
Neo

Hi Neo,

How are you calling cudaMemGetInfo and what are the data types of the arguments? Make sure you’re passing 64-bit integers (integer(kind=8)) when compiling in 64-bits.
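
As an illustration of the argument types, here is a minimal sketch of such a call. It assumes the `cudafor` module and uses its `cuda_count_kind` constant for the byte counts instead of hard-coding `kind=8`; the program name and variable names are made up for this example.

```fortran
program meminfo
  use cudafor
  implicit none
  ! cuda_count_kind is the integer kind that cudafor defines for byte counts;
  ! a default 4-byte integer here can yield garbage or zero values
  integer(kind=cuda_count_kind) :: free_bytes, total_bytes
  integer :: istat

  istat = cudaMemGetInfo(free_bytes, total_bytes)
  if (istat /= cudaSuccess) then
     print *, 'cudaMemGetInfo failed: ', trim(cudaGetErrorString(istat))
     stop 1
  end if

  print *, 'The free space  = ', free_bytes, ' Bytes'
  print *, 'The total space = ', total_bytes, ' Bytes'
end program meminfo
```

Checking the returned status as well tells you whether zeros come from a failed call or from mismatched argument kinds.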

- Mat

Hi Mat,
Thank you for your suggestion. My previous problem is solved. However, I have a new strange problem. I want to copy one array from the host to the device. I check the
free space first.


Current space

The free spaces = 2147414016 Bytes
The total spaces = 5032968192 Bytes

I use the following code to copy the 1-D array CORRE on the host
to D_CRE on the device.

REAL(KIND=8) , ALLOCATABLE               :: CORRE(:)
REAL(KIND=8) , ALLOCATABLE , DEVICE :: D_CRE(:)

ALLOCATE( CORRE(NSIZE2) )

ISTAT = CUDAMALLOC( D_CRE, NSIZE2 )

ISTAT = CUDAMEMCPY( D_CRE , CORRE , NSIZE2 , CUDAMEMCPYHOSTTODEVICE )

NSIZE2 = 968422.
968422 * 8 bytes = 7747376 bytes << 2147414016 bytes

D_CRE will only use 7747376 bytes of memory. However, I get
the following error message:

0: copyin Memcpy (dev=0xa6c80000, host=0x5c0735f0, size=7747376) FAILED: 4(unspecified launch failure)

I cannot fully understand where the error is. Do you have any suggestions that I can try for debugging? Thank you very much.

Hi SCCS,

While the error could be with the memcpy, the unspecified launch failure could also come from a failed kernel. Since kernels are launched asynchronously, if one fails and no error handling is used, the failure shows up in the next device call. Granted, this error is occurring when copying to the device, and kernel failures typically surface when copying back. Do you launch any kernels before this copy, and if so, can you add error checking by calling cudaGetLastError after the launch?
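
A minimal sketch of that kind of check is below. The kernel `scale`, the module name, and the array size are hypothetical and only stand in for whatever kernel runs before the copy; the extra `cudaDeviceSynchronize` call is an addition here, so that errors raised while the kernel executes are reported at this point rather than by a later call.

```fortran
module kernels_m
  use cudafor
  implicit none
contains
  ! Hypothetical kernel, for illustration only: doubles each element
  attributes(global) subroutine scale(a, n)
    integer, value :: n
    real(kind=8) :: a(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = 2.0d0 * a(i)
  end subroutine scale
end module kernels_m

program check_launch
  use cudafor
  use kernels_m
  implicit none
  integer, parameter :: n = 1024
  real(kind=8), allocatable, device :: d_a(:)
  integer :: istat

  allocate(d_a(n))
  d_a = 1.0d0

  call scale<<<(n + 255) / 256, 256>>>(d_a, n)

  ! Reports errors detected at launch (e.g. a bad launch configuration)
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) &
     print *, 'launch error: ', trim(cudaGetErrorString(istat))

  ! Waits for the kernel; errors raised during execution are returned here
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) &
     print *, 'kernel error: ', trim(cudaGetErrorString(istat))

  deallocate(d_a)
end program check_launch
```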

What’s the ISTAT value from cudaMalloc?
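
One way to inspect it, sketched under the assumption that NSIZE2 is the 968422 quoted above (the program name is made up):

```fortran
program check_malloc
  use cudafor
  implicit none
  integer, parameter :: nsize2 = 968422   ! value quoted in the post
  real(kind=8), allocatable, device :: d_cre(:)
  integer :: istat

  ! In CUDA Fortran the count is given in elements, not bytes
  istat = cudaMalloc(d_cre, nsize2)
  print *, 'cudaMalloc status = ', istat, ': ', trim(cudaGetErrorString(istat))

  ! Release the memory only if the allocation succeeded
  if (istat == cudaSuccess) istat = cudaFree(d_cre)
end program check_malloc
```

A nonzero status here would point at the allocation, or at an earlier failed device call, rather than at the copy itself.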

What happens if you don’t use the CUDA calls directly:

REAL(KIND=8) , ALLOCATABLE               :: CORRE(:) 
REAL(KIND=8) , ALLOCATABLE , DEVICE :: D_CRE(:) 

ALLOCATE( CORRE(NSIZE2), D_CRE(NSIZE2) ) 
D_CRE=CORRE
- Mat