copyout Memcpy FAILED:4

Hi,

What is causing this error?

copyout Memcpy (host=0xc0edd80, dev=0x100400, size=640000) FAILED:4

My configuration is:

FX4600

CUDA 3.0

CUDA Fortran 10.5

RedHat 5

I have reinstalled the GPU driver and CUDA twice, but it still fails.

On the other hand, the same program works on my other PC:

GT240

CUDA 3.0

CUDA Fortran 10.5

Fedora 12

Could anyone help me?

Many Thanks~

My best guess is that you’ve run out of memory. How much total memory are you using on the GPU?

  • Mat
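One way to answer the memory question from inside the program is to query the device before and after the allocations. This is a sketch under assumptions: `gpu_mem_report` and `gpu_free_bytes` are hypothetical names, and the code relies on the `cudafor` interface `cudaMemGetInfo(free, total)` with `integer(kind=cuda_count_kind)` arguments as declared by current CUDA Fortran compilers. Older releases may declare the integer kind differently.

```fortran
! Sketch (assumption): report free/total device memory from CUDA Fortran.
! Module and function names are hypothetical.
module gpu_mem_report
  use cudafor
  implicit none
contains
  ! Prints free and total memory (MB) of the current device and returns
  ! the free byte count, or -1 if the query itself fails.
  function gpu_free_bytes(label) result(free_b)
    character(len=*), intent(in) :: label
    integer(kind=cuda_count_kind) :: free_b, total_b
    integer :: istat

    istat = cudaMemGetInfo(free_b, total_b)
    if (istat /= cudaSuccess) then
      print *, trim(label), ': cudaMemGetInfo failed: ', &
               trim(cudaGetErrorString(istat))
      free_b = -1
      return
    end if
    print '(a, a, f10.1, a, f10.1, a)', trim(label), ': free ', &
          real(free_b) / 1048576.0, ' MB of ', &
          real(total_b) / 1048576.0, ' MB'
  end function gpu_free_bytes
end module gpu_mem_report
```

Calling it once before and once after the device arrays are allocated shows how much of the card the program actually uses, which can then be compared with the "Current free memory" figure reported by pgaccelinfo.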

Hi Mat,

This is the output of pgaccelinfo:

CUDA Driver Version 3000

Device Number: 0
Device Name: Quadro FX 4600
Device Revision Number: 1.0
Global Memory Size: 804585472
Number of Multiprocessors: 12
Number of Cores: 96
Concurrent Copy and Execution: No
Total Constant Memory: 65536
Total Shared Memory per Block: 16384
Registers per Block: 8192
Warp Size: 32
Maximum Threads per Block: 512
Maximum Block Dimensions: 512, 512, 64
Maximum Grid Dimensions: 65535 x 65535 x 1
Maximum Memory Pitch: 2147483647B
Texture Alignment 256B
Clock Rate: 1200 MHz
Initialization time: 1638 microseconds
Current free memory 764674048
Upload time (4MB) 1680 microseconds (1383 ms pinned)
Download time 1651 microseconds (1335 ms pinned)
Upload bandwidth 2496 MB/sec (3032 MB/sec pinned)
Download bandwidth 2540 MB/sec (3141 MB/sec pinned)

The program uses less than 100 MB of memory, so I don't think memory is the cause of the failure.

I found that some people have had the same problem before:

https://forums.developer.nvidia.com/t/cuda-fortran-supported-methods-for-data-transfer/131412/1

Although I have pointed the CUDALIB variable to the lib64 path of CUDA as you suggested before, it still fails. The interesting thing is that the accelerators work fine on my FX4600 machine.

Also, what exactly does this message mean?

copyout Memcpy (host=0xc0edd80, dev=0x100400, size=640000) FAILED:4

Many Thanks!

sinsin

Hi sinsin,

You wrote: "The interesting thing is the accelerators work fine on my FX4600 machine."

I assume you meant that it fails on the FX4600 but succeeds on the GT240?

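If the program fails when copying data back from the GPU, it’s usually an indication that the kernel exited abnormally for some reason. Try trapping the error by placing the following code just after your kernel launch:

! istat and errCode must be declared INTEGER: under implicit typing,
! a name starting with E (errCode) would default to REAL.
istat = cudaThreadSynchronize()   ! wait for the kernel to finish
errCode = cudaGetLastError()      ! fetch (and clear) the last runtime error
if (errCode .ne. cudaSuccess) then
       print *, cudaGetErrorString(errCode)
       stop 'Error! Kernel failed!'
endif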
  • Mat
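The same check can be packaged as a small reusable helper. This is an illustration only: the module `launch_check` and the function `launch_ok` are hypothetical names, not part of the PGI runtime, and the sketch assumes the `cudafor` interfaces `cudaThreadSynchronize`, `cudaGetLastError` and `cudaGetErrorString` (newer CUDA releases replace `cudaThreadSynchronize` with `cudaDeviceSynchronize`). Printing the string rather than the raw number turns code 4 into the "unspecified launch failure" text that appears later in this thread.

```fortran
! Sketch (assumption): a reusable post-launch error check for CUDA Fortran.
! Module and function names are hypothetical, not part of the PGI runtime.
module launch_check
  use cudafor
  implicit none
contains
  ! Waits for all previously launched GPU work, then reports any error.
  ! Returns .true. only if neither the synchronize nor the error query
  ! reported a failure.
  logical function launch_ok(label)
    character(len=*), intent(in) :: label
    integer :: istat, errCode

    ! A kernel fault is usually only reported once the host synchronizes;
    ! otherwise it surfaces at the next copy (e.g. "copyout Memcpy FAILED").
    istat = cudaThreadSynchronize()
    ! Fetch (and clear) the last error recorded by the runtime.
    errCode = cudaGetLastError()

    ! Prefer the synchronize status if it already reported a failure.
    if (istat /= cudaSuccess) errCode = istat
    launch_ok = (errCode == cudaSuccess)
    if (.not. launch_ok) then
      print *, trim(label), ': CUDA error ', errCode, ': ', &
               trim(cudaGetErrorString(errCode))
    end if
  end function launch_ok
end module launch_check
```

Call it right after a launch, for example `if (.not. launch_ok('mykernel')) stop 'Error! Kernel failed!'`, where `mykernel` is only a placeholder label.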

Hi Mat,

Yes, it fails on the FX4600 but succeeds on the GT240.

Does that mean some software that is installed on the GT240 machine is missing from the FX4600 machine?

I have placed the following code after the kernel launch.

istat = cudathreadsynchronize() 
errCode = cudaGetLastError() 
if (errCode .gt. 0) then 
       print *, errCode
       stop 'Error! Kernel failed!' 
endif

The output is:

Simulation Start !!!

***** CALLING SUBROUTINE *****

0.000000
Error! Kernel failed!

Any clue?

Thanks a lot!

sinsin

I can confirm that there is something wrong with PGI 10.5 and 10.6. My code compiles and runs properly with PGI 10.3 but fails with the error

0: copyout Memcpy (host=0x1e50780, dev=0x20108e400, size=160000) FAILED: 4(unspecified launch failure)

with PGI 10.5 and 10.6. I use CUDA 3.2 in all cases.

I can share the code, but it links against several other libraries (HDF5, VISIT, SZIP, SILO), so I’m not sure sending the code itself would be much help to you.

Thanks,
Tuan

Hi Tuan,

Unfortunately, this is a generic error so there isn’t a way to tell what’s wrong without the code.

10.6 is over a year old now so there is a possibility that the problem you are encountering has already been fixed. Please try the latest compiler and see if this solves the problem. If not, then please do send me the code. I have HDF5 but if you can send me pointers to the others, I would appreciate it.

  • Mat

Oh, I’m sorry, I meant 11.5 and 11.6. It works fine for me with 11.3.

Tuan

Hi,

Did you guys solve this one?

I still have an outstanding issue (reported in a separate thread) which looks very similar:

0: copyout Memcpy (host=0x2cd32290, dev=0x202320000, size=131072) FAILED: 4(unspecified launch failure)

I get this on the GTX480 and GTX580 but not on the C1060.

It doesn’t appear to be related to the memory available on the card, though, as I’ve reduced all my grids right down. Interestingly, I only get the error if I compile with the -fast option. Have you guys got optimization enabled?

I get the error with PGI11.6 and 11.7, but not with earlier versions of the compiler.

Rob.

Hi Rob,

Not yet. We just got the code last week and I’ve been on vacation for a few days. I’ll start looking at it today and let you know. Note, though, that this is a generic error, so it may or may not be the same issue you are encountering.

  • Mat