Fatal error: fortran auto allocation failed


I’m kinda new to CUDA Fortran and I’m having trouble with my code when certain variables get larger.
With small inputs, the function runs correctly, but with bigger inputs it crashes with the following message:

0: copyout Memcpy (host=0xe80ca80, dev=0x2b59cccb1e00, size=728428) FAILED: 719(unspecified launch failure)

The error seems to occur when I transfer the variable from the device to the host (Y1=Y_d, line 142 of the file Test_Cuda_fct.cuf).
You can find the program file and the output message in the attachment.
The medium_input folder is not included because of its size (750 MB).
I run my program with the following commands:

nvfortran -c MOD_deviceQuery.cuf
nvfortran Test_Cuda_fct.cuf MOD_deviceQuery.o -o Test_Cuda_fct.x
nvprof ./Test_Cuda_fct.x

Test_Cuda_fct.zip (112.2 KB)

Thank you in advance for your help,
Best regards,
Rémy Bretin

The error is probably occurring in the kernel itself, not at this line. Kernels are launched asynchronously, so unless the code explicitly checks the error status, an error from the kernel only shows up at the next GPU operation, which in this case is the copy.

For the same reason, your timings will be meaningless: the CPU doesn’t block until it reaches the copy (after your timers), so I’d recommend adding a call to “cudaDeviceSynchronize” after the kernel calls.
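To make this concrete, here is a minimal sketch of checking for errors right after a launch and blocking before the timer is read. The kernel `scale` and the array names are hypothetical stand-ins, not taken from Test_Cuda_fct.cuf; only the `cudafor` calls are real API.

```fortran
module demo_kernels
  use cudafor
  implicit none
contains
  ! Hypothetical kernel used only for illustration.
  attributes(global) subroutine scale(y, n)
    integer, value :: n
    real :: y(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) y(i) = 2.0 * y(i)
  end subroutine scale
end module demo_kernels

program sync_check
  use cudafor
  use demo_kernels
  implicit none
  integer, parameter :: n = 1024
  real :: y(n)
  real, device :: y_d(n)
  integer :: istat
  integer(kind=8) :: t0, t1, rate

  y = 1.0
  y_d = y

  call system_clock(t0, rate)
  call scale<<<(n + 255) / 256, 256>>>(y_d, n)

  ! Reports problems with the launch itself (bad configuration, etc.).
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) print *, 'launch error: ', trim(cudaGetErrorString(istat))

  ! Blocks until the kernel finishes; errors raised while it ran show up here
  ! instead of at the later device-to-host copy.
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) print *, 'kernel error: ', trim(cudaGetErrorString(istat))
  call system_clock(t1)

  print *, 'kernel time (s): ', real(t1 - t0) / real(rate)
  y = y_d
end program sync_check
```

With the synchronize in place, the elapsed time covers the kernel execution rather than just the launch, and a failing kernel is reported next to the call that caused it.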

The actual error is most likely due to your automatics (automatic arrays in device code). Automatics implicitly allocate memory, and the default device heap is quite small. You can increase it by calling cudaDeviceSetLimit with cudaLimitMallocHeapSize.
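A rough sketch of raising the heap limit follows. The 512 MB figure is an arbitrary illustration, not a recommendation for this code; the call needs to happen before the first kernel that allocates on the device is launched.

```fortran
program raise_heap_limit
  use cudafor
  implicit none
  integer :: istat
  integer(kind=cuda_count_kind) :: heap_bytes

  ! Query the current device heap size.
  istat = cudaDeviceGetLimit(heap_bytes, cudaLimitMallocHeapSize)
  print *, 'current heap size (bytes): ', heap_bytes

  ! Assumed value for illustration: 512 MB. Size it to what the automatics
  ! need across all threads that are resident at the same time.
  heap_bytes = int(512, cuda_count_kind) * 1024 * 1024
  istat = cudaDeviceSetLimit(cudaLimitMallocHeapSize, heap_bytes)
  if (istat /= cudaSuccess) then
    print *, 'cudaDeviceSetLimit failed: ', trim(cudaGetErrorString(istat))
  end if

  ! ... launch kernels that use automatic arrays after this point ...
end program raise_heap_limit
```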

That said, device-side allocation can be slow and adversely affect performance. So if you’re able, you should consider rewriting the algorithm to not use automatics.
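One possible way to drop the automatics, sketched under the assumption that each thread needs a fixed-length scratch array: allocate a single workspace from the host and give each thread its own slice. The kernel `row_sums`, the workspace `work`, and the sizes are all hypothetical, since the real algorithm is only in the attachment.

```fortran
module workspace_kernels
  use cudafor
  implicit none
contains
  ! Instead of declaring an automatic "real :: tmp(m)" inside the kernel,
  ! thread i uses work(i, 1:m) from a workspace allocated once on the host side.
  attributes(global) subroutine row_sums(y, work, n, m)
    integer, value :: n, m
    real :: y(n)
    real :: work(n, m)   ! thread index first, so neighbouring threads touch neighbouring memory
    integer :: i, k
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) then
      do k = 1, m
        work(i, k) = real(k)        ! fill the per-thread scratch space
      end do
      y(i) = 0.0
      do k = 1, m
        y(i) = y(i) + work(i, k)    ! consume it
      end do
    end if
  end subroutine row_sums
end module workspace_kernels

program use_workspace
  use cudafor
  use workspace_kernels
  implicit none
  integer, parameter :: n = 4096, m = 64
  real :: y(n)
  real, device, allocatable :: y_d(:), work_d(:,:)
  integer :: istat

  ! One allocation from the host replaces n implicit device-side allocations.
  allocate(y_d(n), work_d(n, m))

  call row_sums<<<(n + 127) / 128, 128>>>(y_d, work_d, n, m)
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) print *, 'kernel error: ', trim(cudaGetErrorString(istat))

  y = y_d
  deallocate(y_d, work_d)
end program use_workspace
```

The trade-off is that the workspace is sized for every thread up front, so it uses more device memory than per-thread allocation would at any one moment, but it avoids the device heap entirely.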


Hi Mat,

Thank you again for your answer.
Right now, you are kinda speaking Chinese to me (or any other language that I wouldn’t understand), but I will look into this more and come back once I understand what you are talking about.

Thank you,
Have a great day,