FORTRAN: memory management & auto allocation failed

Hi,

The code below crashes at run time on my device (GTX 1070) for NUM=2**10 in mdl.f90 (it works for NUM=2**9; NUM should be a power of two, 2**n). I need to compute FFTs (and do some processing on them) for a large number of arrays.
I’m quite new to GPU programming. Is this related to a memory problem?
How can this be handled?

mdl.f90
pro.f90

! pgfortran -o pro -fast -ta=nvidia mdl.f90 pro.f90

PROGRAM PRO
  
  USE MDL
  
  REAL(RK),PARAMETER                :: FRE = 0.123456789_RK
  REAL(RK),DIMENSION(2*NUM)         :: ARR
  REAL(RK),DIMENSION(10000,2*NUM)   :: DATA
  REAL(RK),DIMENSION(10000)         :: OUTPUT
  INTEGER                           :: I
  
  ARR = 0.0_RK
  ARR(1:2*NUM:2) = SIN(2._RK*PI*FRE*REAL([(I,I=1,2*NUM,2)],RK))
    
  DO I=1,10000,1
    DATA(I,:) = ARR
  END DO
 
  !$ACC DATA COPYIN(DATA(:,:)) COPYOUT(OUTPUT(:))
  !$ACC PARALLEL LOOP
  DO I=1,10000,1
    CALL FFRFT(NUM,0.5_RK,DATA(I,:))
    OUTPUT(I) = SUM(DATA(I,:))
  END DO
  !$ACC END PARALLEL LOOP
  !$ACC END DATA
   
  WRITE(*,*) SUM(OUTPUT)

END PROGRAM PRO

! NUM = 2**9
! OK

! NUM = 2**10
! ...
! FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
! Failing in Thread:1
! call to cuStreamSynchronize returned error 719: Launch failed (often invalid pointer dereference)
! 
! Failing in Thread:1
! call to cuMemFreeHost returned error 719: Launch failed (often invalid pointer dereference)

Hi imorozov,

The program is overflowing the device heap, which by default is only 8MB. To increase the heap size, you can set the environment variable “PGI_ACC_CUDA_HEAPSIZE”. For example:

% ./pro
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
Failing in Thread:1
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution

Failing in Thread:1
call to cuMemFreeHost returned error 700: Illegal address during kernel execution

% setenv PGI_ACC_CUDA_HEAPSIZE 67000000
% ./pro
    10868513.58065551

Note that I strongly recommend that users not use automatic arrays on the device. Besides the small heap space, device mallocs get serialized, so they can have a detrimental impact on performance.
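
As a rough illustration of that recommendation, here is a minimal sketch of one way to avoid automatic arrays in a device routine: the caller owns the scratch space and passes it in. Since mdl.f90 is not shown in this thread, this is not the actual FFRFT code. The names WORK_DEMO, SCALE_AUTO, SCALE_WORK, and WORK are hypothetical and exist only for this example. The sketch also assumes one data set per column (X(:,I)) so that each slice is contiguous, which differs from the DATA(I,:) layout in the original program.

```fortran
MODULE WORK_DEMO
  IMPLICIT NONE
  INTEGER,PARAMETER :: RK = KIND(1.0D0)
CONTAINS

  ! Pattern to avoid: TMP is an automatic array (sized from the dummy
  ! argument N), so on the device every call has to allocate it from
  ! the device heap. Shown for contrast only; it is not called below.
  SUBROUTINE SCALE_AUTO(N,X)
    !$ACC ROUTINE SEQ
    INTEGER,INTENT(IN)                  :: N
    REAL(RK),DIMENSION(N),INTENT(INOUT) :: X
    REAL(RK),DIMENSION(N)               :: TMP
    TMP = 2.0_RK*X
    X = TMP
  END SUBROUTINE SCALE_AUTO

  ! Same arithmetic, but the scratch space TMP is supplied by the
  ! caller, so nothing is allocated inside the routine.
  SUBROUTINE SCALE_WORK(N,X,TMP)
    !$ACC ROUTINE SEQ
    INTEGER,INTENT(IN)                  :: N
    REAL(RK),DIMENSION(N),INTENT(INOUT) :: X
    REAL(RK),DIMENSION(N),INTENT(OUT)   :: TMP
    TMP = 2.0_RK*X
    X = TMP
  END SUBROUTINE SCALE_WORK

END MODULE WORK_DEMO

PROGRAM DEMO

  USE WORK_DEMO
  IMPLICIT NONE

  INTEGER,PARAMETER         :: M = 1000, N = 64
  REAL(RK),DIMENSION(N,M)   :: X, WORK
  REAL(RK),DIMENSION(M)     :: OUTPUT
  INTEGER                   :: I

  X = 1.0_RK

  ! WORK lives only on the device (CREATE); each iteration uses its
  ! own column, so iterations do not share scratch space.
  !$ACC DATA COPY(X) CREATE(WORK) COPYOUT(OUTPUT)
  !$ACC PARALLEL LOOP
  DO I=1,M
    CALL SCALE_WORK(N,X(:,I),WORK(:,I))
    OUTPUT(I) = SUM(X(:,I))
  END DO
  !$ACC END PARALLEL LOOP
  !$ACC END DATA

  WRITE(*,*) SUM(OUTPUT)

END PROGRAM DEMO
```

The trade-off is that the scratch array is allocated once, up front, in ordinary device memory rather than on the device heap, at the cost of holding one work column per loop iteration. Whether this maps cleanly onto FFRFT depends on how many temporaries that routine needs internally.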

Hope this helps,
Mat

Hi, Mat,

Thank you for your answer, this solves my problem.