Hi,
The code below crashes at run time on my device (GTX 1070) for NUM=2**10 in mdl.f90 (it runs fine for NUM=2**9; NUM should be a power of two, 2**n). I need to compute FFTs (and do some processing of them) for a large number of arrays.
I’m quite new to GPU programming. Is this related to a memory problem?
How can it be handled?
mdl.f90
pro.f90
! pgfortran -o pro -fast -ta=nvidia mdl.f90 pro.f90
PROGRAM PRO
  USE MDL   ! expected to provide RK, NUM, PI and FFRFT (see mdl.f90)
  REAL(RK),PARAMETER :: FRE = 0.123456789_RK
  REAL(RK),DIMENSION(2*NUM) :: ARR
  REAL(RK),DIMENSION(10000,2*NUM) :: DATA
  REAL(RK),DIMENSION(10000) :: OUTPUT
  INTEGER :: I
  ! Build one test signal: odd-indexed entries hold the sine, the rest stay zero
  ARR = 0.0_RK
  ARR(1:2*NUM:2) = SIN(2._RK*PI*FRE*REAL([(I,I=1,2*NUM,2)],RK))
  ! Replicate the signal into 10000 rows
  DO I=1,10000,1
    DATA(I,:) = ARR
  END DO
  ! Transform every row on the device and reduce each row to one value
  !$ACC DATA COPYIN(DATA(:,:)) COPYOUT(OUTPUT(:))
  !$ACC PARALLEL LOOP
  DO I=1,10000,1
    CALL FFRFT(NUM,0.5_RK,DATA(I,:))
    OUTPUT(I) = SUM(DATA(I,:))
  END DO
  !$ACC END PARALLEL LOOP
  !$ACC END DATA
  WRITE(*,*) SUM(OUTPUT)
END PROGRAM PRO
! NUM = 2**9
! OK
! NUM = 2**10
! ...
! FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
! Failing in Thread:1
! call to cuStreamSynchronize returned error 719: Launch failed (often invalid pointer dereference)
!
! Failing in Thread:1
! call to cuMemFreeHost returned error 719: Launch failed (often invalid pointer dereference)
Hi imorozov,
The program is overflowing the device heap, which by default is only 8MB. To increase the heap size, you can set the environment variable “PGI_ACC_CUDA_HEAPSIZE”. For example:
% ./pro
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
FATAL ERROR: FORTRAN AUTO ALLOCATION FAILED
Failing in Thread:1
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
Failing in Thread:1
call to cuMemFreeHost returned error 700: Illegal address during kernel execution
% setenv PGI_ACC_CUDA_HEAPSIZE 67000000
% ./pro
10868513.58065551
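The setenv line above is csh/tcsh syntax. In a Bourne-style shell such as bash, a rough equivalent would be the sketch below. It reuses the same 67000000 value from the run above (presumably a byte count, so roughly 64 MB); this is not a tuned recommendation, and the heap a given NUM actually needs depends on the automatic arrays inside mdl.f90.

```shell
# Bourne-shell (sh/bash) form of: setenv PGI_ACC_CUDA_HEAPSIZE 67000000
# Same value as in the run above (presumably bytes, roughly 64 MB).
export PGI_ACC_CUDA_HEAPSIZE=67000000

# Confirm the variable is set before launching ./pro from this same shell.
echo "PGI_ACC_CUDA_HEAPSIZE=$PGI_ACC_CUDA_HEAPSIZE"
```

The variable has to be exported in the shell that launches the program, since it is read from the process environment at run time.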
Note that I strongly recommend that users not use automatic arrays on the device. Besides the small heap space, device mallocs get serialized, which can have a detrimental impact on performance.
Hope this helps,
Mat
Hi, Mat,
Thank you for your answer, this solves my problem.