DEALLOCATE: an illegal memory access was encountered

Hello,

I am new to CUDA Fortran. For now I am only trying very basic things, and I am having problems with this small piece of code:

$ cat main_simple2.F90 

PROGRAM MAIN_SIMPLE2
IMPLICIT NONE
INTEGER(KIND=4) , DEVICE :: KLEV_D

INTERFACE
ATTRIBUTES(GLOBAL) &
SUBROUTINE GPU_SIMPLE2 (KLEV)
INTEGER(KIND=4),INTENT(IN) :: KLEV
END SUBROUTINE GPU_SIMPLE2
END INTERFACE

KLEV_D = 15

CALL GPU_SIMPLE2 <<<600, 32>>> (KLEV_D)

END PROGRAM MAIN_SIMPLE2

ATTRIBUTES(GLOBAL) &
SUBROUTINE GPU_SIMPLE2 ( KLEV )

IMPLICIT NONE

INTEGER(KIND=4),INTENT(IN)    :: KLEV 
 
INTEGER(KIND=4) :: JLEV

REAL(KIND=8) :: ZTPRTH(KLEV),ZTDIFF(KLEV), ZTDISS(KLEV)

PRINT *, THREADIDX%X, BLOCKIDX%X, BLOCKDIM%X, KLEV

DO JLEV = 1, KLEV
  ZTDIFF(JLEV)  =0.0_8
ENDDO
DO JLEV = 1, KLEV
  ZTDISS(JLEV)  =0.0_8
ENDDO
DO JLEV = 1, KLEV
  ZTPRTH(JLEV)  =0.0_8
ENDDO

RETURN

END SUBROUTINE GPU_SIMPLE2


$ pgf90 -Mcuda=ptxinfo,fastmath main_simple2.F90 -o main_simple2.x  ; ./main_simple2.x
...
           30          422           32           15
           31          422           32           15
           32          422           32           15
0: DEALLOCATE: an illegal memory access was encountered

Sometimes I get the error and sometimes I do not. What is wrong with this?

My NVIDIA card is:

$ pgaccelinfo

CUDA Driver Version:           8000
NVRM version:                  NVIDIA UNIX x86_64 Kernel Module  375.26  Thu Dec  8 18:36:43 PST 2016

Device Number:                 0
Device Name:                   Quadro M5000
Device Revision Number:        5.2
Global Memory Size:            8514830336
Number of Multiprocessors:     16
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 49152
Registers per Block:           65536
Warp Size:                     32
Maximum Threads per Block:     1024
Maximum Block Dimensions:      1024, 1024, 64
Maximum Grid Dimensions:       2147483647 x 65535 x 65535
Maximum Memory Pitch:          2147483647B
Texture Alignment:             512B
Clock Rate:                    1038 MHz
Execution Timeout:             Yes
Integrated Device:             No
Can Map Host Memory:           Yes
Compute Mode:                  default
Concurrent Kernels:            Yes
ECC Enabled:                   No
Memory Clock Rate:             3305 MHz
Memory Bus Width:              256 bits
L2 Cache Size:                 2097152 bytes
Max Threads Per SMP:           2048
Async Engines:                 2
Unified Addressing:            Yes
Managed Memory:                Yes
PGI Compiler Option:           -ta=tesla:cc50

and my Fortran compiler is:

$ pgf90 --version 

pgf90 17.7-0 64-bit target on x86-64 Linux -tp haswell 
PGI Compilers and Tools
Copyright (c) 2017, NVIDIA CORPORATION.  All rights reserved.

Thank you for your help.

Hi philou,

The problem here is that you’re using too much heap space. There is only a very limited amount of heap space available for allocations made from device code (~8MB by default). While there are ways of increasing this by calling cudaDeviceSetLimit (max heap space is about 32MB), I would highly recommend that you rewrite your code so that it does not use automatic arrays in your device code.
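For scale: the launch in the question starts 600 × 32 = 19,200 threads, and each thread requests 3 × 15 × 8 = 360 bytes for its three automatic arrays. That adds up to 6,912,000 bytes (about 6.9 MB) if every allocation were live at the same time, before any allocator overhead, which is already close to an 8 MB heap. Below is a minimal, untested sketch of the cudaDeviceSetLimit route, assuming the PGI CUDA Fortran `cudafor` module (`cudaDeviceSetLimit`, `cudaLimitMallocHeapSize`, `cuda_count_kind`). The names SIMPLE2_KERNELS and MAIN_HEAP are hypothetical, the 32 MB value just mirrors the figure quoted above, and KLEV is passed with the VALUE attribute (a change from the question's code) so the host can pass a plain integer.

```fortran
! Sketch (assumes PGI CUDA Fortran and its cudafor module).
! Raises the device heap limit before the first kernel launch, then checks
! for launch errors and for errors raised while the kernel was running.
MODULE SIMPLE2_KERNELS
  USE CUDAFOR
  IMPLICIT NONE
CONTAINS
  ATTRIBUTES(GLOBAL) SUBROUTINE GPU_SIMPLE2(KLEV)
    INTEGER(KIND=4), VALUE :: KLEV
    INTEGER(KIND=4) :: JLEV
    ! Automatic arrays: per the explanation above, these come from the device heap
    REAL(KIND=8) :: ZTPRTH(KLEV), ZTDIFF(KLEV), ZTDISS(KLEV)
    DO JLEV = 1, KLEV
      ZTDIFF(JLEV) = 0.0_8
      ZTDISS(JLEV) = 0.0_8
      ZTPRTH(JLEV) = 0.0_8
    END DO
  END SUBROUTINE GPU_SIMPLE2
END MODULE SIMPLE2_KERNELS

PROGRAM MAIN_HEAP
  USE CUDAFOR
  USE SIMPLE2_KERNELS
  IMPLICIT NONE
  INTEGER :: ISTAT
  INTEGER(KIND=CUDA_COUNT_KIND) :: HEAPSIZE

  ! The heap limit must be set before launching any kernel that allocates
  ! on the device; 32 MB here is an illustrative value.
  HEAPSIZE = 32_CUDA_COUNT_KIND * 1024 * 1024
  ISTAT = cudaDeviceSetLimit(cudaLimitMallocHeapSize, HEAPSIZE)
  IF (ISTAT /= cudaSuccess) THEN
    PRINT *, 'cudaDeviceSetLimit: ', cudaGetErrorString(ISTAT)
    STOP 1
  END IF

  CALL GPU_SIMPLE2<<<600, 32>>>(15)

  ! First catch launch errors, then errors from the kernel's execution.
  ISTAT = cudaGetLastError()
  IF (ISTAT == cudaSuccess) ISTAT = cudaDeviceSynchronize()
  IF (ISTAT /= cudaSuccess) THEN
    PRINT *, 'Kernel failed: ', cudaGetErrorString(ISTAT)
    STOP 1
  END IF
  PRINT *, 'OK'
END PROGRAM MAIN_HEAP
```

This should build with the same `pgf90 -Mcuda` command used in the question. Raising the limit only buys headroom; the allocations still happen once per thread on every kernel entry.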

Automatic arrays are implicitly allocated from device code, which is a very slow operation and will negatively impact your performance. It’s better to use fixed-size local arrays or to pass in a global array.
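The two alternatives could look roughly like this. It is an untested sketch assuming PGI CUDA Fortran; MAXLEV is a hypothetical compile-time upper bound on KLEV, and the names SIMPLE2_ALT, GPU_FIXED, GPU_GLOBAL and MAIN_ALT are illustrative only. Kernel (a) uses a fixed-size local array, so no run-time heap allocation is needed. Kernel (b) uses a global scratch array allocated once on the host, in which thread ITH owns column ITH.

```fortran
! Sketch (assumes PGI CUDA Fortran): two ways to avoid automatic arrays
! in device code. All names below are hypothetical.
MODULE SIMPLE2_ALT
  USE CUDAFOR
  IMPLICIT NONE
  INTEGER, PARAMETER :: MAXLEV = 64   ! assumed compile-time upper bound on KLEV
CONTAINS
  ! (a) Fixed-size local array: size known at compile time, no heap allocation.
  ATTRIBUTES(GLOBAL) SUBROUTINE GPU_FIXED(KLEV, NTH, ZOUT)
    INTEGER(KIND=4), VALUE :: KLEV, NTH
    REAL(KIND=8), DEVICE :: ZOUT(NTH)
    REAL(KIND=8) :: ZTDIFF(MAXLEV), ZSUM
    INTEGER :: JLEV, ITH
    ITH = (BLOCKIDX%X - 1) * BLOCKDIM%X + THREADIDX%X   ! global thread index
    IF (ITH > NTH) RETURN
    ZSUM = 0.0_8
    DO JLEV = 1, KLEV
      ZTDIFF(JLEV) = 1.0_8
      ZSUM = ZSUM + ZTDIFF(JLEV)
    END DO
    ZOUT(ITH) = ZSUM
  END SUBROUTINE GPU_FIXED

  ! (b) Global scratch array passed in: thread ITH works on column ITH only.
  ATTRIBUTES(GLOBAL) SUBROUTINE GPU_GLOBAL(KLEV, NTH, ZWORK)
    INTEGER(KIND=4), VALUE :: KLEV, NTH
    REAL(KIND=8), DEVICE :: ZWORK(KLEV, NTH)
    INTEGER :: JLEV, ITH
    ITH = (BLOCKIDX%X - 1) * BLOCKDIM%X + THREADIDX%X
    IF (ITH > NTH) RETURN
    DO JLEV = 1, KLEV
      ZWORK(JLEV, ITH) = 1.0_8
    END DO
  END SUBROUTINE GPU_GLOBAL
END MODULE SIMPLE2_ALT

PROGRAM MAIN_ALT
  USE CUDAFOR
  USE SIMPLE2_ALT
  IMPLICIT NONE
  INTEGER, PARAMETER :: KLEV = 15, NBLK = 600, NTPB = 32, NTH = NBLK * NTPB
  REAL(KIND=8), DEVICE, ALLOCATABLE :: ZOUT_D(:), ZWORK_D(:,:)
  REAL(KIND=8), ALLOCATABLE :: ZOUT(:), ZWORK(:,:)
  INTEGER :: ISTAT

  IF (KLEV > MAXLEV) STOP 'KLEV exceeds MAXLEV'
  ALLOCATE(ZOUT_D(NTH), ZWORK_D(KLEV, NTH), ZOUT(NTH), ZWORK(KLEV, NTH))

  CALL GPU_FIXED<<<NBLK, NTPB>>>(KLEV, NTH, ZOUT_D)
  CALL GPU_GLOBAL<<<NBLK, NTPB>>>(KLEV, NTH, ZWORK_D)
  ISTAT = cudaDeviceSynchronize()
  IF (ISTAT /= cudaSuccess) THEN
    PRINT *, cudaGetErrorString(ISTAT)
    STOP 1
  END IF

  ZOUT = ZOUT_D          ! device-to-host copies
  ZWORK = ZWORK_D
  IF (ANY(ZOUT /= REAL(KLEV, 8)) .OR. ANY(ZWORK /= 1.0_8)) STOP 1
  PRINT *, 'OK'
  DEALLOCATE(ZOUT_D, ZWORK_D, ZOUT, ZWORK)
END PROGRAM MAIN_ALT
```

The fixed-size version needs an upper bound known at compile time. The global-array version works for any KLEV, but the host must allocate KLEV × (blocks × threads) elements up front: for this launch that is 15 × 19,200 × 8 = 2,304,000 bytes per scratch array, taken from ordinary device memory rather than the device heap.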

-Mat

OK, thanks. But I have another question: the NVIDIA programming manual states (section 5.3.2):

Local memory accesses only occur for some automatic variables as mentioned in Variable Memory Space Specifiers. Automatic variables that the compiler is likely to place in local memory are:

Arrays for which it cannot determine that they are indexed with constant quantities,
Large structures or arrays that would consume too much register space,
Any variable if the kernel uses more registers than available (this is also known as register spilling).

So in principle, the arrays I declared with KLEV as their dimension should go to local memory, which is off-chip. Why am I hitting an 8 MB limit here?

Regards,

Philippe