Hello,
I am new to CUDA Fortran. For now I am only trying very basic things, but I have a problem with this small piece of code:
$ cat main_simple2.F90
PROGRAM MAIN_SIMPLE2
  IMPLICIT NONE
  INTEGER(KIND=4) , DEVICE :: KLEV_D
  INTERFACE
    ATTRIBUTES(GLOBAL) &
    SUBROUTINE GPU_SIMPLE2 (KLEV)
      INTEGER(KIND=4),INTENT(IN) :: KLEV
    END SUBROUTINE GPU_SIMPLE2
  END INTERFACE
  KLEV_D = 15
  CALL GPU_SIMPLE2 <<<600, 32>>> (KLEV_D)
END PROGRAM MAIN_SIMPLE2

ATTRIBUTES(GLOBAL) &
SUBROUTINE GPU_SIMPLE2 ( KLEV )
  IMPLICIT NONE
  INTEGER(KIND=4),INTENT(IN) :: KLEV
  INTEGER(KIND=4) :: JLEV
  REAL(KIND=8) :: ZTPRTH(KLEV),ZTDIFF(KLEV), ZTDISS(KLEV)
  PRINT *, THREADIDX%X, BLOCKIDX%X, BLOCKDIM%X, KLEV
  DO JLEV = 1, KLEV
    ZTDIFF(JLEV) =0.0_8
  ENDDO
  DO JLEV = 1, KLEV
    ZTDISS(JLEV) =0.0_8
  ENDDO
  DO JLEV = 1, KLEV
    ZTPRTH(JLEV) =0.0_8
  ENDDO
  RETURN
END SUBROUTINE GPU_SIMPLE2
$ pgf90 -Mcuda=ptxinfo,fastmath main_simple2.F90 -o main_simple2.x ; ./main_simple2.x
...
30 422 32 15
31 422 32 15
32 422 32 15
0: DEALLOCATE: an illegal memory access was encountered
The failure is intermittent: sometimes I get this error, sometimes the program runs without it. What is wrong with this code?
My NVIDIA card is:
$ pgaccelinfo
CUDA Driver Version: 8000
NVRM version: NVIDIA UNIX x86_64 Kernel Module 375.26 Thu Dec 8 18:36:43 PST 2016
Device Number: 0
Device Name: Quadro M5000
Device Revision Number: 5.2
Global Memory Size: 8514830336
Number of Multiprocessors: 16
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1038 MHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 3305 MHz
Memory Bus Width: 256 bits
L2 Cache Size: 2097152 bytes
Max Threads Per SMP: 2048
Async Engines: 2
Unified Addressing: Yes
Managed Memory: Yes
PGI Compiler Option: -ta=tesla:cc50
My Fortran compiler is:
$ pgf90 --version
pgf90 17.7-0 64-bit target on x86-64 Linux -tp haswell
PGI Compilers and Tools
Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
Thank you for your help.