I’m seeing some inexplicable behaviour with allocatable arrays. I’m trying to create a small example that reproduces the problem, but haven’t managed to do so yet. The layout of the code is as follows:
    module GPUMod
      use CUDAFOR
      implicit none
      logical :: ModLogVal1,ModLogVal2,...
      integer*4 :: ModIntVal1,ModIntVal2,...
      integer*4,allocatable :: ModIntAllArr1(:),ModIntAllArr2(:),...
      real*8,allocatable :: ModRealAllArr1(:),ModRealAllArr2(:),...
      real*8,device :: ModRealDevArr1(64),ModRealDevArr2(89),...
      integer*4,allocatable,device :: ModIntAllDevArr1(:),ModIntAllDevArr2(:),...
      real*8,allocatable,device :: ModRealAllDevArr1(:),ModRealAllDevArr2(:),...
    end module GPUMod

    (more subroutines)

    subroutine Init(size1,size2,size3)
      use GPUMod
      implicit none
      logical :: InitLogVal1,InitLogVal2,...
      character*8 :: InitCharArr(Size1)
      integer*4 :: size1,size2,size3,size4,size5,ProblemSize,InitIntVal1,InitIntVal2,...
      integer*4 :: InitIntArr1(Size2),InitIntArr2(Size2),...
      real*8 :: InitRealVal1,InitRealVal2,...
      real*8 :: InitRealArr1(size3),InitRealArr2(size3),...
      real*8,allocatable :: InitRealAllArr1(:)
      real*8,allocatable :: ProblemArr(:)

      allocate(ModIntAllArr1(size1))
      ...
      allocate(ModIntAllArr2(size2))
      ...
      allocate(ModRealAllArr1(size4))
      allocate(ModRealAllArr2(size4))
      ...
      allocate(ModIntAllDevArr1(size5))
      allocate(ModIntAllDevArr2(size5))
      ...
      ProblemSize = ?
      allocate(ProblemArr(ProblemSize))
      ...
      allocate(ModRealAllDevArr1(size4))
      allocate(ModRealAllDevArr2(size4))
      ...
      ...
      ...
      deallocate(ProblemArr)
    End subroutine Init

    (more subroutines)
The result is incorrect for certain values of ProblemSize. If ProblemSize is smaller than a certain value (say 5000) or bigger than another value (say 100000), the answer is correct, but it is incorrect for values in between. If ProblemSize is set to 6000, the program gives the same wrong answer every time it runs. If ProblemSize is set to 16000, it again gives the same wrong answer on every run, but a different wrong answer from the one it gives for 6000.
This is a runtime problem, not a compile-time one. No matter how I compile the code, it compiles successfully and without any warnings or errors. I’ve tested this with versions 13.5 (cuda5.0 only) and 14.3 (cuda5.0 and 5.5). I’ve compiled with array bounds checking and nothing is reported. I’ve tested for memory leaks and there aren’t any. I’ve compiled with -mcmodel=medium, -i8, -Mlarge_arrays and all allowed combinations thereof, and the behaviour is the same. I’ve compiled with (-fast) and without (-O0) optimizations, though I’m compiling with the -Mcuda=cuda5.x,cc35 (x=0 or 5) flag, so I’m not sure whether -O0 is ignored.
The source file is compiled and archived into a library, which is used at link time with other libraries and objects to build the executable. If I change ProblemArr to not be allocatable, the program gives the correct result for any size of ProblemArr.
I’m really at a loss. I’ve never come across anything that behaves like this without breaking at compile time or segfaulting at runtime. This runs to completion, but the answers are incorrect when ProblemSize is within a certain range. Nothing is done with ProblemArr other than allocating it and deallocating it within the same subroutine. All the module arrays that are allocated within subroutine Init are deallocated in a subroutine which runs just before the program finishes. I’ve ensured that every array that is allocated is deallocated.
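One check that might help narrow this down is passing stat= to allocate and deallocate, so that a failed request returns a nonzero status that can be printed rather than terminating the program. The following is only a stand-alone sketch in standard Fortran, not the real code: the program name, the ierr variable and the hard-coded value of ProblemSize are placeholders chosen for illustration.

```fortran
! Hypothetical stand-alone sketch; all names here are placeholders.
program check_alloc_status
  implicit none
  real*8, allocatable :: ProblemArr(:)
  integer*4 :: ProblemSize, ierr

  ! Assumed value: one of the sizes that gives a wrong answer in the real program.
  ProblemSize = 6000

  ! With stat=, a failed allocation sets ierr to a nonzero value
  ! instead of stopping the program.
  allocate(ProblemArr(ProblemSize), stat=ierr)
  if (ierr /= 0) then
    print *, 'allocate(ProblemArr) failed, stat = ', ierr
    stop 1
  end if

  print *, 'allocated ', size(ProblemArr), ' elements'

  ! The same check applies to the matching deallocate.
  deallocate(ProblemArr, stat=ierr)
  if (ierr /= 0) then
    print *, 'deallocate(ProblemArr) failed, stat = ', ierr
    stop 1
  end if
end program check_alloc_status
```

In the real subroutine the same pattern would presumably be applied to every allocate in Init, including the device arrays, so that any failure is tied to a specific array.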
I’m sorry that I can’t provide more information. I’m continuing my efforts to create a reproducible example. In the meantime, any comments or suggestions are very welcome. Please feel free to ask me to elaborate on anything that might help.