Hello, I have a program written in CUDA Fortran (PGI Community Edition 17.10). It turns out that when n in <<<x,n>>> is too large, the instructions in a global subroutine are not executed. The structure of the code is:
attributes(global) subroutine overl_kernel2(IA,JA,xpacd,ypacd,nx,ny,nt,nsize,abandd,x24d,w24d)
integer, value :: nx,ny,nt,nsize
real*8, device,intent(IN) :: x24d(24),W24d(24)
real*8,device,intent(IN) :: xpacd(0:nx),ypacd(0:ny)
real*4,device :: abandd(nsize)
integer, intent(IN) :: IA(nsize),JA(nsize)
ka = (blockidx%x-1)*blockdim%x + threadidx%x
end subroutine overl_kernel2
It is called in another subroutine with:
It turned out that when “nthr” is too large, say, “nthr=1024”, the following instruction in subroutine “overl_kernel2”
is not executed at all, i.e., the function ovswp1 is never called. If, on the other hand, we change “nthr” to a smaller value, say “nthr=512”, the program works correctly.
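One way to narrow down a symptom like this is to check whether the kernel launch itself fails, since a launch that requests more resources per block than the device can provide returns an error without running any kernel code. The following is only a minimal sketch, assuming the standard cudafor module; the kernel demo_kernel, the array a_d, and the size n are hypothetical placeholders and are not part of the original program.

```fortran
module demo_mod
  use cudafor
  implicit none
contains
  ! Hypothetical stand-in kernel; it uses the same global index formula as overl_kernel2.
  attributes(global) subroutine demo_kernel(a, n)
    integer, value :: n
    real(4), device :: a(n)
    integer :: ka
    ka = (blockidx%x - 1) * blockdim%x + threadidx%x
    if (ka <= n) a(ka) = real(ka)   ! guard against threads past the end of the array
  end subroutine demo_kernel
end module demo_mod

program check_launch
  use cudafor
  use demo_mod
  implicit none
  integer, parameter :: n = 4096      ! hypothetical problem size
  integer :: nthr, nblk, istat
  real(4), device :: a_d(n)

  nthr = 1024                         ! threads per block under test
  nblk = (n + nthr - 1) / nthr        ! enough blocks to cover n elements
  call demo_kernel<<<nblk, nthr>>>(a_d, n)

  ! Launch-time errors (invalid configuration, too many resources requested, ...)
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) print *, 'launch error: ', trim(cudaGetErrorString(istat))

  ! Errors raised while the kernel runs are reported after synchronization
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) print *, 'execution error: ', trim(cudaGetErrorString(istat))
end program check_launch
```

If the first check reports an error for nthr=1024 but not for nthr=512, the kernel was never started at the larger block size, which would match the behaviour described above.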
I am wondering what the cause of this problem could be.
I am using PGI Fortran Community Edition 17.10 with an NVIDIA Quadro P4000 on a laptop with 32 GB of memory running 64-bit Windows 10 Home.
Thank you very much!