Hi PGI engineers,
I have a long subroutine that I have finally managed to get working, and it compiles with no problems.
When I added some error checking to make sure everything runs correctly, I ran into the following problem:
istat = cudaGetLastError() returns a non-zero value of 11!
I looked into it and found on the NVIDIA website that error 11 "indicates that one or more of the parameters passed to the API call is not within an acceptable range of values".
Could the cause be that I am passing pointers to matrices (6 matrices in total) of 9900 elements each?
Here is the kernel call with its arguments:
call GetReynVar_kernel<<<grid,threads>>>(nx,ny,ndx,ndy,pDev, &
hnewDev,himinDev,himaxDev,cohimxDev,s,l,kd,zdatLowDev,qndatLowDev, &
zdatMidDev,qndatMidDev,zdatHighDev,qndatHighDev,qniDev)
istat = cudaGetLastError()
if (istat .ne. 0) print*, cudaGetErrorString(istat)
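For what it is worth, here is a minimal sketch of a stricter check, not my actual code. As far as I understand, cudaGetLastError() reports the last error from any earlier runtime call and then resets it, so a stale error from before the launch could show up in the check above. The sketch clears the error first, checks the launch, and then checks execution after cudaDeviceSynchronize(). The module demo_kernels, the kernel fill_kernel, and the array aDev are hypothetical stand-ins for illustration only.

```fortran
module demo_kernels
  use cudafor
  implicit none
contains
  ! Hypothetical stand-in kernel: sets every element of a to 1.
  attributes(global) subroutine fill_kernel(a, n)
    integer, value :: n
    real(8) :: a(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = 1.0d0
  end subroutine fill_kernel
end module demo_kernels

program check_launch
  use cudafor
  use demo_kernels
  implicit none
  integer, parameter :: n = 1024
  real(8), device :: aDev(n)
  integer :: istat

  ! Read (and thereby reset) any error left over from earlier API calls.
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) &
    print *, 'stale error before launch: ', trim(cudaGetErrorString(istat))

  call fill_kernel<<<(n + 255) / 256, 256>>>(aDev, n)

  ! Errors detected at launch time (bad configuration, bad arguments, ...).
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) &
    print *, 'launch error: ', trim(cudaGetErrorString(istat))

  ! Errors raised while the kernel runs only surface after a synchronize.
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) &
    print *, 'execution error: ', trim(cudaGetErrorString(istat))
end program check_launch
```

If the first message prints, the error came from something before the kernel call. If the second prints, the launch itself was rejected. If only the third prints, the kernel started but failed while running.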
The kernel subroutine:
!===============================================
attributes (global) subroutine GetReynVar_kernel (nx,ny,ndx,ndy, &
p,hnew,himin,himax,cohimx,s,l,kd,zdatLow,qndatLow, &
zdatMid,qndatMid,zdatHigh,qndatHigh,qni)
!===============================================
implicit none
integer :: i, j, k
integer, value :: nx,ny,ndx,ndy,s,l,kd
real(8) :: qni(nx,ny)
real(8) :: zdatLow(s), qndatLow(s)       ! s = 9900
real(8) :: zdatMid(l), qndatMid(l)       ! l = 9900
real(8) :: zdatHigh(kd), qndatHigh(kd)   ! kd = 9900
real(8) :: p(nx,ny)
real(8) :: hnew(ndx,ndy),himin(ndx,ndy),himax(ndx,ndy),cohimx(ndx,ndy)
real(8) :: AIJDev, ALZDev, C11Dev, C12Dev, C22Dev, DELDev, &
EVDev, F1Dev, F2Dev, FODev, H2BARDev, H3BARDev, HBARDev
real(8) :: VDev, V1Dev,V2Dev,V3Dev, V4Dev, V5Dev, XIDev, z0Dev
integer, dimension (400,400) :: locDev
real(8), dimension (400,400) :: QNDev, qn1Dev, qn2Dev,Q2Dev,PNDev, &
qn11Dev, qn22Dev,himDev, pimDev, sliderplaneheightDev,ZDev,z1Dev,z2Dev, &
R1Dev, S1Dev, U1Dev
real(8), dimension(20) :: ADev, BDev
i = (blockidx%x - 1) * blockDim%x + threadidx%x
j = (blockidx%y - 1) * blockDim%y + threadidx%y
hbarDev = 1.d0
h2barDev = 1.d0
h3barDev = 1.d0
if (((i >= 2) .and. (i <= nx)) .and. ((j >= 2) .and. (j <= ny-1))) then
locDev(i,j) = floor((zDev(i,j) - zBaseLowDev) / zStepLowDev) + 1
z1Dev(i,j) = zdatLow(locDev(i,j))
endif
end subroutine GetReynVar_kernel
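As a side check, here is a rough tally of the fixed-size local arrays declared inside the kernel: 16 real(8) arrays and one integer array, each 400 x 400. This is plain Fortran arithmetic, and the names local_mem_estimate and local_bytes are made up for the tally. It assumes 8-byte reals and 4-byte integers, assumes every thread gets its own copy of each array, and ignores ADev, BDev and the scalars.

```fortran
module local_mem_estimate
  use iso_fortran_env, only: int64
  implicit none
contains
  ! Bytes taken by n_real8 real(8) arrays plus n_int4 default-integer
  ! arrays, each of shape (rows, cols). Assumes 8- and 4-byte elements.
  pure function local_bytes(n_real8, n_int4, rows, cols) result(bytes)
    integer, intent(in) :: n_real8, n_int4, rows, cols
    integer(int64) :: bytes
    integer(int64) :: elems
    elems = int(rows, int64) * int(cols, int64)
    bytes = elems * (8_int64 * n_real8 + 4_int64 * n_int4)
  end function local_bytes
end module local_mem_estimate

program tally
  use iso_fortran_env, only: int64
  use local_mem_estimate
  implicit none
  integer(int64) :: bytes
  ! 16 real(8) arrays and 1 integer array, all dimension (400,400).
  bytes = local_bytes(16, 1, 400, 400)
  print '(a,i0,a)', 'per-thread local arrays: ', bytes, ' bytes'
end program tally
```

That comes to 21120000 bytes (roughly 20 MB) per thread. I do not know whether this is related to error 11, but as far as I can tell the CUDA programming guide lists a limit of 512 KB of local memory per thread for compute capability 2.0 and later, so it may be worth ruling out.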
Please help!
Dolf