Problem with a simple CUDA Fortran program

Hi, I am a beginner with CUDA Fortran and I am testing the following program. I compile the code with pgf95 -ta=nvidia sumAB.cuf; it runs but gives the wrong results. Any suggestions? Thanks,

!----------------module for sumAB--------------------------
module m_sumAB

use cudafor

contains

!-------------kernel subroutine-----------------
attributes(global) subroutine k_sumAB(n,A,B,C)

integer :: i
integer, value :: n

real, dimension (n) :: A,B,C

i=(blockidx%x-1)*blockdim%x+threadidx%x
if (i<=n) C(i)=A(i)+B(i)

end subroutine k_sumAB

!-------------host subroutine--------------------
subroutine h_sumAB(n,bdim,A,B,C)
implicit none
integer :: n,bdim
real, dimension (n) :: A,B,C
real, device, dimension (n) :: Adev,Bdev,Cdev
Adev=A
Bdev=B
call k_sumAB<<<n/bdim, bdim>>>(n,Adev,Bdev,Cdev)
C=Cdev

end subroutine h_sumAB

end module m_sumAB
!---------------------------end module----------------------



program sumAB
!----------------------------------------------------
!
!purpose: sum two vectors A and B of n elements
!
!----------------------------------------------------
use m_sumAB

integer i
integer :: n=1000
integer :: bdim=100

real :: times,timef,sum
real, dimension (n) :: A,B,C,D
!-----------------end declaration variable-----------


!Initialization of arrays
A=1.2
B=2.2
C=0.
D=0.
E=0.

!CPU calculation
call cpu_time(times)
do i=1,n
D(i)=A(i)+B(i)
end do
call cpu_time(timef)

print *,'CPU time required is: ',timef-times,' seconds'


!GPU calculation
call cpu_time(times)
call h_sumAB(n,bdim,A,B,C)
call cpu_time(timef)
print *,'GPU time required is: ',timef-times,' seconds'


!diff between results
sum=0.
do i=1,n
sum=sum+C(i)-D(i)
end do

print *,'Difference between results is: ',sum,C(1),D(1)


pause

end program sumAB

Hi Jony,

I’m not sure. The program seems to get the correct answers when I run it.

% pgf95 sumAB.cuf  -o sumAB.out
% sumAB.out
 CPU time required is:    7.1525574E-06  seconds
 GPU time required is:    8.8712931E-02  seconds
 Difference between results is:     0.000000        3.400000
    3.400000
FORTRAN PAUSE: enter <return> or <ctrl>d to continue>

(Note that “-ta=nvidia” is for the Accelerator directive based model so has no effect on your code).

Can you please post more information, including a sample of the output, which compiler version you’re using, and which GPU you have?

Thanks,
Mat

Hi Mat, thanks a lot for replying. I get the following output:

% pgf95 sumAB.cuf  -o sumAB.out 
% sumAB.out 
 CPU time required is:    0.000000         seconds 
 GPU time required is:   0.2650000         seconds 
 Difference between results is:                NaN    -4.2451527E+37 
    3.400000 
FORTRAN PAUSE: continuing...

I have downloaded and installed the complete PGI Workstation package, release 10.2, 32-bit for Windows. I have Windows XP and my processor is a Centrino dual core. For the GPU information, I ran the “cufinfo” program provided by PGI and got the following output:

Device Number: 0
Device Name: GeForce 9200M GE
Total Global Memory: 0.268 Gbytes
sharedMemPerBlock: 16384 bytes
regsPerBlock: 8192
warpSize: 32
maxThreadsPerBlock: 512
maxThreadsDim: 512 x 512 x 64
maxGridSize: 65535 x 65535 x 1
ClockRate: 1.300 GHz
Total Const Memory: 65536 bytes
Compute Capability Revision: 1.1
TextureAlignment: 256 bytes
deviceOverlap: F
multiProcessorCount: 1
integrated: F
canMapHostMemory: F

Hi Jony,

Try using the flag “-Mcuda=cc11” to tell the compiler that your device has compute capability 1.1. By default the compiler targets cc 1.3. If that works, create a “$PGI/win32/10.x/bin/sitenvrc” file (replace ‘x’ with the actual release number) containing the following line to make cc 1.1 the default.

set COMPUTECAP=1.1;
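
A launch problem like this one is silent in the original code, because nothing checks the status after the kernel call. The sketch below is a cut-down version of the program from this thread with one check added, using cudaGetLastError and cudaGetErrorString from the cudafor module. The module and program names (m_sumAB_checked, check_launch) are made up for illustration, and the thread does not show which error a mismatched compute capability actually reports, so treat this as one way to surface a failed launch rather than a confirmed diagnosis.

```fortran
! Sketch only: the thread's vector sum with a launch error check added.
! Names m_sumAB_checked and check_launch are hypothetical.
module m_sumAB_checked

use cudafor

contains

! Kernel as posted in the thread: one thread per element.
attributes(global) subroutine k_sumAB(n,A,B,C)
integer, value :: n
real, dimension (n) :: A,B,C
integer :: i
i=(blockidx%x-1)*blockdim%x+threadidx%x
if (i<=n) C(i)=A(i)+B(i)
end subroutine k_sumAB

! Host wrapper as posted, plus a status check after the launch.
subroutine h_sumAB(n,bdim,A,B,C)
implicit none
integer :: n,bdim,istat
real, dimension (n) :: A,B,C
real, device, dimension (n) :: Adev,Bdev,Cdev
Adev=A
Bdev=B
call k_sumAB<<<n/bdim, bdim>>>(n,Adev,Bdev,Cdev)
! Ask the runtime whether the launch was accepted.
istat=cudaGetLastError()
if (istat /= cudaSuccess) then
print *,'Kernel launch failed: ',trim(cudaGetErrorString(istat))
end if
C=Cdev
end subroutine h_sumAB

end module m_sumAB_checked

program check_launch
use m_sumAB_checked
implicit none
integer, parameter :: n=1000
integer :: bdim=100
real, dimension (n) :: A,B,C
A=1.2
B=2.2
C=0.
call h_sumAB(n,bdim,A,B,C)
print *,'C(1) = ',C(1)
end program check_launch
```

If the launch succeeds, the check prints nothing and C(1) holds the sum. If the message appears, the values copied back into C should not be trusted.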

- Mat

Thanks a lot Mat, that was the problem! Now it works fine :-)

Jony