yeah, sure, here is the program:
!
! Fortran Console Application
! Generated by PGI Visual Fortran(R)
! 12/10/2012 3:02:36 PM
!
program openacc1
!use openacc
implicit none
integer :: nx,ny,i,j,ak
integer, allocatable, dimension (:,:) :: A
integer :: start_time(8), end_time(8)
CHARACTER (LEN = 12) REAL_CLOCK (3)
CALL DATE_AND_TIME (REAL_CLOCK (1), REAL_CLOCK (2),&
REAL_CLOCK (3), start_time )
nx = 3000
ny = 3000
ak = 0
allocate (a(nx,ny))
A(1:nx,1:ny) = 2
!$acc kernels loop
do i = 1, nx
do j = 1, ny
ak = ak + A(i,j)
enddo
enddo
write(*,*) 'ak = ', ak
write(*,*)
CALL DATE_AND_TIME (REAL_CLOCK (1), REAL_CLOCK (2),&
REAL_CLOCK (3), end_time )
write(*,10) 'PROGRAM STARTED AT: ', START_TIME(5), START_TIME(6),&
START_TIME(7), START_TIME(8)
write(*,15) 'PROGRAM ENDED AT: ', end_time(5), end_time(6), &
end_time(7),end_time(8)
continue
deallocate(a)
10 format(1X, A, I2.2, ':', I2.2, ':', I2.2, ':', I3.3)
15 format(1X, A, I2.2, ':', I2.2, ':', I2.2, ':', I3.3)
end program openacc1
Fortran->target accelerators->Target NVIDIA Accelerators = yes
Fortran->Command Line-> -acc -Minfo=accel
The output after I compile (without “use openacc”):
------ Rebuild All started: Project: OpenACC1, Configuration: Release x64 ------
Deleting intermediate and output files for project ‘OpenACC1’, configuration ‘Release’
Compiling Project …
OpenACC1.f90
openacc1:
25, Generating copyin(a(1:3000,1:3000))
Generating compute capability 1.0 binary
Generating compute capability 2.0 binary
26, Loop is parallelizable
27, Loop is parallelizable
Accelerator kernel generated
26, !$acc loop gang, vector(8) ! blockidx%x threadidx%x
27, !$acc loop gang, vector(8) ! blockidx%y threadidx%y
CC 1.0 : 8 registers; 304 shared, 32 constant, 0 local memory bytes; 66% occupancy
CC 2.0 : 10 registers; 264 shared, 64 constant, 0 local memory bytes; 33% occupancy
28, Sum reduction generated for ak
Linking…
OpenACC1 build succeeded.
Build log was saved at “file://D:\Cuda Dev\OpenACC1\x64\Release\BuildLog.htm”
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========
I still can't use “use openacc”, and I don’t know why.
Output when I execute the code (with !$acc before the do loops):
ak = 18000000
PROGRAM STARTED AT: 11:56:30:675
PROGRAM ENDED AT: 11:56:30:776
Press any key to continue . . .
time of execution = 101 msec
Output when I execute the code (without !$acc before the do loops):
ak = 18000000
PROGRAM STARTED AT: 11:58:14:167
PROGRAM ENDED AT: 11:58:14:179
Press any key to continue . . .
time of execution = 12 msec !!
So the run with the parallel acc loops (101 msec) takes longer than the run where the CPU does the loops (12 msec)!
Does that make sense to you? I think I am doing something wrong here.
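One thing I could try is timing only the loop nest, since the timestamps above also cover the allocation and the initialization of A. Below is a minimal sketch of that, not a tested fix. The names openacc1_timing, t0, t1 and rate are ones I made up for this example, and I am assuming system_clock is precise enough for a loop this short.

```fortran
program openacc1_timing
implicit none
integer :: nx, ny, i, j, ak
integer, allocatable, dimension (:,:) :: a
integer :: t0, t1, rate   ! hypothetical names: clock counts and ticks per second

nx = 3000
ny = 3000
ak = 0
allocate (a(nx,ny))
a(1:nx,1:ny) = 2

! Read the clock right before the loop nest, so that allocation and
! initialization are left out of the measured interval.
call system_clock(t0, rate)
!$acc kernels loop
do i = 1, nx
do j = 1, ny
ak = ak + a(i,j)
enddo
enddo
call system_clock(t1)

write(*,*) 'ak = ', ak
write(*,*) 'loop time (msec) = ', 1000.0 * real(t1 - t0) / real(rate)
deallocate(a)
end program openacc1_timing
```

With -acc enabled, I assume this interval would still include the copyin of a reported in the build log and any one-time device startup, so it would only narrow down where the 101 msec goes, not explain it completely.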
Dolf