I am a novice in OpenACC Fortran programming, using PGI 17.4 Community Edition. Following Michael Wolfe's slides “OpenACC for Fortran programmers”, I wrote a serial code and an OpenACC code as follows:
The serial code:
program sequential_code
  implicit none
  integer, parameter :: dp = selected_real_kind(15,307)
  real, dimension(:), allocatable :: a, b
  real(dp) :: start_t, end_t
  integer, parameter :: n = 1000000
  call cpu_time(start_t)
  call random_seed
  allocate(a(n), b(n))
  call random_number(a)
  call process(a, b, n)
  deallocate(a, b)
  call cpu_time(end_t)
  write(*,20) end_t-start_t
20 format('Total elapsed time is ', f10.5, ' seconds.')
contains
  subroutine process( a, b, n )
    integer, intent(in) :: n
    real, intent(inout) :: a(n), b(n)
    integer :: i
    do i = 1, n
      b(i) = exp(sin(a(i)))
    enddo
  end subroutine process
end program sequential_code
The OpenACC code:
program OpenACC_code
  implicit none
  integer, parameter :: dp = selected_real_kind(15,307)
  real, dimension(:), allocatable :: a, b
  real(dp) :: start_t, end_t
  integer, parameter :: n = 1000000
  call cpu_time(start_t)
  call random_seed
  allocate(a(n), b(n))
  call random_number(a)
  !$acc data copy(a,b)
  call process(a, b, n)
  !$acc end data
  deallocate(a, b)
  call cpu_time(end_t)
  write(*,20) end_t-start_t
20 format('Total elapsed time is ', f10.5, ' seconds.')
contains
  subroutine process( a, b, n )
    integer, intent(in) :: n
    real, intent(inout) :: a(n), b(n)
    integer :: i
    !$acc parallel loop
    do i = 1, n
      b(i) = exp(sin(a(i)))
    enddo
  end subroutine process
end program OpenACC_code
Below are the command lines and output for the serial code and the OpenACC code:
The serial code:
pgf90 -o sequential_code.exe sequential_code.f90
./sequential_code.exe
Total elapsed time is 0.09600 seconds.
The OpenACC code:
export PGI_ACC_NOTIFY=1
pgf90 -acc -ta=tesla -o OpenACC_code.exe OpenACC_code.f90
./OpenACC_code.exe
launch CUDA kernel file=C:\Users\HP\Downloads\FORTRAN CODES\CUDA and OpenACC\OpenACC\OpenACC_code.f90 function=process line=30 device=0 threadid=1 num_gangs=7813 num_workers=1 vector_length=128 grid=7813 block=128
Total elapsed time is 0.13400 seconds.
My question is: why is the OpenACC code slower than the serial code?
Thank you in advance.