Poor performance of OpenACC code compared to serial code

Truong_Dang · November 5, 2017, 3:56am

I am new to OpenACC Fortran programming and am using PGI 17.4 Community Edition. Following Michael Wolfe's slides “OpenACC for Fortran programmers”, I wrote the following serial code and OpenACC code:

The serial code:

program sequential_code
  implicit none
  integer, parameter              :: dp = selected_real_kind(15,307)
  real, dimension(:), allocatable :: a, b
  real(dp)                        :: start_t, end_t
  integer, parameter              :: n = 1000000


  call cpu_time(start_t)
  call random_seed
  allocate(a(n), b(n))
  call random_number(a)
  call process(a, b, n)
  deallocate(a, b)
  call cpu_time(end_t)

  write(*,20) end_t-start_t
  20 format('Total elapsed time is ', f10.5, ' seconds.')

  contains
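  subroutine process( a, b, n )
    integer, intent(in)    :: n      ! declared first because it sizes a and b
    real, intent(in)       :: a(n)   ! input, only read
    real, intent(out)      :: b(n)   ! output, only written
    integer                :: i

    do i = 1, n
        b(i) = exp(sin(a(i)))
    enddo
  end subroutine process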
end program sequential_code

The OpenACC code:

program OpenACC_code
  implicit none
  integer, parameter              :: dp = selected_real_kind(15,307)
  real, dimension(:), allocatable :: a, b
  real(dp)                        :: start_t, end_t
  integer, parameter              :: n = 1000000


  call cpu_time(start_t)
  call random_seed
  allocate(a(n), b(n))
  call random_number(a)
  
  !$acc data copy(a,b)
  call process(a, b, n)
  !$acc end data
  
  deallocate(a, b)
  call cpu_time(end_t)

  write(*,20) end_t-start_t
  20 format('Total elapsed time is ', f10.5, ' seconds.')

  contains
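  subroutine process( a, b, n )
    integer, intent(in)    :: n      ! declared first because it sizes a and b
    real, intent(in)       :: a(n)   ! input, only read
    real, intent(out)      :: b(n)   ! output, only written
    integer                :: i

    ! a and b are already present on the device because of the
    ! data region in the caller.
    !$acc parallel loop
    do i = 1, n
        b(i) = exp(sin(a(i)))
    enddo

  end subroutine process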
end program OpenACC_code

Below are the command lines and output for the serial code and the OpenACC code:

The serial code:
pgf90 -o sequential_code.exe sequential_code.f90
./sequential_code.exe
Total elapsed time is 0.09600 seconds.

The OpenACC code:
export PGI_ACC_NOTIFY=1
pgf90 -acc -ta=tesla -o OpenACC_code.exe OpenACC_code.f90
./OpenACC_code.exe
launch CUDA kernel file=C:\Users\HP\Downloads\FORTRAN CODES\CUDA and OpenACC\OpenACC\OpenACC_code.f90 function=process line=30 device=0 threadid=1 num_gangs=7813 num_workers=1 vector_length=128 grid=7813 block=128
Total elapsed time is 0.13400 seconds.
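For reference, the numbers in the launch line are consistent with the compiler assigning one loop iteration to each vector lane: with `vector_length=128`, the number of gangs (and the CUDA grid size) equals the iteration count divided by 128, rounded up. This is an inference from the log values, not something the log states explicitly:

$$
\text{num\_gangs} = \left\lceil \frac{n}{\text{vector\_length}} \right\rceil = \left\lceil \frac{1\,000\,000}{128} \right\rceil = \lceil 7812.5 \rceil = 7813
$$

Since $7813 \times 128 = 1\,000\,064$, the last gang has 64 lanes with no iteration to execute.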

My question is: what makes the OpenACC code slower than the serial code?

Thank you in advance.

Truong_Dang · November 5, 2017, 9:36am

I added a do loop between the two calls to cpu_time that repeats the enclosed code block 100 times. With that change, the OpenACC code ran eight times faster than the serial code. I am impressed.
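The modified program was not posted, so the following is only a hypothetical reconstruction of the change described above: the whole block between the two `cpu_time` calls (allocation, random fill, data region, kernel, deallocation) is wrapped in a loop of `nrep` repetitions. It is written as a module so it can be used from a main program; the names `repeat_sketch`, `run_repeated`, `nrep`, `elapsed` and `checksum` are illustrative, not from the original. Compiled without `-acc`, the `!$acc` lines are treated as comments and the same code runs serially, which makes it easy to compare the two builds.

```fortran
module repeat_sketch
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)
contains

  ! Same kernel as in the question.
  subroutine process(a, b, n)
    integer, intent(in) :: n
    real, intent(in)    :: a(n)
    real, intent(out)   :: b(n)
    integer             :: i

    !$acc parallel loop
    do i = 1, n
      b(i) = exp(sin(a(i)))
    enddo
  end subroutine process

  ! Hypothetical reconstruction: repeat the whole block that sits between
  ! the two cpu_time calls nrep times. elapsed is processor time from
  ! cpu_time, as in the original; checksum is sum(b) from the last
  ! repetition, so the computed results are actually used.
  subroutine run_repeated(n, nrep, elapsed, checksum)
    integer, intent(in)   :: n, nrep
    real(dp), intent(out) :: elapsed
    real, intent(out)     :: checksum
    real, allocatable     :: a(:), b(:)
    real(dp)              :: start_t, end_t
    integer               :: rep

    checksum = 0.0
    call cpu_time(start_t)
    call random_seed
    do rep = 1, nrep
      allocate(a(n), b(n))
      call random_number(a)
      !$acc data copy(a, b)
      call process(a, b, n)
      !$acc end data
      checksum = sum(b)
      deallocate(a, b)
    end do
    call cpu_time(end_t)
    elapsed = end_t - start_t
  end subroutine run_repeated
end module repeat_sketch
```

A main program would call, for example, `call run_repeated(1000000, 100, elapsed, checksum)` and print `elapsed` with the original format statement; building it once with `pgf90 -acc -ta=tesla` and once without `-acc` gives the two timings to compare.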

MatColgrove · November 6, 2017, 4:19pm

Hi Truong Dang,

What you’re seeing in the first example is the overhead of initializing the GPU, which can take between 0.5 and 1 second. Since your problem is so small, this overhead dominates the overall time. As you add more computation on the device, this overhead is amortized, which is why you start to see a speed-up.
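The amortization argument can be written as a simple cost model. Assume, for illustration only, a one-time initialization cost $T_{\text{init}}$, a per-repetition time $t_{\text{acc}}$ on the GPU (data transfers plus kernel), and a per-repetition time $t_{\text{ser}}$ for the serial loop. After $k$ repetitions:

$$
T_{\text{acc}}(k) = T_{\text{init}} + k\,t_{\text{acc}}, \qquad
T_{\text{ser}}(k) = k\,t_{\text{ser}}, \qquad
S(k) = \frac{T_{\text{ser}}(k)}{T_{\text{acc}}(k)} = \frac{k\,t_{\text{ser}}}{T_{\text{init}} + k\,t_{\text{acc}}} \;\xrightarrow{\,k \to \infty\,}\; \frac{t_{\text{ser}}}{t_{\text{acc}}}
$$

The GPU version comes out ahead once $k\,(t_{\text{ser}} - t_{\text{acc}}) > T_{\text{init}}$, which is only possible when $t_{\text{acc}} < t_{\text{ser}}$. With $k = 1$, as in the first run, $T_{\text{init}}$ dominates the total.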

-Mat
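One way to observe the initialization overhead described above is to time the same data region and kernel twice with a wall-clock timer. This sketch assumes the runtime sets up the GPU lazily on the first OpenACC construct, so the first timed call includes that setup and the second does not; the exact split depends on the driver and runtime. It uses the intrinsic `system_clock` because `cpu_time` reports processor time rather than elapsed wall-clock time. Alternatively, the OpenACC runtime routine `acc_init(acc_device_nvidia)` from the `openacc` module can be called before the timer starts to pay the setup cost explicitly. The names `warmup_sketch`, `wall_seconds` and `time_first_and_second` are hypothetical.

```fortran
module warmup_sketch
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)
  integer, parameter :: i8 = selected_int_kind(18)
contains

  ! Same kernel as in the question.
  subroutine process(a, b, n)
    integer, intent(in) :: n
    real, intent(in)    :: a(n)
    real, intent(out)   :: b(n)
    integer             :: i

    !$acc parallel loop
    do i = 1, n
      b(i) = exp(sin(a(i)))
    enddo
  end subroutine process

  ! Wall-clock time in seconds from the intrinsic system_clock.
  function wall_seconds() result(t)
    real(dp)    :: t
    integer(i8) :: ticks, rate
    call system_clock(ticks, rate)
    t = real(ticks, dp) / real(rate, dp)
  end function wall_seconds

  ! Times the same data region + kernel twice. Assumption: any one-time
  ! device setup happens on the first OpenACC construct, so it is
  ! included in t_first but not in t_second.
  subroutine time_first_and_second(n, t_first, t_second)
    integer, intent(in)   :: n
    real(dp), intent(out) :: t_first, t_second
    real, allocatable     :: a(:), b(:)
    real(dp)              :: t0

    allocate(a(n), b(n))
    call random_number(a)

    t0 = wall_seconds()
    !$acc data copy(a, b)
    call process(a, b, n)
    !$acc end data
    t_first = wall_seconds() - t0

    t0 = wall_seconds()
    !$acc data copy(a, b)
    call process(a, b, n)
    !$acc end data
    t_second = wall_seconds() - t0

    deallocate(a, b)
  end subroutine time_first_and_second
end module warmup_sketch
```

On a GPU build, a large gap between `t_first` and `t_second` would be consistent with the setup cost Mat describes; on a serial build the two values should be close.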

Truong_Dang · November 7, 2017, 2:57pm

Hi Mat,

Thank you very much for your explanation.

Truong.
