Runtime problem with PGFORTRAN and OpenACC

chriaa.intissar · October 7, 2019, 11:30am

We have an HPC server, an Apollo XL190r Gen9, equipped with one Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz processor and two NVIDIA K40 accelerators, which we use for parallel programming. We compile with PGFORTRAN.

I compiled two versions of a Fortran code, one with acc directives and one without, but the sequential version runs faster than the parallel one (details below).

Without acc directives:

[instm@localhost step1]$ pgfortran -acc -ta=nvidia -Minfo=accel laplace2d.f90 -o lpc
[instm@localhost step1]$ time ./lpc
Jacobi relaxation Calculation: 4096 x 4096 mesh
0 0.250000
100 0.002397
200 0.001204
300 0.000804
400 0.000603
500 0.000483
600 0.000403
700 0.000345
800 0.000302
900 0.000269
completed in 53.059 seconds

real 0m53.094s
user 0m53.035s
sys 0m0.053s

With acc directives:

[instm@localhost step1]$ pgfortran -acc -ta=nvidia -Minfo=accel laplace2d.f90 -o lpc
laplace:
75, Generating implicit copyout(anew(1:4094,1:4094))
Generating implicit copyin(a(0:4095,0:4095))
76, Loop is parallelizable
77, Loop is parallelizable
Accelerator kernel generated
Generating Tesla code
76, !$acc loop gang, vector(4) ! blockidx%y threadidx%y
77, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
80, Generating implicit reduction(max:error)
90, Generating implicit copyin(anew(1:4094,1:4094))
Generating implicit copyout(a(1:4094,1:4094))
91, Loop is parallelizable
92, Loop is parallelizable
Accelerator kernel generated
Generating Tesla code
91, !$acc loop gang, vector(4) ! blockidx%y threadidx%y
92, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
[instm@localhost step1]$ time ./lpc
Jacobi relaxation Calculation: 4096 x 4096 mesh
0 0.250000
100 0.002397
200 0.001204
300 0.000804
400 0.000603
500 0.000483
600 0.000403
700 0.000345
800 0.000302
900 0.000269
completed in 84.195 seconds

real 1m24.346s
user 1m17.734s
sys 0m6.616s
[instm@localhost step1]$

Robert_Crovella · October 7, 2019, 1:29pm

You appear to be working through a pretty standard educational code sequence that I am familiar with.

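This is the expected outcome for the initial port of the Jacobi iteration loop: the -Minfo=accel output shows that both kernels regions implicitly copy a and anew between host and device.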
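A back-of-the-envelope estimate shows how much data those implicit copies move. It takes the array sections from the -Minfo=accel report and assumes 4-byte (default real) elements, 1000 iterations of the while loop (the log prints every 100 iterations up to 900), and an effective host-device transfer rate of roughly 10 GB/s. All three are assumptions, not measurements from this thread.

```latex
\begin{aligned}
E_{\text{iter}} &= \underbrace{4096^2}_{\texttt{copyin(a)}}
                 + \underbrace{4094^2}_{\texttt{copyout(anew)}}
                 + \underbrace{4094^2}_{\texttt{copyin(anew)}}
                 + \underbrace{4094^2}_{\texttt{copyout(a)}} \\
               &= 16\,777\,216 + 3 \times 16\,760\,836 = 67\,059\,724 \ \text{elements per iteration} \\
B_{\text{total}} &\approx 1000 \times 67\,059\,724 \times 4\ \text{B} \approx 2.68 \times 10^{11}\ \text{B} \approx 268\ \text{GB} \\
t_{\text{transfer}} &\approx \frac{268\ \text{GB}}{10\ \text{GB/s}} \approx 27\ \text{s}
\end{aligned}
```

Even under these rough assumptions, the transfers alone cost tens of seconds, which is consistent with the accelerated build being slower than the serial one. With real(8) arrays the volume, and the estimated time, double.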

You need to continue the exercise and add data directives, so that the arrays are not copied between host and device on every iteration of the while loop.
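A minimal sketch of what the data-directive step can look like, assuming the usual structure of this Jacobi example: a grid A with bounds 0:n-1, 0:m-1, a scratch array Anew, and two kernels regions inside a do while loop. The names jacobi_mod and jacobi_solve are hypothetical, default real precision is an assumption, and this is not the poster's laplace2d.f90, which is not shown. The !$acc data copy(A) create(Anew) region moves A to the GPU once before the loop and back once after it, and allocates Anew on the device only, so the individual sweeps no longer need their own transfers.

```fortran
! Hypothetical module: Jacobi relaxation with an enclosing OpenACC data
! region. Without -acc the !$acc lines are ordinary comments and the code
! runs serially on the host.
module jacobi_mod
  implicit none
  private
  public :: jacobi_solve

contains

  subroutine jacobi_solve(A, tol, iter_max, iters, err)
    real, intent(inout)  :: A(0:, 0:)  ! grid including boundary, 0-based
    real, intent(in)     :: tol        ! convergence threshold on max change
    integer, intent(in)  :: iter_max   ! upper bound on sweeps
    integer, intent(out) :: iters      ! sweeps actually performed
    real, intent(out)    :: err        ! max change in the last sweep
    real, allocatable :: Anew(:, :)
    real :: error
    integer :: i, j, n, m

    n = size(A, 1)
    m = size(A, 2)
    allocate(Anew(0:n-1, 0:m-1))
    error = 1.0
    iters = 0

    ! Copy A to the device once and allocate Anew there without any
    ! transfer; both stay resident for every sweep of the while loop.
    !$acc data copy(A) create(Anew)
    do while (error > tol .and. iters < iter_max)
      error = 0.0
      !$acc kernels
      do j = 1, m - 2
        do i = 1, n - 2
          Anew(i, j) = 0.25 * (A(i+1, j) + A(i-1, j) + A(i, j-1) + A(i, j+1))
          error = max(error, abs(Anew(i, j) - A(i, j)))
        end do
      end do
      !$acc end kernels

      ! Copy the interior back into A; boundary values are never written.
      !$acc kernels
      do j = 1, m - 2
        do i = 1, n - 2
          A(i, j) = Anew(i, j)
        end do
      end do
      !$acc end kernels

      if (mod(iters, 100) == 0) print '(i5, f10.6)', iters, error
      iters = iters + 1
    end do
    !$acc end data

    err = error
    deallocate(Anew)
  end subroutine jacobi_solve

end module jacobi_mod
```

Built with the same flags as in the thread (pgfortran -acc -ta=nvidia -Minfo=accel), the compiler report should then attribute the copies to the data region instead of generating implicit copyin/copyout for each kernels region. Only the reduction result error comes back to the host after each sweep, which is what the convergence test and the progress print need.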
