escj
October 13, 2014, 10:05am
1
Hello .
New feature problem when profiling with pgprof/pgcollect tested on pgi/13.10 ( because pgi14/X doesn’t work at all )
When profiling a routine containing CPU + OpenAcc computation , with ppgprof this is reported as 2 separate routines ?
→ 1 having only CPU time
→ the other one only with GPU ( kernel/region ) time
A simple example showing this problem :
MODULE mode_sub
CONTAINS
SUBROUTINE sub(X,Y,N)
IMPLICIT NONE
INTEGER :: N
REAL :: X(N) , Y(N)
!$acc kernels present(X)
X = X * 0.9995
!$acc end kernels
! CPU part
Y = Y * 1.0001
END SUBROUTINE sub
END MODULE mode_sub
PROGRAM test_pgcollect
USE mode_sub
IMPLICIT NONE
INTEGER, PARAMETER :: N=134217728, NIT=100
REAL, ALLOCATABLE :: A(:) , B(:)
!$acc declare create(A)
INTEGER :: I
ALLOCATE(A(N),B(N))
!$acc kernels
A = 1.2345
!$acc end kernels
B = 2.3456
DO I=1,NIT
CALL SUB(A,B,N)
END DO
!$acc update host (A(N:N))
print*,"A(N)=",A(N)," B(N)=",B(N)
END PROGRAM test_pgcollect
pgf90 --version
pgf90 13.10-0 64-bit target on x86-64 Linux -tp nehalem
…
pgf90 -ta=nvidia:cuda5.5 openacc_pgcollect_mix.f90 -o openacc_pgcollect_mix_1310
…
pgcollect ./openacc_pgcollect_mix_1310
Profiling single-threaded target program
A(N)= 1.174275 B(N)= 2.369176
target process has terminated, writing profile data
…
pgprof # → write the main window to file
cat prof_mix
Profiled: ./openacc_pgcollect_mix_1310 on Mon Oct 13 11:51:53 CEST 2014
| Function | Seconds | Region Time | Kernel Device Time |
| sub | 17,495 = 96% | 0 = 0% | 0 = 0% |
| [System_Time] | 577 = 3% | 0 = 0% | 0 = 0% |
| __c_mset4 | 124 = 1% | 0 = 0% | 0 = 0% |
| test_pgcollect | 0 = 0% | 28 = 1% | 28 = 1% |
| sub | 0 = 0% | 4853 = 99% | 4853 = 99% |
The detail window show the same separate view , CPU only in one view and only GPU on the other one .
This problem is very painful for the analysis of porting a very big code ( Million of lines … )
:-) Thanks if it could be fixed for the next 14/15.X release …
Bye Juan
tull
November 19, 2014, 12:50am
2
I ran your example with 13.10, 14.10, and the upcoming 15.0 (you will
see 15.1 in January).
I think 15.0 is working better than 14.10. We filed TPR 20904 for this problem and it should be in the 15.1 release.
13.10
Profiled: ./test_pgcollect on Tue Nov 18 16:33:44 PST 2014
| Function | Seconds | Region Time | Kernel Device Time |
| ppoll | 19.488 = 50% | 0.0000 = 0% | 0.0000 = 0% |
| sub | 18.050 = 46% | 0.0000 = 0% | 0.0000 = 0% |
| [System_Time] | 0.825 = 2% | 0.0000 = 0% | 0.0000 = 0% |
| __GI_sched_yield | 0.200 = 1% | 0.0000 = 0% | 0.0000 = 0% |
| sstk | 0.175 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __c_mset4 | 0.150 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __write | 0.050 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| _dl_debug_initialize | 0.050 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __open_nocancel | 0.025 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __lll_lock_wait_private | 0.012 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| sub | 0.000 = 0% | 0.8033 = 100% | 0.8033 = 100% |
| test_pgcollect | 0.000 = 0% | 0.0038 = 0% | 0.0038 = 0% |
\
14.10
=============================================
Profiled: ./test_pgcollect on Tue Nov 18 16:31:07 PST 2014
| Function | Seconds | Region Time | Kernel Device Time |
| __poll | 17.966 = 50% | 0.0000 = 0% | 0.0000 = 0% |
| sub | 16.690 = 46% | 0.0000 = 0% | 0.0000 = 0% |
| [System_Time] | 0.690 = 2% | 0.0000 = 0% | 0.0000 = 0% |
| sched_getscheduler | 0.253 = 1% | 0.0000 = 0% | 0.0000 = 0% |
| sstk | 0.115 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __c_mset4 | 0.115 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| _dl_debug_initialize | 0.046 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __open_nocancel | 0.023 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| sem_timedwait | 0.023 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __open64 | 0.023 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| fchmodat | 0.023 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __lll_lock_wait_private | 0.011 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| sub | 0.000 = 0% | 0.0002 = 56% | 0.0002 = 56% |
| test_pgcollect | 0.000 = 0% | 0.0002 = 44% | 0.0002 = 44% |
\
15.0
==============================================
Profiled: ./test_pgcollect on Tue Nov 18 13:51:21 PST 2014
| Function | Seconds | Region Time | Kernel Device Time |
| __poll | 19.101 = 33% | 0.0000 = 0% | 0.0000 = 0% |
| sem_timedwait | 18.868 = 33% | 0.0000 = 0% | 0.0000 = 0% |
| sub | 17.535 = 31% | 0.0000 = 0% | 0.0000 = 0% |
| [System_Time] | 0.651 = 1% | 0.0000 = 0% | 0.0000 = 0% |
| sched_getscheduler | 0.488 = 1% | 0.0000 = 0% | 0.0000 = 0% |
| __c_mset4 | 0.116 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| sstk | 0.101 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| _dl_debug_initialize | 0.047 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| _int_malloc | 0.047 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| read | 0.031 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __open64 | 0.031 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __open_nocancel | 0.023 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __lll_lock_wait_private | 0.023 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| memset | 0.016 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| sighandler_setxid | 0.008 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| sub | 0.000 = 0% | 0.8011 = 100% | 0.8011 = 100% |
| test_pgcollect | 0.000 = 0% | 0.0036 = 0% | 0.0036 = 0% |