pgi/13.1, pgsampt Crash writing to "/tmp/prof

Hello.

With pgi/13.1, pgcollect/pgsampt crashes while trying to write the log file “/tmp/prof.log”.

Here is a demo with the samples that come with the PGI package:


pgfortran --version

pgfortran 13.1-1 64-bit target on x86-64 Linux -tp nehalem 
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2013, STMicroelectronics, Inc.  All Rights Reserved.

pwd
/home/escj/dir_PGF/PGI_HOME/linux86-64/13.1/etc/samples/accel

make f3.exe
pgfortran -o f3.exe f3.f90 -ta=nvidia -Minfo=accel -fast
smooth:
     24, Generating copyin(b(:,:))
         Generating copy(a(:,:))
     26, Generating present_or_copy(a(:,:))
...

pgsampt f3.exe
            0  errors found
        66818  microseconds on GPU
           72  microseconds on host
target process has terminated, writing profile data
Segmentation fault

strace shows an attempt to open the unexpected log file “/tmp/prof.log”, which fails with ENOENT:

strace pgsampt f3.exe
...
open("/tmp/prof.log", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 ENOENT (No such file or directory)
open("/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq", O_RDONLY) = 16
fstat(16, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b24c5fde000
read(16, "2661000\n", 4096)             = 8
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Segmentation fault
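The trace suggests a simple failure pattern: open() on “/tmp/prof.log” fails with ENOENT because /tmp itself is missing, and the SIGSEGV at address 0 that follows is consistent with the tool later writing through a NULL file handle. The PGI source is not available here, so this C sketch is only a hypothetical reconstruction of that pattern; open_profile_log and write_profile_log are made-up names. It shows the NULL check that turns the crash into a warning.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open the profiler log for writing.
 * With glibc, fopen(path, "w") issues open(path, O_WRONLY|O_CREAT|O_TRUNC, 0666),
 * the same call seen in the strace output. If the parent directory
 * (here /tmp) does not exist, it fails with ENOENT and returns NULL. */
FILE *open_profile_log(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        int saved_errno = errno;          /* keep ENOENT for the caller */
        fprintf(stderr, "warning: cannot open %s: %s\n",
                path, strerror(saved_errno));
        errno = saved_errno;
    }
    return fp;
}

/* Hypothetical helper: write one line to the log, skipping logging
 * instead of crashing when the file could not be opened.
 * Returns 0 on success, -1 if the log was not written. */
int write_profile_log(const char *path, const char *msg)
{
    FILE *fp = open_profile_log(path);
    if (fp == NULL)
        return -1;   /* fprintf(fp, ...) with fp == NULL is undefined and typically segfaults */
    fprintf(fp, "%s\n", msg);
    return fclose(fp) == 0 ? 0 : -1;
}
```

With /tmp present, the open() succeeds, which matches the observation that the crash disappears once the directory exists.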

For testing, I created the /tmp directory on my PC, and after that everything runs OK:

ls -l /tmp/
total 0

pgsampt f3.exe
            0  errors found
        69881  microseconds on GPU
           71  microseconds on host
target process has terminated, writing profile data

ls -lrt pgprof.out 
-rw-r--r-- 1 escj users 1194 Feb  5 17:30 pgprof.out

Next bug report coming: pgcollect no longer shows the ACC kernel timing.

See you,

Juan

Well that’s embarrassing. My apologies, and thank you for reporting this. The crash will be fixed in release 13.2, expected out this week.

As far as not showing accelerator timings any more, we don’t see that in our testing. Note that if the performance data values in the Accelerator Performance tab are zero, then no data is displayed. If you are seeing another problem, please let us know.
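The display rule described here (a metric whose values are all zero shows no data) can be sketched as a simple check. This is a hypothetical illustration in C, not PGPROF's actual code; column_has_data is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: a metric column is worth showing only if at least
 * one row carries a nonzero value for it. */
bool column_has_data(const long *values, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (values[i] != 0)
            return true;
    return false;
}
```

Under this rule, a Region Time column whose entries are all 0 would simply not be displayed, which is the situation discussed later in this thread.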

Regarding the “no more ACC accelerator timing” problem:

Again with the same sample, “f3.exe”,

following “pgprof13ug.pdf”, pages 22-25.

With pgi/12.10, everything is OK:

pgfortran --version
pgfortran 12.10-0 64-bit target on x86-64 Linux -tp nehalem

=> compile and run the sample f3.exe with the options “ccff”, “-g”, etc.

 pwd
/home/escj/dir_PGF/PGI_HOME/linux86-64/13.1/etc/samples/accel

pgfortran -g -o f3.exe f3.f90 -ta=nvidia -Minfo=accel,ccf,all -fast
...
pgcollect -time -cudainit f3.exe 5000
            0  errors found
       450158  microseconds on GPU
       305996  microseconds on host
target process has terminated, writing profile data

pgprof -exe f3.exe

The PGPROF window shows a view very similar to Fig. 2.12 on p. 24 of pgprof13ug.pdf
=> 4 columns

less pg.txt
Profiled: ./f3.exe on Wed Feb 06 10:39:31 CET 2013

| Function                | Seconds         | Accelerator Region Time | Accelerator Kernel Time |

| __select_nocancel       |  1,3908 =  46%  |       0 =   0%          |       0 =   0%          |
| main                    |    8046 =  27%  |       0 =   0%          |       0 =   0%          |
| smoothhost              |    3448 =  11%  |       0 =   0%          |       0 =   0%          |
| __GI_sched_yield        |    3448 =  11%  |       0 =   0%          |       0 =   0%          |
| sstk                    |     460 =   2%  |       0 =   0%          |       0 =   0%          |
| __c_mcopy4              |     460 =   2%  |       0 =   0%          |       0 =   0%          |
| __lll_lock_wait_private |     115 =   0%  |       0 =   0%          |       0 =   0%          |
| do_lookup_x             |     115 =   0%  |       0 =   0%          |       0 =   0%          |
| smooth                  |       0 =   0%  |    7663 = 100%          |    3158 = 100%          |

Here the smooth routine takes the place of the mm1 function used in the user guide.

Drilling down into smooth shows where in the subroutine the time is spent in the regions and kernels accelerated by directives:

 less smooth.txt
Profiled: ./f3.exe on Wed Feb 06 10:39:31 CET 2013

| Line | Source                                                                  | Seconds         | Accelerator Region Time | Accelerator Kernel Time |

|      |  subroutine smooth( a, b, w0, w1, w2, n, m, niters )                    |       0 =   0%  |       0 =   0%          |       0 =   0%          |
|      |   real, dimension(:,:) :: a,b                                           |       0 =   0%  |       0 =   0%          |       0 =   0%          |
|      |   real :: w0, w1, w2                                                    |       0 =   0%  |       0 =   0%          |       0 =   0%          |
|      |   integer :: n, m, niters                                               |       0 =   0%  |       0 =   0%          |       0 =   0%          |
|      |   integer :: i, j, iter                                                 |       0 =   0%  |       0 =   0%          |       0 =   0%          |
|      |   !$acc data region copy(a(:,:)) copyin(b(:,:))                         |       0 =   0%  |    4501 =  59%          |       0 =   0%          |
|      |    do iter = 1,niters                                                   |       0 =   0%  |       0 =   0%          |       0 =   0%          |
|      |    !$acc region                                                         |       0 =   0%  |    3161 =  41%          |       0 =   0%          |
|      |     do i = 2,n-1                                                        |       0 =   0%  |       0 =   0%          |       0 =   0%          |
|      |      do j = 2,m-1                                                       |       0 =   0%  |       0 =   0%          |    2077 =  66%          |
|      |       a(i,j) = w0 * b(i,j) + &                                          |       0 =   0%  |       0 =   0%          |       0 =   0%          |

With pgi/13.1, PROBLEM: NO MORE REGION/KERNEL DATA COLUMNS

pgfortran --version
pgfortran 13.1-1 64-bit target on x86-64 Linux -tp nehalem
pgfortran -g -o f3.exe f3.f90 -ta=nvidia -Minfo=accel,ccf,all -fast
...

 pgcollect -time -cudainit f3.exe 5000
            0  errors found
       451390  microseconds on GPU
       282197  microseconds on host
target process has terminated, writing profile data

pgprof -exe f3.exe

The sample spends 451390 microseconds (about 0.45 s) on the GPU, but the PGPROF window now shows:

less pg131.txt
Profiled: ./f3.exe on Wed Feb 06 11:02:06 CET 2013

| Function                | Seconds         |

| __select_nocancel       |  1,3678 =  48%  |
| main                    |    8046 =  28%  |
| __GI_sched_yield        |    3678 =  13%  |
| smoothhost              |    2989 =  10%  |
| sstk                    |     230 =   1%  |
| __lll_lock_wait_private |     115 =   0%  |

=> The smooth routine accelerated by ACC directives no longer appears; only the host version is shown.
=> No more region/kernel timing.

Note:
Enabling the “pgcollect -cuda” option gives some information on the GPU kernels generated by the compiler,
but the profile obtained this way is completely flat, and the link to the smooth source code is lost!

less pg131_cuda.txt
Profiled: ./f3.exe on Wed Feb 06 11:11:31 CET 2013

| Function                | Seconds         | CUDA GPU Secs   | CUDA CPU Secs   |

| __select_nocancel       |  1,3596 =  48%  |       0 =   0%  |       0 =   0%  |
| main                    |    7865 =  28%  |       0 =   0%  |       0 =   0%  |
| __GI_sched_yield        |    3596 =  13%  |       0 =   0%  |       0 =   0%  |
| smoothhost              |    3146 =  11%  |       0 =   0%  |       0 =   0%  |
| sstk                    |     225 =   1%  |       0 =   0%  |       0 =   0%  |
| __lll_lock_wait_private |     112 =   0%  |       0 =   0%  |       0 =   0%  |
| smooth_28_gpu           |       0 =   0%  |    2128 =  57%  |       1 =   0%  |
| smooth_35_gpu           |       0 =   0%  |    1051 =  28%  |       0 =   0%  |
| memcpyDtoHasync         |       0 =   0%  |     194 =   5%  |     197 =  34%  |
| memcpyHtoDasync         |       0 =   0%  |     374 =  10%  |     374 =  65%  |

See you,

Juan

Hello Don,
Any news?

Could you check/reproduce the missing kernel/region time problem on your side?

See you,

Juan

Juan,

Here is the story with accelerator profiling in 13.1 (and 13.2):

The accelerator runtime has been completely reorganized in 13.x. As part of that work the portion of the runtime that generates the accelerator profiler data has been reworked to allow users or other tool developers to add their own data collection facilities.

Unfortunately that work has not been finished and will not appear in a PGI release until 13.3 at the soonest. There is no workaround when using 13.1 or 13.2.

Obviously we had some communication and testing issues on our end or we would have been able to inform you better/sooner about this. I will work on addressing these issues right away.

–Don

Hello Don.

OK, so I'm going back to pgi/12.10 and will wait for pgi/13.3.

You have my support for the work ahead.

See you,

Juan

Hello.

:-) pgcollect/pgprof works again for ACC directive code in pgi 13.7.

:-) Thank you for the great job!

See you,

Juan

Thank you for the great job …

Yes, our tools team did do a great job with this! Hope you find it as useful as I do.

–Mat

Hello.

:-| Sorry, I'm back on this thread.

I've tested the latest release, 13.9 (the problem is identical in 13.7).

Profiling of OpenACC code with pgcollect works well,

… but pgprof does not show the “Region Time” column,

even though these region times are reported correctly in the detailed “Accelerator Performance” frame.

I have checked that the option View → Select Columns → Region Time is ON.

Again tested with the smooth sample:
dir_PGF/PGI_HOME/linux86-64/13.9/etc/samples/accel/f3.f90

escj@aeropc107:~/poub> less smooth_pgcollect
Profiled: ./f3.exe on Tue Oct 01 15:01:35 CEST 2013

| Line | Source                                                                  | Seconds         | Kernel Device Time |

|      |  subroutine smooth( a, b, w0, w1, w2, n, m, niters )                    |       0 =   0%  |       0 =   0%     |
|      |   real, dimension(:,:) :: a,b                                           |       0 =   0%  |       0 =   0%     |
|      |   real :: w0, w1, w2                                                    |       0 =   0%  |       0 =   0%     |
|      |   integer :: n, m, niters                                               |       0 =   0%  |       0 =   0%     |
|      |   integer :: i, j, iter                                                 |       0 =   0%  |       0 =   0%     |
|      |   !$acc data region copy(a(:,:)) copyin(b(:,:))                         |       0 =   0%  |       0 =   0%     |
|      |    do iter = 1,niters                                                   |       0 =   0%  |       0 =   0%     |
|      |    !$acc region                                                         |       0 =   0%  |       0 =   0%     |
|      |     do i = 2,n-1                                                        |       0 =   0%  |       0 =   0%     |
|      |      do j = 2,m-1                                                       |       0 =   0%  |    1663 =  63%     |
|      |       a(i,j) = w0 * b(i,j) + &                                          |       0 =   0%  |       0 =   0%     |
|      |                w1 * (b(i-1,j) + b(i,j-1) + b(i+1,j) + b(i,j+1)) + &     |       0 =   0%  |       0 =   0%     |
|      |                w2 * (b(i-1,j-1) + b(i-1,j+1) + b(i+1,j-1) + b(i+1,j+1)) |       0 =   0%  |       0 =   0%     |
|      |      enddo                                                              |       0 =   0%  |       0 =   0%     |
|      |     enddo                                                               |       0 =   0%  |       0 =   0%     |
|      |     do i = 2,n-1                                                        |       0 =   0%  |       0 =   0%     |
|      |      do j = 2,m-1                                                       |       0 =   0%  |     964 =  37%     |
|      |       b(i,j) = a(i,j)                                                   |       0 =   0%  |       0 =   0%     |
|      |      enddo                                                              |       0 =   0%  |       0 =   0%     |
|      |     enddo                                                               |       0 =   0%  |       0 =   0%     |
|      |    !$acc end region                                                     |       0 =   0%  |       0 =   0%     |
|      |    enddo                                                                |       0 =   0%  |       0 =   0%     |
|      |   !$acc end data region                                                 |       0 =   0%  |       0 =   0%     |
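The percentage in each cell appears to be that line's share of the column total. A quick C check against the Kernel Device Time values above (1663 and 964) reproduces the 63% and 37% shown; percent_of_total and column_total are hypothetical helpers, and rounding to the nearest whole percent is an assumption.

```c
#include <assert.h>

/* Hypothetical helper: share of `value` in `total`, rounded to the
 * nearest whole percent (assumes value >= 0 and total > 0). */
int percent_of_total(long value, long total)
{
    return (int)(100.0 * (double)value / (double)total + 0.5);
}

/* Sum of a column, used as the denominator for the percentages. */
long column_total(const long *values, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += values[i];
    return sum;
}
```

Applied to the 12.10 listing, 4501 and 3161 out of the 7663 reported for smooth give 59% and 41%, matching that table as well.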

Bye Juan

Hi Juan,

I’ve reproduced what you’re seeing here with the sample program. Indeed, there is not a Region Time column in the upper pane even when that column is selected in the Select Columns dialog.

It looks like this column is not appearing in the upper pane because there is no data for this metric in the sample program. Data appears in the lower pane for Region Elapsed Time because it is a different metric and there is data for it in the profile.

Does this make sense? If I’ve misunderstood the problem you’re seeing, please let me know.

It seems like Region Time should be grayed-out in the Select Columns dialog box when there isn’t any data for it. I’ve opened TPR 19611 to track this issue.

Annemarie

Hello Annemarie.

Well, I think it's a problem with the naming of the column, or with the link between these two metrics.

The “Region Time” column should probably show the “Region Elapsed Time (total)”,

just as, I think, the “Kernel Device Time” column actually shows the “Kernel Device Time (total)”.

I know that the profiling has been completely rebuilt, so some inconsistencies between the new measurements and the GUI are likely.
In the PGPROF user guide pgprof13ug.pdf, pages 24-26 for example, it is the Max of these variables that is shown.
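The difference between a “(total)” view and a “Max” view of the same per-launch metric can be made concrete with a small C sketch. The per-launch times used in the test are invented for illustration, and metric_total and metric_max are hypothetical helpers; none of this comes from the profiles above.

```c
#include <assert.h>
#include <stddef.h>

/* Total of the per-launch times: what a "(total)" column would report. */
long metric_total(const long *per_launch, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += per_launch[i];
    return sum;
}

/* Largest single per-launch time: what a "Max" column would report.
 * Returns 0 for an empty list. */
long metric_max(const long *per_launch, size_t n)
{
    long max = 0;
    for (size_t i = 0; i < n; i++)
        if (per_launch[i] > max)
            max = per_launch[i];
    return max;
}
```

For a kernel launched many times (once per iteration of the niters loop in smooth), the total grows with the number of launches while the max does not, so the two views can differ greatly.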

Bye

Juan

Hi Juan,

In fact the region time and region elapsed time are different measurements. The intent is that region time is the execution time on the accelerator, whereas region elapsed time is wall clock time on the host. We are investigating why there is no data generated for region time in the example you provided. Thanks for alerting us to this situation.
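Region elapsed time, as described here, is host wall-clock time. This minimal C sketch measures that kind of quantity, assuming a POSIX system with clock_gettime and CLOCK_MONOTONIC. A host timer brackets the whole region, so it includes anything that happens while the host waits (such as launch overhead or transfers), whereas region time counts only execution on the accelerator. elapsed_seconds and time_region are hypothetical helpers, not PGI functions.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime in strict C modes */
#endif
#include <assert.h>
#include <time.h>

/* Difference between two timestamps in seconds. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical helper: host wall-clock time spent in `region`,
 * e.g. a routine that launches accelerator work and waits for it. */
double time_region(void (*region)(void))
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    region();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_seconds(t0, t1);
}
```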

Annemarie

Hi Juan,

I’m following up on the lack of data for Region Time. The next PGI release (after 13.10) should include data for Region Time as a separate metric. Thanks again for your feedback.

Annemarie