pgprof: FileError.File 'pgprof.out'

Hello,

I’m trying to profile an OpenACC-accelerated code and get the following error when opening the profile with pgprof:

>> pgprof: FileError.File 'pgprof.out', Line 147, 'p' token expected <<

If I compile and run the code with acc disabled, it works fine.

Here is how I compile and run:

>> pgcc -o program -acc -ta=nvidia:cc35,time -Minfo=ccff *.c -lm -L/usr/lib64/nvidia
>> pgcollect -time program
>> pgprof -exe program

I get no compiler errors, and pgcollect exits without error.

Could you please help me find the error? Thanks.


Here is the written pgprof.out:

PROF NODALL 0 program 1370883638 1370883630
h XXX.icvt.uni-stuttgart.de 30063 0 1 0.000000
I 6 GenuineIntel nehalem-64 linux86-64 6 2 3192
t 2 479
E 1 Seconds
<accelinfo>
  <head>
      <item>
        <label>CUDA Driver Version</label>
        <value>5050</value>
      </item>
      <item>
        <label>NVRM version</label>
        <value>NVIDIA UNIX x86_64 Kernel Module  319.23  Thu May 16 19:36:02 PDT 2013</value>
      </item>
  </head>
  <body>
    <device>
      <item>
        <label>CUDA Device Number</label>
        <value>0</value>
      </item>
      <item>
        <label>Device Name</label>
        <value>GeForce GTX TITAN</value>
      </item>
      <item>
        <label>Device Revision Number</label>
        <value>3.5</value>
      </item>
      <item>
        <label>Global Memory Size</label>
        <value>6441730048</value>
      </item>
      <item>
        <label>Number of Multiprocessors</label>
        <value>14</value>
      </item>
      <item>
        <label>Number of SP Cores</label>
        <value>2688</value>
      </item>
      <item>
        <label>Number of DP Cores</label>
        <value>896</value>
      </item>
      <item>
        <label>Concurrent Copy and Execution</label>
        <value>Yes</value>
      </item>
      <item>
        <label>Total Constant Memory</label>
        <value>65536</value>
      </item>
      <item>
        <label>Total Shared Memory per Block</label>
        <value>49152</value>
      </item>
      <item>
        <label>Registers per Block</label>
        <value>65536</value>
      </item>
      <item>
        <label>Warp Size</label>
        <value>32</value>
      </item>
      <item>
        <label>Maximum Threads per Block</label>
        <value>1024</value>
      </item>
      <item>
        <label>Maximum Block Dimensions</label>
        <value>1024, 1024, 64</value>
      </item>
      <item>
        <label>Maximum Grid Dimensions</label>
        <value>2147483647 x 65535 x 65535</value>
      </item>
      <item>
        <label>Maximum Memory Pitch</label>
        <value>2147483647B</value>
      </item>
      <item>
        <label>Texture Alignment</label>
        <value>512B</value>
      </item>
      <item>
        <label>Clock Rate</label>
        <value>875 MHz</value>
      </item>
      <item>
        <label>Execution Timeout</label>
        <value>Yes</value>
      </item>
      <item>
        <label>Integrated Device</label>
        <value>No</value>
      </item>
      <item>
        <label>Can Map Host Memory</label>
        <value>Yes</value>
      </item>
      <item>
        <label>Compute Mode</label>
        <value>default</value>
      </item>
      <item>
        <label>Concurrent Kernels</label>
        <value>Yes</value>
      </item>
      <item>
        <label>ECC Enabled</label>
        <value>No</value>
      </item>
      <item>
        <label>Memory Clock Rate</label>
        <value>3004 MHz</value>
      </item>
      <item>
        <label>Memory Bus Width</label>
        <value>384 bits</value>
      </item>
      <item>
        <label>L2 Cache Size</label>
        <value>1572864 bytes</value>
      </item>
      <item>
        <label>Max Threads Per SMP</label>
        <value>2048</value>
      </item>
      <item>
        <label>Async Engines</label>
        <value>1</value>
      </item>
      <item>
        <label>Unified Addressing</label>
        <value>Yes</value>
      </item>
      <item>
        <label>PGI Compiler Option</label>
        <value>-ta=nvidia,cc35</value>
      </item>
    </device>
  </body>
</accelinfo>

->LINE 147

p 0
<accelperf>
<hostname>XXX.icvt.uni-stuttgart.de</hostname>
<pid>30063</pid>
<descriptors>
<desc tag="1">
<type>int</type>
<primary_metric>true</primary_metric>
<event_name>Region Time</event_name>
<units>microseconds</units>
</desc>
<desc tag="2">
<type>int</type>
<primary_metric>false</primary_metric>
<event_name>Region Elapsed Time</event_name>
<units>microseconds</units>
</desc>
<desc tag="3">
<type>int</type>
<primary_metric>true</primary_metric>
<event_name>Kernel Device Time</event_name>
<units>microseconds</units>
</desc>
<desc tag="4">
<type>int</type>
<primary_metric>false</primary_metric>
<event_name>Kernel Elapsed Time</event_name>
<units>microseconds</units>
</desc>
<desc tag="5">
<type>int</type>
<primary_metric>false</primary_metric>
<event_name>Data Transfer Time</event_name>
<units>microseconds</units>
</desc>
<desc tag="6">
<type>int</type>
<primary_metric>false</primary_metric>
<event_name>Copyin Time</event_name>
<units>microseconds</units>
</desc>
<desc tag="7">
<type>int</type>
<primary_metric>false</primary_metric>
<event_name>Copyout Time</event_name>
<units>microseconds</units>
</desc>
<desc tag="8">
<type>int</type>
<primary_metric>false</primary_metric>
<event_name>Wait Time</event_name>
<units>microseconds</units>
</desc>
<desc tag="9">
<type>string</type>
<primary_metric>false</primary_metric>
<event_name>Block Size</event_name>
</desc>
<desc tag="10">
<type>string</type>
<primary_metric>false</primary_metric>
<event_name>Grid Size</event_name>
</desc>
</descriptors>
...

Hi ManuelICVT,

You’re just a bit too early. pgcollect was just updated to generate an XML profile for use with pgprof. However, pgprof won’t officially support viewing this profile information until next month’s 13.7 release.

In the meantime, to view OpenACC performance profiling, please set the environment variable “PGI_ACC_TIME=1” instead of using pgcollect.
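
As a rough sketch of that workaround, assuming a POSIX shell and that the executable is the "program" binary from the compile line in the original post (sitting in the current directory), the run step could look something like this:

```shell
# Sketch only: use the accelerator runtime's built-in timing instead of pgcollect.
# Assumption: the binary is named "program" and was built with -acc as shown above.
export PGI_ACC_TIME=1        # ask the PGI accelerator runtime to report timing

if [ -x ./program ]; then
    ./program                # run directly, without pgcollect; timing is reported at exit
else
    echo "./program not found; build it first with the pgcc line above" >&2
fi
```

With this approach there is no pgprof.out to open in pgprof; the timing summary is simply printed by the program's own run.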

Thanks,
Mat

Thanks Mat for the information.