Hello,
I’m trying to profile an OpenACC-accelerated code and get the following error when opening the profile with pgprof:
>> pgprof: FileError.File 'pgprof.out', Line 147, 'p' token expected <<
If I compile and run the code without -acc, it works fine.
Here is how I compile and run:
>> pgcc -o program -acc -ta=nvidia:cc35,time -Minfo=ccff *.c -lm -L/usr/lib64/nvidia
>> pgcollect -time program
>> pgprof -exe program
I get no compiler errors, and pgcollect exits without error.
Could you please help me find the error? Thanks.
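To see what the parser finds at the reported position, the lines around line 147 of pgprof.out can be printed with a short script. This is only a sketch in Python, assuming pgprof.out is a plain text file in the current directory and that the line number in the error message counts ordinary text lines. The helper name `show_context` is hypothetical and not part of any PGI tool.

```python
from pathlib import Path


def show_context(path, lineno, radius=2):
    """Return (line_number, text) pairs around a 1-based line number.

    Hypothetical helper for illustration; not part of pgprof or pgcollect.
    """
    lines = Path(path).read_text(errors="replace").splitlines()
    start = max(lineno - radius, 1)
    end = min(lineno + radius, len(lines))
    return [(n, lines[n - 1]) for n in range(start, end + 1)]


if __name__ == "__main__":
    # 147 is the line number taken from the pgprof error message.
    if Path("pgprof.out").exists():
        for n, text in show_context("pgprof.out", 147):
            print(f"{n:5d}: {text}")
```

Comparing the printed lines with the 'p' token the error message expects may show whether the file layout differs from what pgprof assumes.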
Here is the written pgprof.out:
PROF NODALL 0 program 1370883638 1370883630
h XXX.icvt.uni-stuttgart.de 30063 0 1 0.000000
I 6 GenuineIntel nehalem-64 linux86-64 6 2 3192
t 2 479
E 1 Seconds
<accelinfo>
<head>
<item>
<label>CUDA Driver Version</label>
<value>5050</value>
</item>
<item>
<label>NVRM version</label>
<value>NVIDIA UNIX x86_64 Kernel Module 319.23 Thu May 16 19:36:02 PDT 2013</value>
</item>
</head>
<body>
<device>
<item>
<label>CUDA Device Number</label>
<value>0</value>
</item>
<item>
<label>Device Name</label>
<value>GeForce GTX TITAN</value>
</item>
<item>
<label>Device Revision Number</label>
<value>3.5</value>
</item>
<item>
<label>Global Memory Size</label>
<value>6441730048</value>
</item>
<item>
<label>Number of Multiprocessors</label>
<value>14</value>
</item>
<item>
<label>Number of SP Cores</label>
<value>2688</value>
</item>
<item>
<label>Number of DP Cores</label>
<value>896</value>
</item>
<item>
<label>Concurrent Copy and Execution</label>
<value>Yes</value>
</item>
<item>
<label>Total Constant Memory</label>
<value>65536</value>
</item>
<item>
<label>Total Shared Memory per Block</label>
<value>49152</value>
</item>
<item>
<label>Registers per Block</label>
<value>65536</value>
</item>
<item>
<label>Warp Size</label>
<value>32</value>
</item>
<item>
<label>Maximum Threads per Block</label>
<value>1024</value>
</item>
<item>
<label>Maximum Block Dimensions</label>
<value>1024, 1024, 64</value>
</item>
<item>
<label>Maximum Grid Dimensions</label>
<value>2147483647 x 65535 x 65535</value>
</item>
<item>
<label>Maximum Memory Pitch</label>
<value>2147483647B</value>
</item>
<item>
<label>Texture Alignment</label>
<value>512B</value>
</item>
<item>
<label>Clock Rate</label>
<value>875 MHz</value>
</item>
<item>
<label>Execution Timeout</label>
<value>Yes</value>
</item>
<item>
<label>Integrated Device</label>
<value>No</value>
</item>
<item>
<label>Can Map Host Memory</label>
<value>Yes</value>
</item>
<item>
<label>Compute Mode</label>
<value>default</value>
</item>
<item>
<label>Concurrent Kernels</label>
<value>Yes</value>
</item>
<item>
<label>ECC Enabled</label>
<value>No</value>
</item>
<item>
<label>Memory Clock Rate</label>
<value>3004 MHz</value>
</item>
<item>
<label>Memory Bus Width</label>
<value>384 bits</value>
</item>
<item>
<label>L2 Cache Size</label>
<value>1572864 bytes</value>
</item>
<item>
<label>Max Threads Per SMP</label>
<value>2048</value>
</item>
<item>
<label>Async Engines</label>
<value>1</value>
</item>
<item>
<label>Unified Addressing</label>
<value>Yes</value>
</item>
<item>
<label>PGI Compiler Option</label>
<value>-ta=nvidia,cc35</value>
</item>
</device>
</body>
</accelinfo>
->LINE 147
p 0
<accelperf>
<hostname>XXX.icvt.uni-stuttgart.de</hostname>
<pid>30063</pid>
<descriptors>
<desc tag="1">
<type>int</type>
<primary_metric>true</primary_metric>
<event_name>Region Time</event_name>
<units>microseconds</units>
</desc>
<desc tag="2">
<type>int</type>
<primary_metric>false</primary_metric>
<event_name>Region Elapsed Time</event_name>
<units>microseconds</units>
</desc>
<desc tag="3">
<type>int</type>
<primary_metric>true</primary_metric>
<event_name>Kernel Device Time</event_name>
<units>microseconds</units>
</desc>
<desc tag="4">
<type>int</type>
<primary_metric>false</primary_metric>
<event_name>Kernel Elapsed Time</event_name>
<units>microseconds</units>
</desc>
<desc tag="5">
<type>int</type>
<primary_metric>false</primary_metric>
<event_name>Data Transfer Time</event_name>
<units>microseconds</units>
</desc>
<desc tag="6">
<type>int</type>
<primary_metric>false</primary_metric>
<event_name>Copyin Time</event_name>
<units>microseconds</units>
</desc>
<desc tag="7">
<type>int</type>
<primary_metric>false</primary_metric>
<event_name>Copyout Time</event_name>
<units>microseconds</units>
</desc>
<desc tag="8">
<type>int</type>
<primary_metric>false</primary_metric>
<event_name>Wait Time</event_name>
<units>microseconds</units>
</desc>
<desc tag="9">
<type>string</type>
<primary_metric>false</primary_metric>
<event_name>Block Size</event_name>
</desc>
<desc tag="10">
<type>string</type>
<primary_metric>false</primary_metric>
<event_name>Grid Size</event_name>
</desc>
</descriptors>
...