ACC_NOTIFY?

I am having trouble using the ACC_NOTIFY environment variable to see how many times the program enters an acc region.

The following output shows my current line of reasoning and input:

[james@james Desktop]$ echo $ACC_NOTIFY
1
[james@james Desktop]$ ./first
launch kernel file=/home/james/Desktop/c1.c function=main line=43 device=0 grid=782 block=128 queue=0
100000 iterations completed
[james@james Desktop]$ pgaccelinfo
CUDA Driver Version: 5000
NVRM version: NVIDIA UNIX x86_64 Kernel Module 304.60 Sun Oct 14 20:23:00 PDT 2012

Device Number: 0
Device Name: GeForce GTX 275
Device Revision Number: 1.3
Global Memory Size: 938803200
Number of Multiprocessors: 30
Number of Cores: 240
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 16384
Registers per Block: 16384
Warp Size: 32
Maximum Threads per Block: 512
Maximum Block Dimensions: 512, 512, 64
Maximum Grid Dimensions: 65535 x 65535 x 1
Maximum Memory Pitch: 2147483647B
Texture Alignment: 256B
Clock Rate: 1404 MHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: No
ECC Enabled: No
Memory Clock Rate: 1134 MHz
Memory Bus Width: 448 bits
Max Threads Per SMP: 1024
Async Engines: 1
Unified Addressing: No
Initialization time: 1793 microseconds
Current free memory: 840809728
Upload time (4MB): 1610 microseconds ( 893 ms pinned)
Download time: 2157 microseconds (1246 ms pinned)
Upload bandwidth: 2605 MB/sec (4696 MB/sec pinned)
Download bandwidth: 1944 MB/sec (3366 MB/sec pinned)

I have ACC_NOTIFY set to 1, as directed in the subsection First program of the article PGI Accelerator Programming Model for NVIDIA GPUs Part 1. The executable first is simply c1.c compiled to an executable.

When I ran ./first I got the output but no indication of the device’s action. I also used

export ACC_NOTIFY=0

and got the same results. What is going on?

Any help appreciated. Thanks in advance.

Newport_j

Hi James,

The line “launch kernel file=” is the output from ACC_NOTIFY so it seems to be working as expected. Note that there is only one kernel launch in the c1 example. What output were you looking for? The profiling info?

$ pgcc -ta=nvidia c1.c  -o first
$ export ACC_NOTIFY=1
$ ./first
launch kernel  file=/..cut../c1.c function=main line=25 device=0 grid=782 block=128 queue=0
100000 iterations completed
$ export ACC_NOTIFY=0
$ ./first
100000 iterations completed

$ export PGI_ACC_TIME=1
$ ./first
100000 iterations completed

Accelerator Kernel Timing data
/..cut../c1.c
  main
    23: region entered 1 time
        time(us): total=53,604 init=50,184 region=3,420
                  kernels=77 data=2,993
        w/o init: total=3,420 max=3,420 min=3,420 avg=3,420
        25: kernel launched 1 times
            grid: [782]  block: [128]
            time(us): total=77 max=77 min=77 avg=77
- Mat

In the document PGI Accelerator Programming Model for NVIDIA GPUs Part 1, under the heading First program, it talks about exporting ACC_NOTIFY=1.

It mentions no other environment variable. My program enters and leaves the GPU accelerator many times. I want to know how many times each accelerator region is entered, and what the data transfer times and computation times are for each trip. I want to know this for each and every parallelized loop.

I thought that ACC_NOTIFY could do this. If it only gives the one-line output on c1.c, what command gives the output that I previously described?

Thanks in advance.

THX 1138

Okay, this is what I am looking for. I am copying directly from the document mentioned in the previously posted message.

It is:

Accelerator Kernel Timing data
c2.c
  main
    32: region entered 1 times
        time(us): total=1182682 init=1180869 region=1813
                  kernels=170 data=1643
        w/o init: total=1813 max=1813 min=1813 avg=1813
        34: kernel launched 1 times
            time(us): total=170 max=170 min=170 avg=170


Now what can I do to get that kind of output on each of my accelerator regions?

Your command

PGI_ACC_TIME=1

is only recent and, as the name implies, comes with the advent of OpenACC. However, the output shown above dates back much further than that.

Thanks in advance.

THX 1138

Now what can I do to get that kind of output on each of my accelerator regions?

The basic profiling information from setting PGI_ACC_TIME=1 (or compiling with -ta=nvidia,time) will show the time spent in each accelerator region. If you are not seeing any output, then you don’t have any accelerator regions in your code.

is only recent and as the name implies comes with the advent of OpenACC.

No, it’s a PGI environment variable, and it works with both the PGI Accelerator Model and PGI’s implementation of OpenACC.

- Mat