I am trying out the OpenACC profiling interface, see e.g. https://www.openacc.org/sites/default/files/inline-images/Specification/OpenACC.3.0.pdf. I am using pgcc from the NVIDIA HPC SDK, version 22.9. Consider this example code:
#include <stdio.h>
#include "foo.h"
int main (int argc, char *argv[]) {
  int N = 10000;
  int a[N], b[N], c[N];
  #pragma acc parallel loop async(1)
  for (int i = 0; i < N; i++) {
    a[i] = i;
  }
  #pragma acc parallel loop async(2)
  for (int i = 0; i < N; i++) {
    b[i] = 2 * i;
  }
  #pragma acc wait(1) async(2)
  #pragma acc parallel loop async(2)
  for (int i = 0; i < N; i++) {
    c[i] = a[i] + b[i];
  }
  #pragma acc update self(c[0:N]) async(2)
  #pragma acc wait (1,2)
  printf ("c[0]: %d\n", c[0]);
  printf ("c[N-1]: %d\n", c[N-1]);
}
with foo.h
void foo (int *x, int N) {
  #pragma acc parallel loop
  for (int i = 0; i < N; i++) {
    x[i]++;
  }
}
I want to trace the wait pragma using
#include <stdio.h>
#include "acc_prof.h"
void wait_start (acc_prof_info *prof_info, acc_event_info *event_info, acc_api_info *api_info) {
  printf ("Wait start: %s %d %d\n", prof_info->src_file, prof_info->line_no, prof_info->end_line_no);
}
void acc_register_library (acc_prof_reg register_ev, acc_prof_reg unregister_ev, acc_prof_lookup_func lookup) {
  register_ev (acc_ev_wait_start, wait_start, 0);
}
which is compiled into a shared library and preloaded via LD_PRELOAD. I get the following output:
Wait start: /home/cweiss/tmp/test_openacc_async_wait/foo.h 28 28
Wait start: /home/cweiss/tmp/test_openacc_async_wait/foo.h 28 28
The source file attributed to the wait region is not the main file, but the included foo.h. Note that the function implemented in foo.h is not used at all. Moreover, if I remove the pragma from the function foo, the output shows the main source file.
I guess this is a bug, or am I misunderstanding something? The documentation is not very clear about what exactly the prof_info.src_file field indicates.
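To narrow this down, it might help to print more of the location fields that acc_prof_info carries, namely func_name, func_line_no and func_end_line_no in addition to src_file, line_no and end_line_no. Below is a minimal sketch of such a callback. The names format_location and wait_start_verbose are hypothetical helpers introduced only for illustration, and the __has_include guard assumes a compiler that supports it; the guard only serves to let the formatting helper compile where acc_prof.h is not installed. I have not verified what pgcc reports in these extra fields.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of OpenACC): writes a source location
   into buf and returns whatever snprintf returns. NULL strings are
   printed as "(null)" so that the callback never dereferences them. */
int format_location (char *buf, size_t size,
                     const char *src_file, const char *func_name,
                     int line_no, int end_line_no,
                     int func_line_no, int func_end_line_no) {
  return snprintf (buf, size, "%s:%d-%d in %s (function lines %d-%d)",
                   src_file ? src_file : "(null)", line_no, end_line_no,
                   func_name ? func_name : "(null)",
                   func_line_no, func_end_line_no);
}

/* Assumption: the compiler supports __has_include. The callback part is
   only compiled where the OpenACC profiling header can be found. */
#if defined(__has_include)
#if __has_include("acc_prof.h")
#include "acc_prof.h"

/* Same event as before, but with the function-level fields as well. */
static void wait_start_verbose (acc_prof_info *prof_info,
                                acc_event_info *event_info,
                                acc_api_info *api_info) {
  char buf[1024];
  (void) event_info;
  (void) api_info;
  format_location (buf, sizeof buf,
                   prof_info->src_file, prof_info->func_name,
                   prof_info->line_no, prof_info->end_line_no,
                   prof_info->func_line_no, prof_info->func_end_line_no);
  printf ("Wait start: %s\n", buf);
}

void acc_register_library (acc_prof_reg register_ev,
                           acc_prof_reg unregister_ev,
                           acc_prof_lookup_func lookup) {
  (void) unregister_ev;
  (void) lookup;
  register_ev (acc_ev_wait_start, wait_start_verbose, acc_reg);
}
#endif
#endif
```

If func_name and the function line range point at main while src_file still names foo.h, that would suggest only the file name is attributed wrongly; if they point at foo, the whole location record would seem to be taken from the wrong place.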