Although ompt_callback_work is registered and reported as ompt_set_always, the following code doesn't dispatch ompt_callback_work at the end of the worksharing loop (ws-loop-end):
#pragma omp parallel
{
    #pragma omp for
    for (int i = 1; i < 5; i++)
        foo(i);
}
Expected events are:
encountering: parallel_begin
worker: implicit_task( begin )
worker: work( begin )
worker: work( end ) <- expected but missing
# sync for
worker: sync_region( begin )
worker: sync_region_wait( begin )
worker: sync_region_wait( end )
worker: sync_region( end )
# sync parallel
worker: sync_region( begin )
worker: sync_region_wait( begin )
worker: sync_region_wait( end )
worker: sync_region( end )
worker: implicit_task( end )
encountering: parallel_end
Note that the combined construct
#pragma omp parallel for
for (int i = 1; i < 5; i++)
    foo(i);
dispatches all events as expected, i.e.
encountering: parallel_begin
worker: implicit_task( begin )
worker: work( begin )
worker: work( end )
worker: sync_region( begin )
worker: sync_region_wait( begin )
worker: sync_region_wait( end )
worker: sync_region( end )
worker: implicit_task( end )
encountering: parallel_end
It would be nice if this could be fixed in the next release.
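For anyone who wants to detect the missing work( end ) automatically rather than by reading the trace, here is a minimal sketch of the per-thread bookkeeping a tool could do. It is independent of any OMPT runtime: endpoint_t, work_tracker_t and the tracker_* functions are hypothetical names made up for illustration, not part of omp-tools.h. The idea is to call tracker_on_work from the work callback and tracker_on_implicit_task_end from the implicit_task( end ) callback of the same thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for the OMPT scope endpoints (hypothetical, not the omp-tools.h types). */
typedef enum { EV_BEGIN, EV_END } endpoint_t;

/* Per-thread state: one instance per worker, so no locking is needed. */
typedef struct {
    int open;      /* work(begin) events that have not seen a work(end) yet */
    int unmatched; /* work(end) events that arrived without a begin */
} work_tracker_t;

static void tracker_init(work_tracker_t *t)
{
    assert(t != NULL);
    t->open = 0;
    t->unmatched = 0;
}

/* Intended to be called from the work callback of the owning thread. */
static void tracker_on_work(work_tracker_t *t, endpoint_t ep)
{
    assert(t != NULL);
    if (ep == EV_BEGIN) {
        t->open++;
    } else if (t->open > 0) {
        t->open--;
    } else {
        t->unmatched++;
    }
}

/* Intended to be called at implicit_task(end).
 * Returns the number of work regions that never received their end event
 * and resets the open counter for the next parallel region. */
static int tracker_on_implicit_task_end(work_tracker_t *t)
{
    assert(t != NULL);
    int missing = t->open;
    if (missing > 0)
        fprintf(stderr, "missing %d work(end) event(s)\n", missing);
    t->open = 0;
    return missing;
}
```

Fed with the first event sequence above (work( begin ) with no work( end )), tracker_on_implicit_task_end returns 1. Fed with the sequence from the combined construct, it returns 0.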
Thanks Christian. Do you have the complete example with the callbacks included? We can try to pull something together, but if you have it already, that would be great.
Please use reproducer1.c and reproducer2.c with the attached ompt-printf-0.tar.gz like this:
tar xf ompt-printf-0.tar.gz
cd ompt-printf-0
./configure CC=nvc --prefix=`pwd`/_install
make install
# Build reproducer1 and reproducer2 using the compile and link instructions shown by make install. Don't forget to add -mp=ompt to the link line.
The output is more verbose than what I posted originally (use a low thread count), but it should be straightforward to decode.
@MatColgrove Any updates regarding this issue? It would be great if it would be fixed in the next release. Hopefully this is the last one preventing us from using the host side of your OMPT runtime in Score-P.
I’ve had a look at the newly released NVHPC 23.5 today and unfortunately it seems like the bug is still present.
Is there any progress towards fixing the issue?
Looking at the bug report, it seems that it's been triaged and the problem has been identified: a missing “__kmpc_for_static_fini” call.
It’s been assigned to an engineer but I don’t know when it will be addressed. It will depend on her workload and what other items she has in her queue.