ompt_callback_work with ompt_scope_end not always dispatched

Using nvc 23.1, host-only:

Although ompt_callback_work is registered with result ompt_set_always, the following code doesn’t dispatch it with ompt_scope_end at the end of the worksharing loop:

#pragma omp parallel
{
    #pragma omp for
    for (int i=1; i<5; i++)
        foo(i);
}

Expected events are:

encountering: parallel_begin
worker: implicit_task( begin )
worker: work( begin )
worker: work( end ) <- expected but missing
# sync for
worker: sync_region( begin )
worker: sync_region_wait( begin )
worker: sync_region_wait( end )
worker: sync_region( end )
# sync parallel
worker: sync_region( begin )
worker: sync_region_wait( begin )
worker: sync_region_wait( end )
worker: sync_region( end )
worker: implicit_task( end )
encountering: parallel_end

Note that the combined construct

#pragma omp parallel for
for (int i=1; i<5; i++)
    foo(i);

dispatches all events as expected, i.e.

encountering: parallel_begin
worker: implicit_task( begin )
worker: work( begin )
worker: work( end )
worker: sync_region( begin )
worker: sync_region_wait( begin )
worker: sync_region_wait( end )
worker: sync_region( end )
worker: implicit_task( end )
encountering: parallel_end
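The difference between the two traces can also be checked mechanically. The following is a minimal sketch, assuming one event per line in the format used in this post; the function unmatched_work_begins and the two arrays are hypothetical names, and the first trace is reconstructed from the expected list by dropping the line marked as missing.

```c
#include <stddef.h>
#include <string.h>

/* Returns how many work( begin ) events in the trace have no matching
 * work( end ). Lines for other events are ignored. */
static int unmatched_work_begins(const char *const lines[], size_t n)
{
    int open = 0;
    for (size_t i = 0; i < n; i++) {
        if (strstr(lines[i], "work( begin )") != NULL)
            open++;
        else if (strstr(lines[i], "work( end )") != NULL && open > 0)
            open--;
    }
    return open;
}

/* Trace for "parallel" followed by "for", without the missing event. */
static const char *const separate_for_trace[] = {
    "encountering: parallel_begin",
    "worker: implicit_task( begin )",
    "worker: work( begin )",
    "worker: sync_region( begin )",
    "worker: sync_region_wait( begin )",
    "worker: sync_region_wait( end )",
    "worker: sync_region( end )",
    "worker: sync_region( begin )",
    "worker: sync_region_wait( begin )",
    "worker: sync_region_wait( end )",
    "worker: sync_region( end )",
    "worker: implicit_task( end )",
    "encountering: parallel_end",
};

/* Trace for the combined "parallel for" construct. */
static const char *const combined_trace[] = {
    "encountering: parallel_begin",
    "worker: implicit_task( begin )",
    "worker: work( begin )",
    "worker: work( end )",
    "worker: sync_region( begin )",
    "worker: sync_region_wait( begin )",
    "worker: sync_region_wait( end )",
    "worker: sync_region( end )",
    "worker: implicit_task( end )",
    "encountering: parallel_end",
};
```

Calling unmatched_work_begins on separate_for_trace yields 1, and on combined_trace it yields 0.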

It would be nice if this could be fixed in the next release.

Issue also mentioned here: NVHPC 22.11/23.1 -- OMPT methods can cause SegFault when offloading - #6 by jan.andre.reuter

Thanks,
Christian

Thanks Christian. Do you have the complete example with the callbacks included? We can try to pull something together, but if you have it already, that would be great.

-Mat

ompt-printf-0.tar.gz (122.0 KB)
reproducer2.c (245 Bytes)
reproducer1.c (289 Bytes)

Hi Mat,

please use reproducer1.c and reproducer2.c with the attached ompt-printf-0.tar.gz like this:

tar xf ompt-printf-0.tar.gz
cd ompt-printf-0
./configure CC=nvc --prefix=`pwd`/_install
make install
# See make install's compile and link instructions and build reproducer1 and reproducer2. Don't forget to add -mp=ompt to the link line.

The output is more verbose than what I posted originally (use a low thread count), but it should be straightforward to decode.

Thanks
Christian

@MatColgrove Any updates regarding this issue? It would be great if it were fixed in the next release. Hopefully this is the last one preventing us from using the host side of your OMPT runtime in Score-P.

As of NVHPC 23.3, this issue is still present.

Hi Jan,

Looks like I missed Christian’s second post so didn’t report the issue. Sincere apologies!

I just submitted a new issue report, TPR #33556, and will have engineering investigate.

Thanks!
Mat

I’ve had a look at the newly released NVHPC 23.5 today and unfortunately it seems like the bug is still present.
Is there any progress towards fixing the issue?

Looking at the bug report, it seems that it’s been triaged and the problem has been identified: a missing “__kmpc_for_static_fini” call.

It’s been assigned to an engineer but I don’t know when it will be addressed. It will depend on her workload and what other items she has in her queue.

Hi Christian, Jan,

TPR #33556 should be fixed in our 23.7 release which Jan confirmed in the following post:

Thanks again for the report and please let us know if there are any additional issues you find.

-Mat