Nsight Compute 2023.3.1.0 can't handle G++ OpenMP offload kernel naming format

Given the following example code…

#include <cstdio>
#include <cstdint>

/** Function to do something in omp loop **/
uint32_t fn(uint32_t x)
{ 
  return 2*x; 
}

/** The main program **/
int
main(int argc, char *argv[]) 
{
  int nd = 16000;
  uint32_t umem[nd];

  for (int i=0; i<10; i++) umem[i] = i;
  for (int i=0; i<10; i++) printf("%u ", umem[i]); printf("\n");
#pragma omp target teams distribute parallel for map(tofrom:umem[0:nd])
  for (int i=0; i<nd; i++) {
    umem[i] = fn(umem[i]);
  }
  for (int i=0; i<10; i++) printf("%u ", umem[i]); printf("\n");
}

… compiled with an offload enabled version of gcc/13.2 …

g++ -O3 -fopenmp -fopt-info-optimized-omp -flto -foffload=nvptx-none="-moptimize -lm -misa=sm_80" -o bug1.gxx bug1.cxx -I/opt/NVIDIA/cuda-12.3.1/include -L/opt/NVIDIA/cuda-12.3.1/lib64

… then running in NVIDIA Nsight Compute (2023.3.1.0 build 33474944) which came as part of CUDA 12.3.1 …

The issue is that g++ generates a kernel named “main$_omp_fn$0”

If one tries to profile the kernel, the name gets broken in the command line that gets run …

/opt/NVIDIA/cuda-12.3.1/nsight-compute-2023.3.1/target/linux-desktop-glibc_2_11_3-x64/ncu --config-file off --export "/user1/Documents/NVIDIA Nsight Compute/report%i" --force-overwrite --kernel-name main/bin/sh --launch-count 1 --rule AchievedOccupancy --rule Compute --rule LaunchConfiguration --rule Memory --rule SOLBottleneck --rule SharedMemoryConflicts --rule TheoreticalOccupancy /user1/CAD/GPU/bug1.gxx

… and the kernel doesn’t get profiled correctly. My guess is that a shell is treating the “$…” sequences in the name as variable references and expanding them.

If you take the command line and change the --kernel-name argument to 'main$_omp_fn$0' (i.e., single-quote it), then running that command line by hand correctly finds and profiles the kernel.
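
As a rough illustration of why the quoting matters, the sketch below hands the kernel name to an inner /bin/sh twice, once bare and once single-quoted. That ncu-ui builds its command in a comparable way is only an assumption. The variable names (kernel, unquoted, quoted) are made up for the example, and it assumes no environment variable called _omp_fn is set.

```shell
#!/bin/sh
# Kernel name as reported above for the g++ nvptx offload build.
kernel='main$_omp_fn$0'

# Bare: the inner shell re-parses the text, so $_omp_fn (assumed unset)
# expands to nothing and $0 expands to the inner shell's own name.
unquoted=$(/bin/sh -c "echo $kernel")

# Single-quoted inside the inner command: the name passes through untouched.
quoted=$(/bin/sh -c "echo '$kernel'")

echo "unquoted: $unquoted"   # unquoted: main/bin/sh
echo "quoted:   $quoted"     # quoted:   main$_omp_fn$0
```

The bare case happens to reproduce the main/bin/sh seen in the generated command line, which is consistent with a shell expanding the name somewhere along the way, though it does not prove where that happens.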

Now if I can figure out why my real application runs 10x slower using g++ vs. nvc++. ;^\

Hi, @cfreese

Thanks for sharing this with us. We can reproduce it with your demo code. What specifically do you think we should improve?

On reflection I probably should have been more precise in that this is an ncu-ui issue. ncu does seem to work as long as one quotes properly on the command line, but ncu-ui is the part that gets confused by the “$” in the format when it synthesizes the ncu command.

I think it would be nice to have this fixed.

P.S. Someone should check LLVM OpenMP offload; I don’t have a working compiler to see if it has similar problems.

Hi, @cfreese

In your case, it seems main$_omp_fn$0 has been replaced with main/bin/sh. Maybe that’s because the value of $_ on your machine is /bin/sh.

So are you copying the command line from the NCU GUI and pasting it into your console to execute?
Or are you pressing “Launch” in the NCU GUI directly?

(When we tried profiling directly in the GUI, it actually worked.)

Correct. The command with the “/bin/sh” is from the ncu-ui window. Whatever is creating that command within ncu-ui is getting confused by the “$” signs. When I run a profile from within ncu-ui it says no kernels were profiled, and I’m betting that’s because the command line has the kernel name as “main/bin/sh” and not a valid kernel name. In my experience over the past week, I’ve never been able to profile from within ncu-ui because the g++ OpenMP kernel name always gets clobbered. As I noted in the original post, if I cut and paste the command line from ncu-ui and put in a quoted 'main$_omp_fn$0' as the kernel name, then it runs correctly as an ncu command.

Try to profile a kernel named “main$_omp_fn$0” from within ncu-ui. If it works for you, then there’s something even weirder going on, and maybe it’s in the way my shells are set up.

We tried, and it actually works if I put main$_omp_fn$0 in the filter in the NCU GUI.

Interesting. When I click on a kernel in the timeline and select “profile” I get a pop-up. On the “filter” tab the wrong name (i.e. main/bin/sh) populates the kernel name. So whatever is populating that “kernel name” field is getting confused by the “$”. I was able to type in the kernel name by hand and have things work, so there is a manual workaround.

Thanks.

You are right. If you use “profile kernel” from the timeline, the kernel name in the filter is not correct. I will file an internal issue for this.

Thanks a lot !

Hi, @cfreese

A recent Nsight Compute release includes a fix for this issue.
Could you give it a try?

I had a hard time reconstructing what I had done, but as best as I can tell using Version: 2024.2.0.0 (build 34181891) (public-release) the problem is fixed.

Just to document the issue further (and more precisely): if you’re in the timeline view and you right-click on a kernel and choose “Profile Kernel”, the name doesn’t always get put correctly into the synthesized ncu command line. This appeared when using gcc GPU offload, where the kernel name was of the form “main$_omp_fn$0”.

Thanks for the fix.

Thanks for the reply ! @cfreese