Nsight Compute 2023.3.1.0 can't handle G++ OpenMP offload kernel naming format

Given the following example code…

#include <cstdio>
#include <cstdint>

/** Function to do something in omp loop **/
uint32_t fn(uint32_t x)
{ 
  return 2*x; 
}

/** The main program **/
int
main(int argc, char *argv[]) 
{
  int nd = 16000;
  uint32_t umem[nd];

  for (int i=0; i<10; i++) umem[i] = i;
  for (int i=0; i<10; i++) printf("%d ", umem[i]); printf("\n");
#pragma omp target teams distribute parallel for map(tofrom:umem[0:nd])
  for (int i=0; i<nd; i++) {
    umem[i] = fn(umem[i]);
  }
  for (int i=0; i<10; i++) printf("%d ", umem[i]); printf("\n");
}

… compiled with an offload enabled version of gcc/13.2 …

g++ -O3 -fopenmp -fopt-info-optimized-omp -flto -foffload=nvptx-none="-moptimize -lm -misa=sm_80" -o bug1.gxx bug1.cxx -I/opt/NVIDIA/cuda-12.3.1/include -L/opt/NVIDIA/cuda-12.3.1/lib64

… then running in NVIDIA Nsight Compute (2023.3.1.0 build 33474944) which came as part of CUDA 12.3.1 …

The issue is that g++ generates a kernel named “main$_omp_fn$0”

If one tries to profile the kernel, the name gets broken in the command line that gets run …

/opt/NVIDIA/cuda-12.3.1/nsight-compute-2023.3.1/target/linux-desktop-glibc_2_11_3-x64/ncu --config-file off --export "/user1/Documents/NVIDIA Nsight Compute/report%i" --force-overwrite --kernel-name main/bin/sh --launch-count 1 --rule AchievedOccupancy --rule Compute --rule LaunchConfiguration --rule Memory --rule SOLBottleneck --rule SharedMemoryConflicts --rule TheoreticalOccupancy /user1/CAD/GPU/bug1.gxx

… and the kernel doesn’t get profiled correctly. Basically I’m guessing the “$*” are taken as a glob by a shell.

If you take the command line and change the --kernel-name argument to 'main$_omp_fn$0' (i.e. quote it), then running that command line by hand will correctly find and profile the kernel.
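To illustrate the quoting difference, here is a minimal sketch. It assumes the synthesized command passes through a POSIX shell such as /bin/sh. The variable names `expanded` and `literal` are made up for the demo. In a POSIX shell, `$_omp_fn` outside single quotes is read as a variable named `_omp_fn` (normally unset, so empty) and `$0` as the name of the running shell or script, which would match the `main/bin/sh` seen in the generated command.

```shell
#!/bin/sh
# Demo only: 'expanded' and 'literal' are hypothetical names.

# Double quotes (or no quotes) let the shell expand $_omp_fn and $0.
# $_omp_fn is normally unset, so it expands to nothing; $0 expands to
# the name of the running shell or script (e.g. /bin/sh under "sh -c").
expanded="main$_omp_fn$0"

# Single quotes suppress expansion, so the kernel name survives intact.
literal='main$_omp_fn$0'

printf 'expanded: %s\n' "$expanded"
printf 'literal:  %s\n' "$literal"

# The single-quoted form is what has to reach ncu, for example:
#   ncu --kernel-name 'main$_omp_fn$0' --launch-count 1 /user1/CAD/GPU/bug1.gxx
```

The exact text printed for `expanded` depends on how the script is invoked, since `$0` varies; only the single-quoted form is stable.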

Now if I can figure out why my real application runs 10x slower using g++ vs. nvc++. ;^\

Hi, @cfreese

Thanks for sharing this with us. We can reproduce this with your demo code. In what respect do you think we should improve?

On reflection I probably should have been more precise in that this is an ncu-ui issue. ncu does seem to work as long as one quotes properly on the command line, but ncu-ui is the part that gets confused by the “$” in the format when it synthesizes the ncu command.

I think it would be nice to have this fixed.

P.S. Someone should check LLVM OpenMP offload; I don’t have a working compiler to see if it has similar problems.

Hi, @cfreese

In your case, it seems main$_omp_fn$0 has been replaced with main/bin/sh; maybe that’s because the value of $_ on your machine is /bin/sh.

So are you copying the command line from the NCU GUI and pasting it into your console to execute?
Or are you pressing “Launch” in the NCU GUI directly?

(When we tried profiling in the GUI directly, it actually worked.)

Correct. The command with the “/bin/sh” is from the ncu-ui window. Whatever is creating that command within ncu-ui is getting confused by the “$” signs. When I run a profile from within ncu-ui it says no kernels were profiled and I’m betting that’s because the command line has the kernel name as “/bin/sh…” and not a valid kernel name. In my experience over the past week, I’ve never been able to profile from within ncu-ui because the g++ OpenMP kernel name always gets clobbered. As I noted in the original, if I cut and paste the command line from the ncu-ui and put in a quoted 'main$_omp_fn$0' as the kernel-name, then it runs correctly as an ncu command.
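For what it’s worth, one way a tool could avoid this when it builds a shell command is to single-quote every argument before handing the line to the shell. A minimal sketch, assuming a POSIX shell with sed available; `shquote` is a hypothetical helper and says nothing about how ncu-ui is actually implemented:

```shell
#!/bin/sh
# shquote is a hypothetical helper, not part of ncu or ncu-ui.
# It wraps its argument in single quotes and rewrites each embedded
# single quote as '\'' so the shell reads the result as one literal word.
shquote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

kernel='main$_omp_fn$0'
quoted=$(shquote "$kernel")

# Show the argument as it would appear in a synthesized command line.
printf -- '--kernel-name %s\n' "$quoted"

# Round trip: letting the shell re-parse the quoted word gives back the
# original kernel name, with no expansion of $_omp_fn or $0.
eval "roundtrip=$quoted"
printf 'roundtrip: %s\n' "$roundtrip"
```

Single quotes are used rather than double quotes because `$` is still expanded inside double quotes.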

Try to profile a kernel named “main$_omp_fn$0” from within ncu-ui. If it works for you then there’s something even weirder going on and maybe it’s in the way my shells are set up.

We tried it, and it actually works if I put main$_omp_fn$0 in the filter in the NCU GUI.

Interesting. When I click on a kernel in the timeline and select “profile” I get a pop-up. On the “filter” tab the wrong name (i.e. main/bin/sh) populates the kernel name. So whatever is populating that “kernel name” field is getting confused by the “$”. I was able to type in the kernel name by hand and have things work, so there is a manual workaround.

Thanks.

You are right. When using “profile kernel” from the timeline, the filter kernel name is not correct. I will file an internal issue for this.

Thanks a lot !