nvtxRangePushA() and nvtxRangePop() return -2

I just finished writing a CUDA dot-product test code to support my ongoing development work to add NVTX support to HPCToolkit.

For some reason I’m getting error code -2 on all my NVTX calls. In the code, I only use nvtxRangePushA() and nvtxRangePop() to create labeled regions with the following hierarchy (a simplified sketch of the corresponding calls follows the diagram):

main()
|_ init
|_ h2d
|_ calc
|_ d2h
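
Roughly, the calls look like this (a simplified sketch of the structure only; the real work is elided, and the exact code is in the repo linked further down):

  #include <nvToolsExt.h>
  #include <iostream>

  int main() {
    int ret = nvtxRangePushA("init");   // allocate and fill host buffers, allocate device buffers
    if (ret < 0) std::cout << "NVTX Error: " << ret << std::endl;
    nvtxRangePop();

    nvtxRangePushA("h2d");              // cudaMemcpy host -> device
    nvtxRangePop();

    nvtxRangePushA("calc");             // launch the dot-product kernels
    nvtxRangePop();

    nvtxRangePushA("d2h");              // cudaMemcpy device -> host
    nvtxRangePop();
    return 0;
  }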

What does the error code -2 mean? The NVTX docs only mention this…

If an error occurs a negative value is returned.

Any pointers in the right direction would be appreciated.

One source of negative errors is a pop that doesn’t have a corresponding push. Does the very first invocation of a push in your code return an error? If not (or if you have a pop before a push to start things off), then I would carefully study your hierarchy to see if things are happening out of order.
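
A quick way to check both things (a sketch, assuming the standard nvToolsExt.h header) is to look at the return value of a deliberately unmatched pop and of the very first push:

  #include <nvToolsExt.h>
  #include <iostream>

  int main() {
    // A pop with nothing on the stack: per the note above, a pop without a
    // matching push is one source of negative return values.
    std::cout << "pop before any push: " << nvtxRangePop() << std::endl;

    // What does the very first push return?
    std::cout << "first push: " << nvtxRangePushA("first") << std::endl;
    nvtxRangePop();
    return 0;
  }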

Hi Bob, thanks for the quick response! Yes, the very first push already returned -2.

Make sure you don’t have a catastrophic CUDA error occurring prior to that. What happens if you call cudaGetLastError() before that first push? Are any errors reported? Alternatively, if you run your code with compute-sanitizer, are any errors reported?
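
The latter check is just a matter of prefixing your normal run with the tool (your_app below is a stand-in for your executable and its arguments):

compute-sanitizer ./your_app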

I temporarily added this block right before the first push:

  {
    cudaError_t status = cudaGetLastError();
    cout << "Before first push: cudaGetLastError = " << status << endl;
  }

and the output is:

Before first push: cudaGetLastError = 0

Compute Sanitizer reports:

========= ERROR SUMMARY: 0 errors

Oh, I forgot: if you want to try to reproduce the issue, please use the following:

git clone https://github.com/wyphan/testcodes.git
cd testcodes/cpp-cu-ddot
make USE_NVTX=1
./ddot.x 1000

Technically you can use any positive integer, not just 1000. I already put a guard in the code so it won’t proceed if your GPU doesn’t have enough global memory.
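
The guard is roughly along these lines (a simplified sketch; see the repo for the exact code):

  #include <cuda_runtime.h>
  #include <cstddef>

  // Rough sketch of the "does it fit in global memory" check using cudaMemGetInfo.
  bool fitsOnDevice(size_t n) {
    size_t freeBytes = 0, totalBytes = 0;
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) return false;
    // The two input vectors of doubles have to fit in the free global memory.
    return 2 * n * sizeof(double) <= freeBytes;
  }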

I’ve filed an internal bug (4153204) to have it looked at. When I have more info, I will report here. If I have not reported here, it means I don’t have more info yet.

As a temporary workaround, I suggest ignoring the return value.
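
If it helps, that can be as small as a pair of thin wrappers that discard the return codes (a sketch; the wrapper names are mine, not from any API):

  #include <nvToolsExt.h>

  // Thin wrappers that drop the NVTX return values for now.
  static inline void rangePush(const char* name) { (void)nvtxRangePushA(name); }
  static inline void rangePop()                  { (void)nvtxRangePop(); }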

I wish I could ignore the return value, but here’s an interesting bit: when I run the code under Nsight Systems, the error code changes to -1, yet Nsight Systems seems to capture the regions correctly!

wp11@ufront:~/work/testcodes/cpp-cu-ddot$ nsys profile -o cuda-ddot --stats=true --trace=nvtx ./ddot.x 1000000
Using device NVIDIA A100-PCIE-40GB
NVTX Error: -1
h_A[999999] = 999999
h_B[999999] = 2e+06
NVTX Error: -1
NVTX Error: -1
NVTX Error: -1
NVTX Error: -1
Kernel 1 (ddot), workspace size = 4096
Using grid: 3907, 1, 1
Using block: 256, 1, 1
Kernel 2 (reduceblocks), workspace size = 4096, filled = 3907
Using grid: 16, 1, 1
Using block: 256, 1, 1
Kernel 2 (reduceblocks), workspace size = 16, filled = 16
Using grid: 1, 1, 1
Using block: 256, 1, 1
NVTX Error: -1
NVTX Error: -1
NVTX Error: -1
Success! Result = 6.66666e+17
Generating '/tmp/nsys-report-2ed1.qdstrm'
[1/3] [========================100%] cuda-ddot.nsys-rep
[2/3] [========================100%] cuda-ddot.sqlite
[3/3] Executing 'nvtx_sum' stats report

 Time (%)  Total Time (ns)  Instances   Avg (ns)     Med (ns)    Min (ns)   Max (ns)   StdDev (ns)   Style   Range
 --------  ---------------  ---------  -----------  -----------  ---------  ---------  -----------  -------  -----
     76.0        4,806,275          1  4,806,275.0  4,806,275.0  4,806,275  4,806,275          0.0  PushPop  init 
     22.1        1,396,473          1  1,396,473.0  1,396,473.0  1,396,473  1,396,473          0.0  PushPop  h2d  
      1.6          100,161          1    100,161.0    100,161.0    100,161    100,161          0.0  PushPop  calc 
      0.4           24,500          1     24,500.0     24,500.0     24,500     24,500          0.0  PushPop  d2h  

Generated:
    /home/wp11/work/testcodes/cpp-cu-ddot/cuda-ddot.nsys-rep
    /home/wp11/work/testcodes/cpp-cu-ddot/cuda-ddot.sqlite

That is exactly my point (in terms of the workaround suggestion). You get the desired behavior, apart from the return code.
