cuda-gdb doesn't print the correct value a pointer points to inside a kernel

Here is the kernel declaration:

template<typename T, int thread_group_width = kWarpSize>
__inline__ __global__ void WelfordWarpAllReduce(T thread_mean, T thread_m2, T thread_count, T* mean,
                                                T* m2, T* count)

I compile test.cu with nvcc -g -G test.cu.
I set a breakpoint inside the kernel and run print *mean; cuda-gdb shows:

(cuda-gdb) print *mean
$2 = 0

However, *mean is supposed to be 1. When I add a printf inside the kernel to print *mean, it does output 1.
How can I fix this so that cuda-gdb prints the correct value of *mean?
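For anyone trying to reproduce this without test.cu, here is a minimal sketch. It is hypothetical: ReadThroughPointer and RunReproducer are illustrative names, not code from this thread, and the kernel only mimics the relevant part of WelfordWarpAllReduce, namely reading a value through a plain T* parameter that points to global memory and printing it with printf.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel in test.cu: it reads *mean through a
// plain (generic) T* parameter, the situation in which `print *mean` showed 0
// in cuda-gdb while printf showed 1.
template <typename T>
__global__ void ReadThroughPointer(const T* mean, T* out) {
  T value = *mean;  // set a breakpoint here and try `print *mean`
  if (threadIdx.x == 0) {
    printf("device sees *mean = %f\n", static_cast<double>(value));
    *out = value;  // copy back so the host can check what the kernel read
  }
}

// Copies host_value into device global memory, launches the kernel, and
// stores the value the kernel read in *result. Returns the first CUDA error.
cudaError_t RunReproducer(float host_value, float* result) {
  float* d_mean = nullptr;
  float* d_out = nullptr;
  cudaError_t err = cudaMalloc(&d_mean, sizeof(float));
  if (err == cudaSuccess) err = cudaMalloc(&d_out, sizeof(float));
  if (err == cudaSuccess)
    err = cudaMemcpy(d_mean, &host_value, sizeof(float), cudaMemcpyHostToDevice);
  if (err == cudaSuccess) {
    ReadThroughPointer<float><<<1, 32>>>(d_mean, d_out);
    err = cudaGetLastError();  // catches launch-configuration errors
  }
  if (err == cudaSuccess) err = cudaDeviceSynchronize();
  if (err == cudaSuccess)
    err = cudaMemcpy(result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_out);  // cudaFree(nullptr) is a no-op
  cudaFree(d_mean);
  return err;
}
```

Call RunReproducer(1.0f, &v) from main, build with nvcc -g -G as above, set a breakpoint on the line T value = *mean; and compare print *mean with print *((@global float *)mean). Whether the unqualified form also shows 0 may depend on the toolkit and driver versions, so treat this as a test case rather than a guaranteed reproduction.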

My environment:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0

NVIDIA (R) CUDA Debugger
11.3 release
Portions Copyright (C) 2007-2021 NVIDIA Corporation
GNU gdb (GDB) 8.3.1
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Hi @umiswing,
Could you share a full source file that we can use to reproduce the issue on our side?

Also, can you share the output of the nvidia-smi command?

test.cu (2.9 KB)
Hi @AKravets. I have uploaded test.cu, which is the full source file. I compile it with nvcc -g -G test.cu and debug it with cuda-gdb a.out.
The problem occurs when I debug inside the kernel WelfordWarpAllReduce. I added two printf calls inside the kernel, at lines 47 and 51, to check the values the pointers point to.
The output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:1A:00.0 Off |                  N/A |
| 30%   26C    P8    24W / 350W |      3MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:3D:00.0 Off |                  N/A |
| 30%   43C    P2   199W / 350W |  20608MiB / 24268MiB |     75%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:89:00.0 Off |                  N/A |
| 30%   22C    P8    26W / 350W |      3MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:B2:00.0 Off |                  N/A |
| 30%   24C    P8    24W / 350W |      3MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Hi @umiswing,
Could you try using the @global pointer qualifier when reading the memory? The following worked for me (using your example):

(cuda-gdb) p *((float *)mean)
$5 = 0
(cuda-gdb) p *((@global float *)mean)
$6 = 1
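The @global qualifier tells cuda-gdb to read the address from the GPU's global memory space; a plain T* kernel parameter is a generic pointer, which is presumably why the unqualified print showed 0 here. For a front end where typing the qualified expression is inconvenient, such as an IDE watch window, one possible source-level workaround (not suggested in this thread and not verified against Nsight Visual Studio Code Edition) is to copy the pointed-to values into ordinary local variables and inspect those instead. WelfordDebugCopy and RunDebugCopy are hypothetical names used only for this sketch.

```cuda
#include <cuda_runtime.h>

// Hypothetical debugging aid: copy the values behind the pointers into plain
// local variables. With nvcc -G these locals should remain inspectable, so a
// watch window that only accepts plain expressions can show debug_mean,
// debug_m2 and debug_count without the @global qualifier.
template <typename T>
__global__ void WelfordDebugCopy(const T* mean, const T* m2, const T* count,
                                 T* out) {
  T debug_mean = *mean;  // inspect these three locals at a later breakpoint
  T debug_m2 = *m2;
  T debug_count = *count;
  if (threadIdx.x == 0 && blockIdx.x == 0) {
    // Writing the copies out lets the host verify what the kernel read.
    out[0] = debug_mean;
    out[1] = debug_m2;
    out[2] = debug_count;
  }
}

// Uploads {mean, m2, count} from in3, runs the kernel, and reads the copies
// back into out3. Returns the first CUDA error encountered.
cudaError_t RunDebugCopy(const float in3[3], float out3[3]) {
  float* d_in = nullptr;
  float* d_out = nullptr;
  cudaError_t err = cudaMalloc(&d_in, 3 * sizeof(float));
  if (err == cudaSuccess) err = cudaMalloc(&d_out, 3 * sizeof(float));
  if (err == cudaSuccess)
    err = cudaMemcpy(d_in, in3, 3 * sizeof(float), cudaMemcpyHostToDevice);
  if (err == cudaSuccess) {
    WelfordDebugCopy<float><<<1, 32>>>(d_in, d_in + 1, d_in + 2, d_out);
    err = cudaGetLastError();
  }
  if (err == cudaSuccess) err = cudaDeviceSynchronize();
  if (err == cudaSuccess)
    err = cudaMemcpy(out3, d_out, 3 * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_out);  // cudaFree(nullptr) is a no-op
  cudaFree(d_in);
  return err;
}
```

Without -G the compiler is free to optimize such locals away, so this only helps in debug builds; it changes the source for debugging purposes and is not a fix for the debugger itself.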

Hi @AKravets, using the @global pointer qualifier works for me too. Thanks! However, is there any way to make this work in Nsight Visual Studio Code Edition? I am actually developing my project with Nsight Visual Studio Code Edition and couldn't get the correct value when debugging there, so I tried cuda-gdb to see what was wrong.

Hi @umiswing,
Glad it worked for you in standalone cuda-gdb! I will move the topic to the VS Code support forum.
