template<typename T, int thread_group_width = kWarpSize>
__inline__ __global__ void WelfordWarpAllReduce(T thread_mean, T thread_m2, T thread_count, T* mean,
T* m2, T* count)
I compile my test.cu with nvcc -g -G test.cu.
I set a breakpoint inside the kernel and run print *mean. cuda-gdb shows:
(cuda-gdb) print *mean
$2 = 0
But the value of *mean is supposed to be 1. When I add a printf inside the kernel to print *mean, it does output 1.
How can I fix this? I would like cuda-gdb to print the correct value of *mean.
My environment:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
NVIDIA (R) CUDA Debugger
11.3 release
Portions Copyright (C) 2007-2021 NVIDIA Corporation
GNU gdb (GDB) 8.3.1
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
test.cu (2.9 KB)
Hi @AKravets. I have uploaded test.cu, which is the full source file. I compile it with nvcc -g -G test.cu and debug it with cuda-gdb a.out.
The problem occurs when I debug inside the kernel WelfordWarpAllReduce. I added two printf calls inside the kernel, at line 47 and line 51, to check the value the pointer points to.
The output of nvidia-smi:
Hi @AKravets, using the @global pointer qualifier works for me too. Thanks. However, is there any way to make it work in Nsight Visual Studio Code Edition? That is what I actually use to develop my project. I couldn’t get the correct value when debugging there, so I tried cuda-gdb to see what was wrong.
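For anyone reading later, the @global workaround mentioned above can be sketched as the cuda-gdb command below. This is only an illustration: it assumes the kernel was instantiated with T = float, which this thread does not confirm, so substitute the actual element type. The cast tells cuda-gdb to treat mean as a pointer into global memory before dereferencing it.

```
(cuda-gdb) print *(@global float *)mean
```

The same cast should apply to the m2 and count pointers if they show the same symptom, again assuming they point to global memory.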