When I use Nsight to examine variables in this kernel, the value shown for s[0] after the assignment is wrong.
__global__ void Check(void)
{
unsigned long long s[8];
s[0] = 0x1000;
The Locals window shows s[0]=0 after I step past the assignment above. I expect 0x1000.
- s 0x0000000000fffc10 {0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, ...} unsigned long long[8] __local__
[0] 0x0000000000000000 __local__ unsigned long long&
The nvcc command line for the build is
1>..."C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin\nvcc.exe" -gencode=arch=compute_30,code=\"sm_30,compute_30\" --use-local-env --cl-version 2017 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.11.25503\bin\HostX86\x64" -x cu ... -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include" -G --keep --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -g -DWIN64 -D_WINDOWS -D_DEBUG -DTablebaseBuild -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MTd " -o x64\Debug\PosX_CUDA.cu.obj "C:\ProjectDir\PosX_CUDA.cu"
(I have elided some library includes for other parts of the C++ code)
This is a debug build.
The disassembly of the CUDA code corresponding to this line looks reasonable (4096 is 0x1000):
0x002c98b0 [0052] tmp3:
0x002c98b0 [0053] mov.b32 %r3, %r8;
0x002c98c8 [0054] tmp4:
0x002c98c8 [0055] mov.u32 %r28, %r3;
0x002c98d8 [0056] tmp5:
0x002c98d8 [0058] BB0_1:
0x002c98d8 [0060] mov.u32 %r4, %r28;
0x002c98f0 [0061] tmp6:
0x002c98f0 [0062] setp.lt.s32 %p1, %r4, 64;
0x002c98f8 [0063] not.pred %p2, %p1;
0x002c9908 [0064] @%p2 bra BB0_4;
0x002c9910 [0065] bra.uni BB0_2;
0x002c9918 [0067] BB0_2:
0x002c9918 [0068] mov.u64 %rd1, 4096;
Am I misusing Nsight, or is there an error in the build or in the code? I expect to see s[0]=0x1000.
GeForce GTX 1050 Ti / CUDA runtime 9.0 / Nsight 5.4 / Visual Studio Community 2017 15.3 / 64-bit Windows 7
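
For comparison, here is a minimal sketch of a cross-check that does not depend on the Locals window: the kernel copies s[0] to global memory and the host prints it. This is not the kernel from my project; the name CheckToGlobal, the out parameter, and the host code are hypothetical and only illustrate the idea. If the value printed here is 0x1000 while the debugger still shows 0, that would point at the debugger's view of local memory rather than at the generated code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical variant of Check: same local array and same store,
// but s[0] is also written to global memory so the host can read it.
__global__ void CheckToGlobal(unsigned long long *out)
{
    unsigned long long s[8];
    s[0] = 0x1000;
    out[0] = s[0];  // make the stored value visible outside the kernel
}

int main()
{
    unsigned long long *d_out = nullptr;  // device buffer for one value
    unsigned long long h_out = 0;         // host copy of s[0]

    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }

    CheckToGlobal<<<1, 1>>>(d_out);

    // cudaMemcpy waits for the kernel and also reports any launch error.
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("s[0] = 0x%llx\n", h_out);
    return 0;
}
```

Building this with the same -G debug flags as the command line above would keep the comparison with the debugger session like for like.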