Occasional truncation (or zeroing) of the most significant 32 bits of a 64-bit pointer - Windows 10 - CUDA 9.2

I have found an occasional issue when passing a pointer to a kernel.
According to the host debugger the pointer I am passing is 0x0000001309400000.
However, on the device side the pointer shows up as 0x0000000009400000, i.e. the upper 32 bits are zero.

The problem is rare (it has occurred for only one kernel so far), and the pointer is passed as expected when I use CUDA 8.0. Could this be a bug in CUDA 9.2?

Has anyone else had this problem, or can anyone offer an explanation?



It could be a bug, because operations on 64-bit pointers use pairs of 32-bit registers. The first thing you would want to figure out is whether there is a problem with the debugger in displaying the pointer (e.g. by losing track of which registers the pointer is stored in) or a bug in code generation (e.g. wrong register assigned by the compiler).

The second scenario would obviously be much more serious as the code would not work correctly, which will be easy to tell in most cases as a pointer corrupted in this way is likely out-of-bounds as well.

A quick sanity check could be to lower the back-end optimization level (the default is -O3). So try compiling with -Xptxas -O{2|1|0} and note at which level the issue goes away. If it does go away at lower optimization levels, there is a good chance you have unearthed a code-generation bug. Note that backend code generation uses machine-specific generators, so the issue may only occur for specific GPU architectures (you haven’t mentioned which one you use).

You would want to reduce the original code to the smallest code that still reproduces the issue, then file a bug with NVIDIA, attaching this repro code.

If you want to figure out whether the problem is real or a debugger/tool artifact, print out the pointer value from the kernel, using an in-kernel printf statement with the %p format specifier, and compare it with the value the host passed.
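
A minimal sketch of that printf check, as a starting point rather than a reproduction of your actual code: the kernel name show_pointer, the buffer d_buf and the file name repro.cu are all made up for illustration, and your real kernel will take more arguments. It prints the pointer once on the host and once on the device so the two values can be compared directly.

```cuda
// Build (hypothetical file name), varying the back-end optimization level:
//   nvcc -Xptxas -O3 repro.cu -o repro
//   nvcc -Xptxas -O0 repro.cu -o repro
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: prints the pointer exactly as the device code received it.
__global__ void show_pointer(const float *p)
{
    // One thread is enough; avoids flooding the printf buffer.
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        printf("device sees: %p\n", (const void *)p);
    }
}

int main()
{
    float *d_buf = nullptr;
    cudaError_t err = cudaMalloc((void **)&d_buf, 1024 * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Value on the host side, before the launch.
    printf("host passes: %p\n", (void *)d_buf);

    show_pointer<<<1, 1>>>(d_buf);

    // Synchronizing flushes the in-kernel printf output and surfaces launch errors.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_buf);
        return 1;
    }

    cudaFree(d_buf);
    return 0;
}
```

If the two printed values match while the debugger still shows the truncated one, that points to a debugger display problem. If they differ, it points to code generation. Keep in mind that a toy kernel like this may well not trigger the issue, so the same printf line would need to go into the kernel that actually misbehaves.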