In my kernel I need to access a 64-bit variable through two 32-bit pointers. But for some reason, when I increment the pointer pp by one, it no longer points to the “second half” of the 64-bit variable.
Yeah, it looks like you’re right. The same operations work just fine on variables residing in shared or global memory. Unfortunately, accessing 64-bit variables through 32-bit pointers isn’t coalesced in shared memory and is only “pseudo coalesced” in global memory.
You mean that the nvcc compiler was really letting you take the address of an in-register variable? That is surely a bug! It should have put the variable into local memory (lmem), which would’ve butchered performance but given you the right result.
I would suggest filing a bug report. (Make a simple, self-contained program that exhibits the problem.) My guess is that it has to do with the fact that you’re using integers. Integers aren’t thoroughly tested in CUDA.
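In the meantime, one possible workaround is to avoid taking the variable's address at all and split the 64-bit value with shifts and masks, so it can stay in registers. The following is only a minimal sketch in plain C++; the helper names `lo32`, `hi32` and `join64` are made up for illustration and are not part of CUDA. In a kernel the same functions would presumably be marked `__device__`, but that has not been tested here against the compiler bug discussed above.

```cpp
#include <cstdint>

// Hypothetical helpers (names are illustrative, not from any CUDA header).
// They split and rebuild a 64-bit value using only arithmetic, so no
// pointer to the variable is ever formed.

// Lower 32 bits of v.
inline std::uint32_t lo32(std::uint64_t v) {
    return static_cast<std::uint32_t>(v & 0xFFFFFFFFull);
}

// Upper 32 bits of v (the "second half" on a little-endian layout).
inline std::uint32_t hi32(std::uint64_t v) {
    return static_cast<std::uint32_t>(v >> 32);
}

// Recombine two 32-bit halves into one 64-bit value.
inline std::uint64_t join64(std::uint32_t hi, std::uint32_t lo) {
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

For example, with `v = 0x1122334455667788`, `lo32(v)` yields `0x55667788` and `hi32(v)` yields `0x11223344`; after modifying either half, `join64(hi, lo)` rebuilds the 64-bit value.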