Generic pointer vs Shared Memory Pointer


I have a pointer to shared memory. For various reasons I sometimes have to cast it to a generic pointer. As far as I know, different instructions are used to access memory depending on whether the pointer is a generic pointer or a shared memory pointer.

My questions are:

  1. Will there be any performance difference when accessing the data through a generic pointer, even though the data is still in shared memory?

  2. If the pointer is converted to a generic pointer, will shared memory broadcasting still take place when all the threads in a warp access the same data simultaneously? (If not, it is a big loss for me.)

Anil Mahmud

Functionally there is no difference between accessing shared memory through a generic pointer vs a memory-space specific pointer.
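A minimal sketch of what "functionally no difference" means, under some assumptions: the kernel name compare, the helper load_generic, and the buffer names are all made up for illustration. The helper is marked __noinline__ on the assumption that this stops the compiler from proving the pointer refers to shared memory, so it has to treat it as generic. Every thread in the warp reads the same element through both paths, and the host only checks that the values agree; it does not measure broadcast behavior or speed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper. __noinline__ is used on the assumption that it hides
// the origin of p from the compiler, so the load is emitted as a generic load.
__device__ __noinline__ float load_generic(const float* p, int i)
{
    return p[i];
}

__global__ void compare(float* out_direct, float* out_generic)
{
    __shared__ float tile[32];
    tile[threadIdx.x] = 2.0f * threadIdx.x;
    __syncthreads();

    // All threads of the warp read the same element (index 5).
    out_direct[threadIdx.x]  = tile[5];                // shared-space access
    out_generic[threadIdx.x] = load_generic(tile, 5);  // access via generic pointer
}

int main()
{
    constexpr int n = 32;
    float h_direct[n], h_generic[n];
    float *d_direct = nullptr, *d_generic = nullptr;
    cudaMalloc(&d_direct, n * sizeof(float));
    cudaMalloc(&d_generic, n * sizeof(float));

    compare<<<1, n>>>(d_direct, d_generic);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_direct, d_direct, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_generic, d_generic, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Count positions where the two access paths returned different values.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_direct[i] != h_generic[i]) ++mismatches;
    printf("mismatches between the two access paths: %d\n", mismatches);

    cudaFree(d_direct);
    cudaFree(d_generic);
    return mismatches != 0;
}
```

To see which load instructions the compiler actually emitted for each path, the machine code can be dumped with cuobjdump -sass on the compiled binary.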

There may be a slight performance impact. All platforms supported by CUDA today are 64-bit platforms, meaning generic pointers are 64-bit pointers that require two 32-bit registers each, and address arithmetic on them is emulated 64-bit arithmetic. For memory-space-specific pointers to shared memory, the compiler may take advantage of the fact that this memory space is known to require only 32-bit addresses and optimize the code accordingly, resulting in fewer registers and fewer instructions used for indexing arithmetic.
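The difference in address width can be made visible with a small sketch, assuming a toolkit recent enough to provide the __cvta_generic_to_shared and __isShared intrinsics. The kernel name show_addresses and the variable names are hypothetical. It takes the generic address of a shared array element, converts it to a shared-space address, and the host then checks whether that address fits in 32 bits. The printed values depend on the device and toolkit, so none are claimed here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void show_addresses(unsigned long long* generic_addr,
                               unsigned long long* shared_addr,
                               unsigned int* is_shared)
{
    __shared__ float tile[64];

    if (threadIdx.x == 0) {
        const void* p = &tile[16];
        // Generic (64-bit) address as seen through an ordinary pointer.
        *generic_addr = (unsigned long long)p;
        // Address of the same element within the shared memory space.
        *shared_addr = (unsigned long long)__cvta_generic_to_shared(p);
        // Non-zero if p points into shared memory.
        *is_shared = __isShared(p);
    }
}

int main()
{
    unsigned long long *d_generic = nullptr, *d_shared = nullptr;
    unsigned int* d_flag = nullptr;
    cudaMalloc(&d_generic, sizeof(unsigned long long));
    cudaMalloc(&d_shared, sizeof(unsigned long long));
    cudaMalloc(&d_flag, sizeof(unsigned int));

    show_addresses<<<1, 64>>>(d_generic, d_shared, d_flag);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    unsigned long long h_generic = 0, h_shared = 0;
    unsigned int h_flag = 0;
    cudaMemcpy(&h_generic, d_generic, sizeof(h_generic), cudaMemcpyDeviceToHost);
    cudaMemcpy(&h_shared, d_shared, sizeof(h_shared), cudaMemcpyDeviceToHost);
    cudaMemcpy(&h_flag, d_flag, sizeof(h_flag), cudaMemcpyDeviceToHost);

    printf("generic address      : 0x%llx\n", h_generic);
    printf("shared-space address : 0x%llx\n", h_shared);
    printf("points to shared     : %s\n", h_flag ? "yes" : "no");
    printf("shared address fits in 32 bits: %s\n",
           (h_shared >> 32) == 0 ? "yes" : "no");

    cudaFree(d_generic);
    cudaFree(d_shared);
    cudaFree(d_flag);
    return 0;
}
```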

In most practical use cases involving heavy use of shared memory, the shared memory access itself is the bottleneck, so for these cases I would expect no performance difference beyond measurement noise (+/- 2%). You should be able to easily assess the impact on your particular use case by measuring the performance with both pointer types.
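One way such a measurement could be set up is sketched below, timing two variants of the same kernel with CUDA events. All names (bench, sum_generic, time_ms, kThreads, kIters) and the launch sizes are hypothetical. The generic variant goes through a __noinline__ helper, on the assumption that this forces generic addressing; note that the function call adds a little overhead of its own, so this is only a rough stand-in for your real code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 256;      // block size, a power of two
constexpr int kIters   = 1 << 14;  // shared memory reads per thread

// Hypothetical helper; __noinline__ is assumed to make the compiler treat p
// as a generic pointer because it cannot see where p came from.
__device__ __noinline__ float sum_generic(const float* p, unsigned start)
{
    float acc = 0.0f;
    for (int i = 0; i < kIters; ++i)
        acc += p[(start + i) & (kThreads - 1)];
    return acc;
}

template <bool UseGeneric>
__global__ void bench(float* out)
{
    __shared__ float tile[kThreads];
    tile[threadIdx.x] = static_cast<float>(threadIdx.x);
    __syncthreads();

    float acc = 0.0f;
    if (UseGeneric) {
        acc = sum_generic(tile, threadIdx.x);
    } else {
        // Same access pattern, but directly on the shared array.
        for (int i = 0; i < kIters; ++i)
            acc += tile[(threadIdx.x + i) & (kThreads - 1)];
    }
    // Store the result so the loop cannot be optimized away.
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

template <bool UseGeneric>
float time_ms(float* d_out, int blocks)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    bench<UseGeneric><<<blocks, kThreads>>>(d_out);  // warm-up launch
    cudaEventRecord(start);
    bench<UseGeneric><<<blocks, kThreads>>>(d_out);  // timed launch
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int blocks = 1024;
    float* d_out = nullptr;
    cudaMalloc(&d_out, blocks * kThreads * sizeof(float));

    float ms_direct  = time_ms<false>(d_out, blocks);
    float ms_generic = time_ms<true>(d_out, blocks);
    printf("direct shared access : %.3f ms\n", ms_direct);
    printf("via generic pointer  : %.3f ms\n", ms_generic);

    cudaFree(d_out);
    return 0;
}
```

Repeating each timed launch several times and comparing the spread between runs gives a better idea of whether any difference is above noise level.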