Generic pointer vs Shared Memory Pointer

anil.mahmud · June 24, 2018, 4:56pm

Hello,

I have a pointer to shared memory that I access. For various reasons I sometimes have to cast it to a generic pointer. As far as I know, different instructions are used to access memory depending on whether the pointer is a generic pointer or a shared memory pointer.

My questions are:

Will there be any performance difference when accessing the data if the pointer is converted to a generic pointer, even though the data is still in shared memory?
If it is converted to a generic pointer, will the shared memory broadcast still take place when all the threads in a warp try to access the same data simultaneously? (If not, it is a big loss for me.)

Thanks,
Anil Mahmud

njuffa · June 24, 2018, 6:09pm

Functionally there is no difference between accessing shared memory through a generic pointer vs a memory-space specific pointer.

There may be a slight performance impact. All platforms supported by CUDA today are 64-bit platforms, meaning pointers are 64-bit pointers requiring two registers each and emulated 64-bit arithmetic for address arithmetic. For memory-space specific pointers to shared memory, the compiler may take advantage of the fact that this memory space is known to require only 32-bit addresses, and optimize code accordingly, resulting in fewer registers used and fewer instructions used for indexing arithmetic.
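To make the distinction concrete, here is a minimal sketch that reads the same shared memory element once directly and once through a generic pointer. The names `load_generic`, `demo`, and `tile` are made up for illustration. The `__noinline__` qualifier is an assumption about how to keep the compiler from inlining the helper and inferring the shared address space on its own, which it is often able to do.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: it only sees a generic pointer. __noinline__ is used
// here so the compiler is less likely to deduce that p points to shared memory.
__device__ __noinline__ float load_generic(const float* p, int i)
{
    return p[i];
}

__global__ void demo(float* out)
{
    __shared__ float tile[256];
    tile[threadIdx.x] = static_cast<float>(threadIdx.x);
    __syncthreads();

    // Direct access: the compiler knows tile is in shared memory.
    // Every thread reads the same element (the broadcast case from the question).
    float a = tile[7];

    // Same element, read through a generic pointer.
    float b = load_generic(tile, 7);

    out[blockIdx.x * blockDim.x + threadIdx.x] = a + b;
}

int main()
{
    const int threads = 256;
    const int blocks = 1;
    float* d_out = nullptr;
    cudaMalloc(&d_out, threads * blocks * sizeof(float));

    demo<<<blocks, threads>>>(d_out);

    float h_out[threads];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", h_out[0]);

    cudaFree(d_out);
    return 0;
}
```

Dumping the machine code (for example with `cuobjdump -sass` on the compiled binary) should show whether each access became a shared memory load (`LDS`) or a generic load (`LD`), and how much address arithmetic surrounds it. The exact output depends on the compiler version and target architecture.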

In most practical use cases involving heavy use of shared memory the shared memory access itself is the bottleneck, so for these cases I would expect no performance differences outside noise level (+/- 2%). You should be able to easily assess the impact on your particular use case by measuring the performance with both pointer types.
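One possible way to run such a measurement is sketched below, timing two variants of the same kernel with CUDA events. All names (`bench`, `time_ms`, `fallback`, `kIters`) are hypothetical. Selecting the pointer at run time between the shared array and a global buffer is an assumption about how to stop the compiler from proving that the pointer refers to shared memory; the global buffer is never actually read here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 256;     // threads per block, power of two
constexpr int kIters   = 1 << 14; // shared memory reads per thread

template <bool UseGeneric>
__global__ void bench(float* out, const float* fallback, int use_fallback)
{
    __shared__ float tile[kThreads];
    tile[threadIdx.x] = 1.0f;
    __syncthreads();

    // The address space of this pointer is only known at run time, so the
    // compiler has to treat it as a generic pointer.
    const float* generic = use_fallback ? fallback : tile;

    float acc = 0.0f;
    for (int k = 0; k < kIters; ++k) {
        int idx = (threadIdx.x + k) & (kThreads - 1);
        acc += UseGeneric ? generic[idx] : tile[idx];
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

template <bool UseGeneric>
float time_ms(float* d_out, const float* d_fallback, int blocks)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time startup costs are not measured.
    bench<UseGeneric><<<blocks, kThreads>>>(d_out, d_fallback, 0);

    cudaEventRecord(start);
    bench<UseGeneric><<<blocks, kThreads>>>(d_out, d_fallback, 0);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int blocks = 1024;
    float* d_out = nullptr;
    float* d_fallback = nullptr;
    cudaMalloc(&d_out, blocks * kThreads * sizeof(float));
    cudaMalloc(&d_fallback, kThreads * sizeof(float));
    cudaMemset(d_fallback, 0, kThreads * sizeof(float));

    float shared_ms  = time_ms<false>(d_out, d_fallback, blocks);
    float generic_ms = time_ms<true>(d_out, d_fallback, blocks);
    printf("shared pointer:  %.3f ms\n", shared_ms);
    printf("generic pointer: %.3f ms\n", generic_ms);

    cudaFree(d_out);
    cudaFree(d_fallback);
    return 0;
}
```

A single timed launch is noisy, so averaging over several runs, and substituting the access pattern of the real kernel for this synthetic loop, would give a more trustworthy comparison.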
