If I pass a pointer to shared memory to a function, does the speed advantage of shared memory disappear when I dereference it through a regular pointer argument?
No, but there are some pitfalls with compute capability 1.x:
Since there are only four offset registers available in hardware, using pointers to shared memory may result in additional address arithmetic and swapping of address registers. Disassemble your device code with `cuobjdump -sass` to see if this is the case.
You need to make sure the compiler can deduce that your pointers point to shared memory. This can be a bit tricky, since there is no language construct to tell the compiler which memory space a pointer argument refers to.
On compute capability 2.x and 3.0 these should be non-issues due to their generic addressing mode. I'm not entirely sure, though, about the use of offset registers and the number of address arithmetic instructions generated there, as I have not yet analyzed as much 2.x code as 1.x code.
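To make the scenario concrete, here is a minimal sketch of a device function that receives a shared-memory array through an ordinary pointer argument. The names sum_tile, block_sum, run_block_sum_demo and the TILE size are made up for illustration. Marking the helper __forceinline__ is my assumption about one way to help the compiler see where the pointer comes from; it is not a guarantee, so check the generated code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 128  // illustrative block size, not from the original post

// Hypothetical helper: takes a plain pointer, with no memory-space annotation.
// Inlining it into the caller (assumption) lets the compiler see that the
// argument is really the __shared__ array declared in the kernel.
__device__ __forceinline__ float sum_tile(const float *tile, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; ++i)
        s += tile[i];
    return s;
}

// Each block stages TILE inputs in shared memory, then thread 0 sums them
// through the ordinary pointer argument of sum_tile().
__global__ void block_sum(const float *in, float *out)
{
    __shared__ float tile[TILE];
    tile[threadIdx.x] = in[blockIdx.x * TILE + threadIdx.x];
    __syncthreads();
    if (threadIdx.x == 0)
        out[blockIdx.x] = sum_tile(tile, TILE);
}

// Host-side driver: fills the input with ones and returns the sum computed
// by block 0, or -1.0f if a CUDA call fails.
float run_block_sum_demo()
{
    const int blocks = 2;
    const int n = blocks * TILE;
    float h_in[n];
    float h_out[blocks] = {0.0f};
    for (int i = 0; i < n; ++i)
        h_in[i] = 1.0f;

    float *d_in = NULL, *d_out = NULL;
    if (cudaMalloc((void **)&d_in, n * sizeof(float)) != cudaSuccess)
        return -1.0f;
    if (cudaMalloc((void **)&d_out, blocks * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return -1.0f;
    }
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    block_sum<<<blocks, TILE>>>(d_in, d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, blocks * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out[0] : -1.0f;
}

int main()
{
    printf("block 0 sum = %f\n", run_block_sum_demo());
    return 0;
}
```

After building this with nvcc for the architecture you care about, running `cuobjdump -sass` on the resulting binary should show whether the loads inside the loop address shared memory directly or go through extra address arithmetic.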