I'm working my way through the Dr. Dobb's CUDA programming guide and have a question about local memory.
In the guide we write two programs that reverse an array: one uses only global memory and the other uses local memory. The second is supposed to be much faster. However, when I run them on my Linux desktop (GeForce 8600 GT) I see no speed difference between the two programs, whereas on my MacBook Pro (GeForce 9600M) I do get the speedup the guide describes for local memory.
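For reference, here is roughly what the two kernels look like. This is a minimal sketch written from memory, not the exact code from the guide: the names `reverseGlobal`, `reverseShared` and `isReversed`, the array size, and the timing code are my own. What I am calling local memory is, I believe, the on-chip `__shared__` memory declared in the second kernel. The sketch assumes the array length is an exact multiple of the block size.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Version 1: each thread reads one element from global memory and writes it
// directly to its mirrored position, so the writes run in descending order.
__global__ void reverseGlobal(int *out, const int *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[n - 1 - i] = in[i];
}

// Version 2: each block reverses its tile in on-chip shared memory first,
// then writes the tile to the mirrored block in ascending order.
// Assumes n == gridDim.x * blockDim.x (no partial blocks).
__global__ void reverseShared(int *out, const int *in, int n)
{
    extern __shared__ int tile[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[blockDim.x - 1 - threadIdx.x] = in[i];   // reverse within the block
    __syncthreads();                              // wait until the tile is full
    int outBase = (gridDim.x - 1 - blockIdx.x) * blockDim.x;
    out[outBase + threadIdx.x] = tile[threadIdx.x];
}

// Host-side check; assumes the input array was 0, 1, ..., n-1.
static bool isReversed(const int *out, int n)
{
    for (int i = 0; i < n; ++i)
        if (out[i] != n - 1 - i) return false;
    return true;
}

int main()
{
    const int threads = 256, blocks = 1024, n = threads * blocks;
    const size_t bytes = n * sizeof(int);

    int *h_in = (int *)malloc(bytes), *h_out = (int *)malloc(bytes);
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms = 0.0f;

    // Time the global-memory version.
    cudaEventRecord(start);
    reverseGlobal<<<blocks, threads>>>(d_out, d_in, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    printf("global: %.3f ms, %s\n", ms, isReversed(h_out, n) ? "ok" : "WRONG");

    // Time the shared-memory version (third launch argument = tile size in bytes).
    cudaEventRecord(start);
    reverseShared<<<blocks, threads, threads * sizeof(int)>>>(d_out, d_in, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    printf("shared: %.3f ms, %s\n", ms, isReversed(h_out, n) ? "ok" : "WRONG");

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_in);
    cudaFree(d_out);
    free(h_in);
    free(h_out);
    return 0;
}
```

As far as I understand it, the only difference between the two is the order in which each block writes to global memory, so any speedup should come from the memory access pattern rather than from doing less work.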
So why doesn't local memory speed things up on my desktop? The programming guide mentions that the GT200 architecture relaxes the penalty for accessing global memory. Is that the reason?
I'm buying some Tesla cards at the moment, so should I bother learning to use local memory properly, or will it not improve the speed of my programs (apart from on my laptop)?