What is local memory?

Code that uses local memory seems to run faster than code that only uses global memory.
However, from the CUDA manual, I remember that local memory accesses cost the same as global memory accesses…

Is it that some of what looks like local memory is actually kept in registers?

Read section 5.1.2.2, ‘Local Memory’, of the CUDA Programming Guide, which answers all of these questions.
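In short, as far as I understand it: local memory is per-thread storage that physically lives in device memory, so an access that really goes to local memory is about as expensive as a global memory access. The compiler only uses it when it cannot keep a variable in registers, typically for arrays indexed with values it cannot resolve at compile time, for large arrays or structures, and for register spills. A small per-thread array with constant indexing usually ends up in registers, which would explain the speedup you see. Here is a rough sketch of the two cases. The kernel names, sizes, and helper functions are made up for illustration, and where each array actually ends up depends on your compiler version and target architecture.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int N = 256;  // hypothetical problem size, chosen for illustration

// Small array, fully unrolled loop: every index is a compile-time constant,
// so the compiler is free to keep v[] entirely in registers.
__global__ void static_index(const float* in, float* out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float v[4];
    #pragma unroll
    for (int i = 0; i < 4; ++i)
        v[i] = in[tid] * (i + 1);
    out[tid] = v[0] + v[1] + v[2] + v[3];
}

// The read index is only known at run time, so the compiler will
// typically (not necessarily) place v[] in local memory.
__global__ void dynamic_index(const float* in, const int* idx, float* out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float v[64];
    for (int i = 0; i < 64; ++i)
        v[i] = in[tid] * (i + 1);
    out[tid] = v[idx[tid] & 63];
}

// Host-side reference results, used to check the kernels.
float expected_static(float x)         { return x * 10.0f; }
float expected_dynamic(float x, int k) { return x * float((k & 63) + 1); }

int main()
{
    std::vector<float> h_in(N), h_out(N);
    std::vector<int> h_idx(N);
    for (int i = 0; i < N; ++i) { h_in[i] = float(i); h_idx[i] = i % 64; }

    float *d_in = nullptr, *d_out = nullptr;
    int *d_idx = nullptr;
    cudaMalloc(&d_in,  N * sizeof(float));
    cudaMalloc(&d_out, N * sizeof(float));
    cudaMalloc(&d_idx, N * sizeof(int));
    cudaMemcpy(d_in,  h_in.data(),  N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, h_idx.data(), N * sizeof(int),   cudaMemcpyHostToDevice);

    static_index<<<N / 64, 64>>>(d_in, d_out);
    assert(cudaGetLastError() == cudaSuccess);
    cudaMemcpy(h_out.data(), d_out, N * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; ++i)
        assert(h_out[i] == expected_static(h_in[i]));

    dynamic_index<<<N / 64, 64>>>(d_in, d_idx, d_out);
    assert(cudaGetLastError() == cudaSuccess);
    cudaMemcpy(h_out.data(), d_out, N * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; ++i)
        assert(h_out[i] == expected_dynamic(h_in[i], h_idx[i]));

    printf("both kernels produced the expected results\n");
    cudaFree(d_in); cudaFree(d_out); cudaFree(d_idx);
    return 0;
}
```

To see what the compiler actually did for your own kernels, build with `nvcc -Xptxas -v`; the output lists register usage and any local memory (stack frame or spill) bytes per kernel. If a kernel reports no local memory, its "local" arrays are living in registers.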