If I make a block that uses all the shared memory and all the registers of a multiprocessor, I assume there will be only one block active on that multiprocessor. But when I use only half of the available memory, will there be two blocks active (assuming a block has 64 threads)? Or does this depend on how complicated your program code is? Where is your program code stored anyway?
If your kernel/block size combination uses at most half the shared memory, at most half the registers available per multiprocessor, and at most half the maximum permissible threads per multiprocessor, you should wind up with at least two running blocks per multiprocessor. NVIDIA ships an occupancy calculator spreadsheet with recent versions of the SDK. It lets you play around with kernel execution parameters and see their effect on occupancy.
As I understand it, code is stored in global memory.