In which memory space do kernel parameters reside?

I typically use the global and constant memory spaces to store the data that my kernels will work with. But when it comes to the parameters that I pass to my kernels, I do not know where this information is!
In CUDA 1.1, one of the parameters I passed to my kernels was a 100-byte structure. I think this is no longer allowed in CUDA 2.0 (am I right?), so now I have to copy this structure with cudaMemcpy instead. Which memory space should I copy it to in order to preserve the efficiency I had in CUDA 1.1?
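One common answer is to place such a structure in constant memory and copy it once per change rather than per launch. A minimal sketch of that approach (untested here; the struct layout and all names are illustrative assumptions, not from the original post):

```cuda
#include <cuda_runtime.h>

struct KernelParams {          // stand-in for the ~100-byte structure
    float coeffs[24];
    int   width, height;
};

__constant__ KernelParams d_params;   // resides in the constant memory space

__global__ void myKernel(float *out)
{
    // Every thread reads the same parameter values, so the constant
    // cache broadcasts them cheaply instead of consuming shared memory.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = d_params.coeffs[0] * d_params.width;
}

void launch(float *d_out, const KernelParams &h_params, int n)
{
    // One host-to-constant copy; any number of launches then reuse it.
    cudaMemcpyToSymbol(d_params, &h_params, sizeof(KernelParams));
    myKernel<<<n / 256, 256>>>(d_out);
}
```

The copy via cudaMemcpyToSymbol only needs to be repeated when the parameter values actually change, which is what makes this cheaper than passing the struct by value on every launch.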

CUDA stores kernel parameters in shared memory. The limit is 256 bytes total.

Does that mean that if I am executing 64 blocks simultaneously, this information will occupy 64 * sizeOfParameter bytes of the total shared memory space?

And, actually, are these parameters copied somehow directly to the shared memory or are they first copied to global memory and then to shared memory?

Shared memory is per multiprocessor.
Parameter usage adds to the shared memory consumed by each block, and can therefore reduce occupancy.
That is one reason to use constant memory for parameters that do not change between kernel calls.
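The occupancy effect described above can be sketched with some back-of-the-envelope arithmetic. The figures here (16 KB of shared memory per multiprocessor, as on compute 1.x devices, and a cap of 8 resident blocks) are illustrative assumptions:

```python
SMEM_PER_SM = 16 * 1024   # shared memory per multiprocessor (compute 1.x)

def resident_blocks(smem_declared_per_block, param_bytes, max_blocks=8):
    """Blocks that fit on one SM when kernel parameters also count
    against each block's shared memory budget."""
    per_block = smem_declared_per_block + param_bytes
    if per_block == 0:
        return max_blocks
    return min(SMEM_PER_SM // per_block, max_blocks)

# A kernel declaring 4 KB of shared memory per block:
print(resident_blocks(4 * 1024, 0))    # -> 4 blocks fit
print(resident_blocks(4 * 1024, 100))  # -> 3: a 100-byte parameter
                                       #    struct costs a whole block
```

Because shared memory is granted in whole-block chunks, even a small parameter struct can tip a block over a divisor boundary and cost an entire resident block, which is exactly the occupancy loss the answer warns about.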

In some sense, yes. But if you launch 10,000 blocks, not all of them can run concurrently, so only num_running_blocks * sizeOfParameter bytes are used from the “total” shared memory space.

Note that since shared memory is allocated per block, it is less confusing not to think about “total” shared memory in the first place.

I don’t believe this is documented anywhere. A cubin expert may correct me, but AFAIK the parameters are populated before the entry point of the kernel. Maybe the thread scheduler has a special 256-byte memory area that it uses to set up shared memory when launching a block; I don’t know.