Local memory - a copy for each thread?

Sometimes kernel parameters or local variables are put in local memory, not shared memory. The Programming Guide 2.3 states:
"Local memory accesses are always coalesced though since they are per-thread by definition."
So even if some variables are never modified, is there a copy for each thread? Or does the compiler recognize that some variables are not modified and put a single copy of them in shared memory?

I don’t believe the current compiler will ever use shared memory for user variables unless they are explicitly declared as __shared__.
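To illustrate the "one copy per thread" point, here is a minimal sketch. The kernel name perThreadScratch and the array size are made up for illustration. An automatic array indexed with a runtime value is a typical candidate for local memory, although where the compiler actually places it depends on the compiler version and target. Either way, every thread sees its own private copy.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread fills and reads its own private array.
__global__ void perThreadScratch(int *out, int n) {
    int scratch[16];                      // automatic variable: one copy per thread
    for (int i = 0; i < 16; ++i) {
        scratch[i] = threadIdx.x + i;     // values differ from thread to thread
    }
    int idx = (threadIdx.x * 7) % 16;     // index only known at run time
    if (threadIdx.x < n) {
        out[threadIdx.x] = scratch[idx];  // reads this thread's copy only
    }
}

int main() {
    const int n = 32;
    int host[n];
    int *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(int));

    perThreadScratch<<<1, n>>>(dev, n);
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    // Each entry should equal t + (7 * t) % 16 for thread t.
    for (int t = 0; t < n; ++t) printf("%d ", host[t]);
    printf("\n");
    return 0;
}
```

Compiling with nvcc --ptxas-options=-v reports how many bytes of local memory, if any, the kernel uses.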

Yes, I forgot. But what about kernel function parameters?

By “kernel function parameters” do you mean kernel arguments? Those are only passed by pointer, and always from global memory (or by reference/value for constants). If you mean device functions, those are inlined, so their arguments stay wherever the kernel has them (registers, shared memory, local memory, or global memory).

Not exactly. Kernel parameters are stored at the beginning of your shared memory. The parameters are usually pointers to arrays in global memory, but you can also pass ints, floats, or even whole structures by value. Keep in mind, however, that the total size of kernel parameters is limited to 256 bytes, if I recall correctly.
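As a rough sketch of passing a whole structure by value, consider the following. ScaleParams and scaleKernel are hypothetical names invented for this example. The struct is well under the parameter size limit mentioned above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical parameter bundle, passed to the kernel by value.
struct ScaleParams {
    float factor;
    float offset;
    int   count;
};

// The pointer refers to global memory; the struct itself is a kernel argument.
__global__ void scaleKernel(float *data, ScaleParams p) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < p.count) {
        data[i] = data[i] * p.factor + p.offset;
    }
}

int main() {
    const int n = 8;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    ScaleParams p = {2.0f, 1.0f, n};   // filled in on the host
    scaleKernel<<<1, n>>>(dev, p);     // struct is copied as part of the launch

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i) printf("%g ", host[i]);
    printf("\n");
    return 0;
}
```

Every thread reads the same p, so there is no need to allocate device memory for it or copy it separately.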

I am afraid that is not precise either. Device function arguments are evaluated and stored in registers, unless you pass references. Consider the following code:

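__device__ void incrementMe(int val) {
	val = val + 1;  // modifies only the function's own copy of the argument
}

__global__ void cache(int *data) {
	data[0] = 0;
	incrementMe(data[0]);  // passed by value, so data[0] is not changed
}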

As a result of the above, data[0] will still be equal to 0 (I did compile and test it to be sure).

However, if you pass the argument by reference:

__device__ void incrementMe(int &val)

the global variable will indeed be incremented.

Since incrementMe(int&) is inlined each time you call it, you can use the same function to increment global, shared and register variables.
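A minimal sketch that puts both variants side by side is given below. The names incrementByValue, incrementByRef and demo are invented here so that both versions can live in one file. It is meant to be launched with a single thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// By value: the function works on a private copy.
__device__ void incrementByValue(int val) { val = val + 1; }

// By reference: the function works on the caller's variable.
__device__ void incrementByRef(int &val) { val = val + 1; }

__global__ void demo(int *data) {
    __shared__ int s;   // shared memory variable
    int r = 0;          // ordinary automatic variable
    s = 0;
    data[0] = 0;

    incrementByValue(data[0]);  // no effect on data[0]
    incrementByRef(data[0]);    // increments the global memory value
    incrementByRef(s);          // increments the shared variable
    incrementByRef(r);          // increments the automatic variable

    data[1] = s;                // copy results out for inspection
    data[2] = r;
}

int main() {
    int host[3] = {-1, -1, -1};
    int *dev = nullptr;
    cudaMalloc(&dev, sizeof(host));

    demo<<<1, 1>>>(dev);        // one thread, so no synchronization is needed
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("global=%d shared=%d register=%d\n", host[0], host[1], host[2]);
    return 0;
}
```

All three values should come back as 1: the by-value call leaves data[0] at 0, and each by-reference call adds one to its target.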