local memory

How local memory is used inside __device__ and __global__ functions is not well documented.
Take the code below (just example code).
Are i, sum, and b[10] local, or are they global? If a variable cannot fit into a register, is it global?
__device__ int calculate(int *a, int loop)
{
   int i;
   int sum=0;

   int b[10];
   for (i=0;i<loop;i++) // loop always has to be less than 10;
   {
      b[i]=i;
   }

   for (i=0;i<loop;i++)
      sum+=b[i];

   return sum;
}

What about a __global__ function?
Are a and b local or global?
__global__ void loop(int *a)
{
   int a;
   int b;
   a=calculate(a, …);
   if (a)
   {
      // do something to copy some results to the host
   }
}

Thanks Ralphb
__global__ void loop(int *a, int b)
{
   z=a;
   calculate
}

If there are not enough registers, some variables are spilled to local memory, which physically resides in off-chip device memory (like global memory) and is orders of magnitude slower than registers. Keep in mind that spilled variables are only accessible from the thread to which they belong.
Local means the variables are only accessible from that specific thread.

In your example __global__ loop, b is a local variable: every thread has its own copy, each with its own value. The variable a is declared twice, which will not compile. This is not Fortran: if you put int *a in the argument list, the function expects a pointer to global memory. If you declare int a inside the function, it is local to the thread.
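To make the distinction concrete, here is a minimal sketch, not the original poster's code. The names loop_fixed, out, result and run_loop_fixed are hypothetical, and calculate is a simplified stand-in for the function in the question. The pointer arguments refer to global memory shared by all threads, while result is private to each thread.

```cuda
#include <cuda_runtime.h>

// Simplified stand-in for calculate() from the question (assumption):
// sums the first 'loop' elements of an array that lives in global memory.
__device__ int calculate(const int *a, int loop)
{
    int sum = 0;
    for (int i = 0; i < loop; i++)
        sum += a[i];
    return sum;
}

// 'a' and 'out' are pointers to global memory: every thread sees the same data.
// 'tid' and 'result' are declared inside the kernel, so each thread owns a copy.
__global__ void loop_fixed(const int *a, int *out, int loop)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int result = calculate(a, loop) + tid;  // value differs from thread to thread
    out[tid] = result;                      // results reach the host via global memory
}

// Hypothetical host helper: launches n threads in one block, returns out[idx].
int run_loop_fixed(int n, int idx)
{
    const int h_a[4] = {1, 2, 3, 4};
    int *d_a = nullptr, *d_out = nullptr;
    int h_result = -1;

    cudaMalloc(&d_a, sizeof(h_a));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_a, h_a, sizeof(h_a), cudaMemcpyHostToDevice);

    loop_fixed<<<1, n>>>(d_a, d_out, 4);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_result, d_out + idx, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_out);
    return h_result;
}
```

Note that the parameter a is never redeclared inside the kernel; the per-thread value gets its own name instead.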

There is confusion between the physical types of memory and the software abstractions.

Good luck

Local memory implements the stack of each thread. It is used for the following (the list may not be exhaustive, but it gives the idea):

  1. Implementing the ABI (Application Binary Interface), namely passing arguments to functions, passing function results back to the caller, etc.;
  2. Register spills;
  3. Holding arrays whose indices cannot be resolved by the compiler at compile time.
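Item 3 can be illustrated with a rough sketch. The kernel names static_index and dynamic_index and the helper run_dynamic_index are made up for this example, and where the compiler actually places b depends on the compiler version and target architecture, so treat the comments as the typical outcome rather than a guarantee.

```cuda
#include <cuda_runtime.h>

// Every index is a compile-time constant once the loop is unrolled,
// so the compiler can usually keep b entirely in registers.
__global__ void static_index(int *out)
{
    int b[4];
    #pragma unroll
    for (int i = 0; i < 4; i++)
        b[i] = i * threadIdx.x;
    out[threadIdx.x] = b[0] + b[1] + b[2] + b[3];
}

// The index k is only known at run time. Registers cannot be addressed
// dynamically, so the compiler typically places b in local memory.
__global__ void dynamic_index(int *out, int k)
{
    int b[4];
    for (int i = 0; i < 4; i++)
        b[i] = i * threadIdx.x;
    out[threadIdx.x] = b[k & 3];  // mask keeps the index inside the array
}

// Hypothetical host helper: runs dynamic_index with 32 threads, returns out[tid].
int run_dynamic_index(int k, int tid)
{
    int *d_out = nullptr;
    int h_result = -1;
    cudaMalloc(&d_out, 32 * sizeof(int));
    dynamic_index<<<1, 32>>>(d_out, k);
    cudaMemcpy(&h_result, d_out + tid, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_result;
}
```

Compiling both kernels with verbose ptxas output is one way to see whether either of them reports local memory usage on a given toolkit.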

On Fermi, local memory accesses are cached in L1, which is built from the same on-chip hardware as shared memory. The L1 cache is limited to a maximum of 48 KB per multiprocessor. Local memory itself resides in off-chip device memory, like global memory, so I believe that anything that does not fit in the cache is served from there (but please verify).
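Since L1 and shared memory share the same hardware, a kernel that spills a lot can ask for the larger L1 split. The sketch below uses the runtime call cudaFuncSetCacheConfig with cudaFuncCachePreferL1; the kernel spill_heavy and the helper prefer_l1 are hypothetical names, and the setting is only a preference that the device may ignore.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel whose run-time-indexed array is likely to live in local memory.
__global__ void spill_heavy(int *out, int k)
{
    int b[32];
    for (int i = 0; i < 32; i++)
        b[i] = i + threadIdx.x;
    out[threadIdx.x] = b[k & 31];
}

// Requests the larger-L1 configuration for spill_heavy.
// Returns true if the runtime accepted the request.
bool prefer_l1()
{
    return cudaFuncSetCacheConfig(spill_heavy, cudaFuncCachePreferL1) == cudaSuccess;
}
```

Whether this helps depends on how much shared memory the kernel needs, since a larger L1 leaves less room for shared memory.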

For the following function

__global__ void loop(int* a)
{
   int b;
   ...
}

I think that both a and b are local variables.

For the code,

__device__ int calculate(int *a, int loop)
{ 
   int i;
   int sum=0;

   int b[10];
   for (i=0;i<loop;i++) // loop always has to be less than 10;
   {
      b[i]=i;
   }

   for (i=0;i<loop;i++)
      sum+=b[i];

   return sum;
}

I think that a, b and loop will be local, while i and sum will be register variables.

Anyway, you can check the amount of local memory used by passing the options -Xptxas -v,-abi=no to nvcc.
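The same figure can also be read at run time. This is a small sketch that assumes the runtime API call cudaFuncGetAttributes and its localSizeBytes field; the kernel loop_kernel and the helper local_bytes_of_loop_kernel are hypothetical names, and calculate is a trimmed copy of the function from the question.

```cuda
#include <cuda_runtime.h>

// Trimmed copy of calculate() from the question: b is indexed by a run-time value.
__device__ int calculate(int loop)
{
    int b[10];
    int sum = 0;
    for (int i = 0; i < loop; i++)  // caller must keep loop <= 10
        b[i] = i;
    for (int i = 0; i < loop; i++)
        sum += b[i];
    return sum;
}

__global__ void loop_kernel(int *out, int loop)
{
    out[threadIdx.x] = calculate(loop);
}

// Returns the local memory per thread, in bytes, that the runtime reports
// for loop_kernel, or -1 if the query fails.
long local_bytes_of_loop_kernel()
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, loop_kernel) != cudaSuccess)
        return -1;
    return (long)attr.localSizeBytes;
}
```

The reported value depends on the compiler and the target architecture, so a result of 0 is possible if the compiler manages to keep b in registers.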

Because loop is a function argument and its value is not known at compile time, I think the b array ends up in local memory.

Christian