Questions about memory addresses


I have several questions about memory addresses.

First, I want to know the starting address (and end address or range) of global and shared memory spaces.

Is there a way to find the starting address of the global memory space? Or is it specified in some document? (Please let me know.)

On regular systems, I can find the addresses (including the range) of the stack space (e.g., 0x8000000), the heap space, etc.

Second, look at the following example code.

void myHost() {
   // Note: "float *check" points to global memory (I omitted the allocation part)

   // Copy check back to host memory

   // Print check
}

__global__ void myKernel(float *check) {
   int i, j;
   float temp;
   float a[4][4];
   float b[4][4];

   // Store the memory addresses of a and b in check
   if (threadIdx.x == 0) {
      check[0] = (int)a;
      check[1] = (int)b;
   }
}

__device__ void subKernel(float (*a)[4]) {
   // do nothing
}


Actually, check[0] had the value 0, and check[1] had the value 64. Can anybody explain what these values mean? (Why is the memory address of “a” 0?)

(Strangely, the two values were sometimes swapped: in some runs, check[0] had the value 64 and check[1] had the value 0.)

When I tried “check[2] = (int)&i;”, I got a critical error. I guess this is because “i” is stored in a register, so I can’t take its address.

If both a[4][4] and b[4][4] were stored in registers, how could I even execute “check[0] = (int)a;”? (So I don’t think a and b are in registers.)

I read some articles about local memory. The author said that automatic variables are normally stored in registers, and if there are too many of them to fit in registers, global memory is used for them instead.

In this case, are “a[4][4]” and “b[4][4]” stored in global memory? (Because I could do “check[0] = (int)a;”.)

Third, if that is not the case (i.e., a[4][4] and b[4][4] are stored in registers), how can I pass them into “subKernel”? (I cannot know the addresses of registers!)

And how can I update register variables (i.e., “i” and “j” in the example above) inside subKernel?


I don't think you can find this out. How addresses within shared and global memory are interpreted depends on the driver and the hardware. There are contexts; I think each one has its own virtual address space, and processes can attach to a context to access the same memory.
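What you can observe is where the driver happens to place allocations. Here is a minimal sketch, assuming a CUDA toolkit and one visible GPU: cudaMemGetInfo reports the total and free global memory, and printing the pointers returned by cudaMalloc shows the (virtual) device addresses this driver chose. reportGlobalAddresses is a hypothetical helper name, and the printed addresses are not a documented memory layout.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the device's total/free global memory and the
// addresses cudaMalloc returns. Returns 0 on success, else the CUDA error code.
int reportGlobalAddresses() {
    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) return (int)err;
    printf("global memory: %zu bytes total, %zu bytes free\n", totalBytes, freeBytes);

    float *p = nullptr;
    float *q = nullptr;
    if ((err = cudaMalloc(&p, 1024 * sizeof(float))) != cudaSuccess) return (int)err;
    if ((err = cudaMalloc(&q, 1024 * sizeof(float))) != cudaSuccess) {
        cudaFree(p);
        return (int)err;
    }
    // These are virtual device addresses chosen by the driver; they can
    // differ between GPUs, driver versions, and runs.
    printf("p = %p\nq = %p\n", (void *)p, (void *)q);

    cudaFree(q);
    cudaFree(p);
    return 0;
}

int main() {
    int rc = reportGlobalAddresses();
    if (rc != 0) printf("CUDA error %d\n", rc);
    return rc;
}
```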
Shared memory is allocated per thread block and exists only while the kernel is running. The compiler takes care to distinguish pointers into the different memory spaces and to handle them correctly.
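To see that each block gets its own copy of a __shared__ array, here is a sketch (kernel and helper names are hypothetical): every thread writes its block index into a shared array, then reads back a neighbour's slot and finds only its own block's value. It also prints the shared-window offset via __cvta_generic_to_shared, assuming CUDA 11 or later where that intrinsic is documented; the same offset in different blocks typically refers to different physical storage.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 32;
constexpr int kBlocks = 4;

__global__ void perBlockShared(int *out) {
    __shared__ int tile[kThreads];        // one copy of tile per thread block
    tile[threadIdx.x] = (int)blockIdx.x;  // every thread records its block index
    __syncthreads();
    // Read a neighbour's slot: it holds this block's index, never another block's.
    out[blockIdx.x * blockDim.x + threadIdx.x] = tile[(threadIdx.x + 1) % blockDim.x];
    if (threadIdx.x == 0)
        printf("block %u: tile at shared offset %llu\n", blockIdx.x,
               (unsigned long long)__cvta_generic_to_shared(tile));
}

// Hypothetical helper: returns true if every thread saw only its own block's data.
bool sharedIsPerBlock() {
    const int n = kThreads * kBlocks;
    int *dOut = nullptr;
    if (cudaMalloc(&dOut, n * sizeof(int)) != cudaSuccess) return false;
    perBlockShared<<<kBlocks, kThreads>>>(dOut);
    bool ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    int hOut[n];
    ok = ok && cudaMemcpy(hOut, dOut, n * sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dOut);
    for (int t = 0; ok && t < n; ++t)
        ok = (hOut[t] == t / kThreads);
    return ok;
}

int main() {
    printf("shared memory is per block: %s\n", sharedIsPerBlock() ? "yes" : "no");
    return 0;
}
```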
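Again, it is implementation dependent, but from your experiment I guess that local memory has a separate address space, private to each thread for the duration of the kernel launch, and that your variables a and b reside there. Each float[4][4] array takes 16 floats of 4 bytes each, i.e. 64 bytes, so whichever array is placed first starts at offset 0 and the other at offset 64. The compiler and driver are free to allocate them in any order they like, which would also explain why the two values sometimes swap.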
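To check this guess on a recent toolkit, here is a sketch that prints whether a and b are in local memory and their offsets within the thread's local window. It assumes CUDA 11 or later, where __isLocal and __cvta_generic_to_local are documented address-space intrinsics; the kernel and helper names are hypothetical. The static_assert confirms the 64-byte size of a float[4][4]; the printed offsets depend on the compiler and need not be exactly 0 and 64.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// 4 x 4 floats x 4 bytes = 64 bytes, the gap between the two observed values.
static_assert(sizeof(float[4][4]) == 64, "a float[4][4] occupies 64 bytes");

__global__ void showLocalOffsets(float *out, int k) {
    float a[4][4];
    float b[4][4];
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) {
            a[r][c] = (float)(r + c);
            b[r][c] = (float)(r - c);
        }
    if (threadIdx.x == 0) {
        // __isLocal: nonzero if the generic pointer points into local memory.
        // __cvta_generic_to_local: the pointer's offset in the local window.
        printf("a: isLocal=%u offset=%llu\n", __isLocal(a),
               (unsigned long long)__cvta_generic_to_local(a));
        printf("b: isLocal=%u offset=%llu\n", __isLocal(b),
               (unsigned long long)__cvta_generic_to_local(b));
    }
    // A runtime index k keeps both arrays in addressable (local) storage.
    out[threadIdx.x] = a[k & 3][k & 3] + b[k & 3][0];
}

// Hypothetical helper: launches one block of 32 threads, returns the CUDA error code.
int runShowLocalOffsets() {
    float *out = nullptr;
    cudaError_t err = cudaMalloc(&out, 32 * sizeof(float));
    if (err != cudaSuccess) return (int)err;
    showLocalOffsets<<<1, 32>>>(out, 1);
    err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    cudaFree(out);
    return (int)err;
}

int main() { return runShowLocalOffsets(); }
```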

Device functions (like subKernel) will typically be inlined into your kernel, so you can pass variables like i in by value (and back out by reference or as a return value), and the compiler will optimize away unnecessary copying after inlining. I think so.
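A sketch of that suggestion, reusing the question's subKernel and myKernel names but with an extended signature that is my own assumption: the array is passed as a pointer (it decays), and i and j are passed by C++ reference so writes reach the caller. __forceinline__ asks nvcc to inline the call; after inlining, i and j can usually stay in registers. subKernelUpdatesCaller is a hypothetical test helper.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The array decays to a pointer to rows of 4 floats; i and j are references,
// so the assignments below update the caller's variables.
__device__ __forceinline__ void subKernel(float (*a)[4], int &i, int &j) {
    i = 2;
    j = 3;
    a[i][j] = 42.0f;
}

__global__ void myKernel(float *check) {
    int i = 0, j = 0;
    float a[4][4] = {};
    subKernel(a, i, j);          // updates i, j and one element of a
    if (threadIdx.x == 0) {
        check[0] = (float)i;
        check[1] = (float)j;
        check[2] = a[2][3];
    }
}

// Hypothetical helper: true if the kernel saw i == 2, j == 3 and a[2][3] == 42.
bool subKernelUpdatesCaller() {
    float *dCheck = nullptr;
    if (cudaMalloc(&dCheck, 3 * sizeof(float)) != cudaSuccess) return false;
    myKernel<<<1, 32>>>(dCheck);
    float h[3] = {0.0f, 0.0f, 0.0f};
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(h, dCheck, sizeof(h), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dCheck);
    return ok && h[0] == 2.0f && h[1] == 3.0f && h[2] == 42.0f;
}

int main() {
    printf("%s\n", subKernelUpdatesCaller() ? "ok" : "failed");
    return 0;
}
```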

I can’t say anything about taking the address of local variables allocated to registers.

Register indexing is not supported. If you index a local array with a variable like that, the compiler will push the entire array into “local memory”, which is really in the global memory space but divided up so that each thread gets its own private storage. The main drawback of local memory is that it has the bandwidth of global memory (since that is physically where it is located) rather than the bandwidth of registers, which are much, much faster.
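A sketch contrasting constant and runtime indexing of a small per-thread array (kernel and helper names are hypothetical). Compiling with nvcc -Xptxas -v makes ptxas print each kernel's register count and stack-frame/spill sizes, so you can check whether the array went to local memory; that decision belongs to the compiler, so treat the comments as typical behaviour rather than a guarantee.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Compile with: nvcc -Xptxas -v indexing.cu
// ptxas then reports registers and stack frame / spill bytes per kernel.

// After unrolling, every index is a compile-time constant, so `a` can live in registers.
__global__ void constantIndex(float *out) {
    float a[16];
#pragma unroll
    for (int n = 0; n < 16; ++n) a[n] = n * 1.5f;
    out[threadIdx.x] = a[0] + a[15];
}

// The index depends on runtime values and registers cannot be indexed,
// so the compiler will typically place `a` in local memory.
__global__ void dynamicIndex(float *out, int k) {
    float a[16];
#pragma unroll
    for (int n = 0; n < 16; ++n) a[n] = n * 1.5f;
    out[threadIdx.x] = a[(threadIdx.x + k) % 16];
}

// Hypothetical helper: runs both kernels and checks the results on the host.
bool indexingResultsMatch() {
    const int n = 32, k = 5;
    float *dOut = nullptr;
    if (cudaMalloc(&dOut, n * sizeof(float)) != cudaSuccess) return false;
    float hConst[n], hDyn[n];
    constantIndex<<<1, n>>>(dOut);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(hConst, dOut, sizeof(hConst), cudaMemcpyDeviceToHost) == cudaSuccess;
    dynamicIndex<<<1, n>>>(dOut, k);
    ok = ok && cudaGetLastError() == cudaSuccess &&
         cudaMemcpy(hDyn, dOut, sizeof(hDyn), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dOut);
    for (int t = 0; ok && t < n; ++t)
        ok = hConst[t] == 22.5f && hDyn[t] == ((t + k) % 16) * 1.5f;
    return ok;
}

int main() {
    printf("%s\n", indexingResultsMatch() ? "ok" : "failed");
    return 0;
}
```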

I would assume that indirect access to a register variable through a pointer would also force the value into local memory rather than keeping it in the register file. (The compiler also spills registers to local memory if it thinks your kernel is using too many, or if you specify a register usage limit with nvcc options.)
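A sketch for testing that assumption (kernel and helper names are hypothetical): a pointer chosen at run time refers to one of two local variables. Whether the optimizer rewrites this into a register select or gives x and y local-memory addresses is up to the compiler; comparing the -Xptxas -v output with and without a low -maxrregcount cap shows what actually happens. __launch_bounds__ is included as another documented way to inform the compiler's register budget.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Compare local-memory use, for example with:
//   nvcc -Xptxas -v indirection.cu
//   nvcc -Xptxas -v -maxrregcount=16 indirection.cu   (a low cap can force spills)

// p points at x or y depending on a runtime value. If the optimizer cannot turn
// this into a register select, x and y need real addresses, i.e. local memory.
// __launch_bounds__(256) tells the compiler the largest block size used, which
// it takes into account when budgeting registers.
__global__ void __launch_bounds__(256) pointerIndirection(int *out, int sel) {
    int x = (int)threadIdx.x;
    int y = 100 + (int)threadIdx.x;
    int *p = (sel & 1) ? &x : &y;   // which variable is updated is decided at run time
    *p += 1;
    out[threadIdx.x] = x + y;        // always 2 * threadIdx.x + 101
}

// Hypothetical helper: runs the kernel with both selector values and checks the sums.
bool indirectionResultsMatch() {
    const int n = 256;
    int *dOut = nullptr;
    if (cudaMalloc(&dOut, n * sizeof(int)) != cudaSuccess) return false;
    int h[n];
    bool ok = true;
    for (int sel = 0; ok && sel < 2; ++sel) {
        pointerIndirection<<<1, n>>>(dOut, sel);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h, dOut, sizeof(h), cudaMemcpyDeviceToHost) == cudaSuccess;
        for (int t = 0; ok && t < n; ++t)
            ok = (h[t] == 2 * t + 101);
    }
    cudaFree(dOut);
    return ok;
}

int main() {
    printf("%s\n", indirectionResultsMatch() ? "ok" : "failed");
    return 0;
}
```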