Questions about memory address

NucL23 · June 10, 2009, 6:44pm

Hi,

I have several questions about memory addresses.

First, I want to know the starting address (and the end address or range) of the global and shared memory spaces.

Is there a way to find the starting address of the global memory space? Or is it specified in some document? (Please let me know.)

On regular systems, I can find out the addresses (including ranges) of the stack space (e.g., 0x8000000), heap space, etc.

Second, look at the following example code.

void myHost() {
   // Note: "float *check" points to global memory (I omitted the allocation part)
   myKernel<<<1,256>>>(check);
   // Load check to host memory
   // Print check
}

__global__ void myKernel(float *check) {
   int i, j;
   float temp;
   float a[4][4];
   float b[4][4];

   // store memory addresses to check
   if (threadIdx.x == 0) {
      check[0] = (int)a;
      check[1] = (int)b;
   }

   subKernel(a);
}

__device__ void subKernel(float (*a)[4]) {
   // do nothing
}

In practice, check[0] held the value 0 and check[1] held 64. Can anybody explain what these values mean? (Why is the memory address of “a” 0?)

(Something strange also happened: sometimes the two values were swapped, i.e., check[0] was 64 and check[1] was 0.)

When I tried “check[2] = (int)&i;”, I got a critical error. I guess this is because “i” is stored in a register, so I can’t take its address.

If both a[4][4] and b[4][4] were stored in registers, how could “check[0] = (int)a;” even execute? (So I don’t think a and b are in registers.)

I read some articles about local memory. The author said that automatic variables are normally stored in registers, and that if there are too many of them to fit in registers, global memory is used for them instead.

In that case, are a[4][4] and b[4][4] stored in global memory? (I ask because “check[0] = (int)a;” worked.)

Third, if that is not the case (i.e., a[4][4] and b[4][4] are stored in registers), how can I pass them to “subKernel”? (I cannot know the addresses of registers!)

And how can I update variables held in registers (i.e., “i” and “j” in the example above) from subKernel?

Regards,

Knedlik · June 10, 2009, 9:27pm

I don't think you can find this out. How addresses within shared and global memory are interpreted depends on the driver and hardware. There are contexts, and I think each one has its own virtual address space; processes can attach to a context to access the same memory.
Shared memory is per block of threads and exists while the kernel is running. The compiler takes care to distinguish the different kinds of pointers and to work with them correctly.
Again, it is implementation dependent, but I can guess from your experiment that local memory has a separate address space, which is per block per kernel launch, and that your variables a and b reside there. The compiler and driver are free to allocate them in whatever order they like.
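Although the address ranges are not documented, the addresses a given run happens to use can be inspected. The following is a minimal sketch, assuming a toolchain and GPU that support device-side printf (compute capability 2.0 or later, which is newer than the hardware this thread was written for). The kernel name showAddresses and the buffer names are hypothetical. The printed values vary between devices, drivers, and runs, so they should not be hard-coded.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: prints one pointer from each memory space.
__global__ void showAddresses(float *globalBuf) {
    __shared__ float tile[256];   // shared memory, one copy per block
    float scratch[4];             // automatic array, indexed dynamically below

    scratch[threadIdx.x % 4] = globalBuf[threadIdx.x];
    tile[threadIdx.x] = scratch[threadIdx.x % 4];
    __syncthreads();

    if (threadIdx.x == 0) {
        printf("global : %p\n", (void *)globalBuf);
        printf("shared : %p\n", (void *)tile);
        printf("local  : %p\n", (void *)scratch);
    }
}

int main() {
    float *d_buf = nullptr;
    if (cudaMalloc(&d_buf, 256 * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_buf, 0, 256 * sizeof(float));

    // The host sees the same device pointer value that the kernel receives.
    printf("host view of device pointer: %p\n", (void *)d_buf);

    showAddresses<<<1, 256>>>(d_buf);
    cudaDeviceSynchronize();   // flushes device-side printf output

    cudaFree(d_buf);
    return 0;
}
```

Printing with %p avoids the problem in the original code, where an address was cast to int and then stored in a float.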

Any device function (such as subKernel) will be inlined into your kernel, so you can pass variables like i in and out by value, and the compiler will optimize away the unnecessary copying after inlining. At least, I think so.

I can’t say anything about taking the address of local variables allocated to registers…
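As a rough illustration of updating the caller's variables from a device function, here is a sketch that passes them by C++ reference. The names advance and counterKernel are hypothetical. Whether the call is actually inlined is up to the compiler; the __forceinline__ qualifier used here is only a request in that direction.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: modifies the caller's i and j through references.
__device__ __forceinline__ void advance(int &i, int &j) {
    i += 1;
    j += 2;
}

__global__ void counterKernel(int *out) {
    int i = 0, j = 0;
    advance(i, j);               // i and j are updated in place
    if (threadIdx.x == 0) {
        out[0] = i;
        out[1] = j;
    }
}

int main() {
    int *d_out = nullptr;
    int h_out[2] = {0, 0};
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return 1;

    counterKernel<<<1, 32>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    printf("i=%d j=%d\n", h_out[0], h_out[1]);
    cudaFree(d_out);
    return 0;
}
```

When the call is inlined, the references typically disappear and i and j can stay in registers, so no address ever has to be formed.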

seibert · June 10, 2009, 10:04pm

Register indexing is not supported. If you index a local array with a variable like that, the compiler will push the entire array into “local memory”, which is really in the global memory space, but divided up so that each thread gets its own private storage. The main drawback to local memory is that it has the bandwidth of global memory (since that is physically where it is located), rather than the bandwidth of registers, which are much, much faster.
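The difference can be sketched with two hypothetical kernels: one indexes its array with a value known only at run time, the other uses fully unrolled loops so that every index is a compile-time constant. Which arrays actually end up in local memory is the compiler's decision and varies by version and architecture; compiling with nvcc -Xptxas -v prints per-kernel resource usage, including any local memory, so the result can be checked rather than assumed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Index comes from memory at run time, so the array is a candidate
// for placement in local memory.
__global__ void dynamicIndex(const int *idx, float *out) {
    float a[16];
    for (int k = 0; k < 16; ++k) a[k] = k * 0.5f;
    out[threadIdx.x] = a[idx[threadIdx.x] & 15];
}

// After unrolling, every index is a constant, so the compiler is
// free to keep the elements in registers.
__global__ void staticIndex(float *out) {
    float a[16];
    float sum = 0.0f;
    #pragma unroll
    for (int k = 0; k < 16; ++k) a[k] = k * 0.5f;
    #pragma unroll
    for (int k = 0; k < 16; ++k) sum += a[k];
    out[threadIdx.x] = sum;
}

int main() {
    const int n = 64;
    int *d_idx = nullptr;
    float *d_out = nullptr;
    float h_out = 0.0f;
    if (cudaMalloc(&d_idx, n * sizeof(int)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_idx, 0, n * sizeof(int));   // every thread reads a[0]

    dynamicIndex<<<1, n>>>(d_idx, d_out);
    staticIndex<<<1, n>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("staticIndex result for thread 0: %f\n", h_out);

    cudaFree(d_idx);
    cudaFree(d_out);
    return 0;
}
```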

I would assume that register indirection through pointers would also force the value to be put into local memory rather than stored directly in the register file. (The compiler also spills registers to local memory if it thinks your kernel is using too much, or if you specify a register usage limit with nvcc options.)
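To experiment with this, the addresses from the original question can be recorded in an integer buffer wide enough to hold a pointer, instead of casting to int and storing into a float array. This is only a sketch: the kernel name recordAddresses is hypothetical, and it assumes a current toolchain on which taking the address of a scalar such as i compiles (the compiler then has to give i an addressable home). The values are implementation dependent and may be small offsets or full 64-bit addresses.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical variant of myKernel that stores addresses as integers.
__global__ void recordAddresses(unsigned long long *check) {
    int i = threadIdx.x;
    float a[4][4];
    float b[4][4];

    if (threadIdx.x == 0) {
        // size_t is pointer-sized in device code, so nothing is truncated.
        check[0] = (unsigned long long)(size_t)a;
        check[1] = (unsigned long long)(size_t)b;
        check[2] = (unsigned long long)(size_t)&i;
    }
}

int main() {
    unsigned long long *d_check = nullptr;
    unsigned long long h_check[3] = {0, 0, 0};
    if (cudaMalloc(&d_check, sizeof(h_check)) != cudaSuccess) return 1;

    recordAddresses<<<1, 256>>>(d_check);
    cudaMemcpy(h_check, d_check, sizeof(h_check), cudaMemcpyDeviceToHost);

    printf("a  at 0x%llx\n", h_check[0]);
    printf("b  at 0x%llx\n", h_check[1]);
    printf("&i at 0x%llx\n", h_check[2]);

    cudaFree(d_check);
    return 0;
}
```

The register usage limit mentioned above corresponds to the nvcc option -maxrregcount; lowering it tends to increase spilling to local memory.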
