[Beginner] [Absolute] Device Memory Questions: Double Pointers

Hello,

This is my first question in these forums; I am just beginning with CUDA C.

Consider the following code snippet:

//...

#define N 10

int main( void ) {

   int a[N], b[N], c[N];
   int *dev_a, *dev_b, *dev_c;

   // allocate the memory on the GPU
   HANDLE_ERROR( cudaMalloc( (void**)&dev_a, N * sizeof(int) ) );
   HANDLE_ERROR( cudaMalloc( (void**)&dev_b, N * sizeof(int) ) );
   HANDLE_ERROR( cudaMalloc( (void**)&dev_c, N * sizeof(int) ) );

// ...

The three pointers

int *dev_a, *dev_b, *dev_c;

are declared. This means all three are assigned a memory location able to hold a pointer to the given type, here int.

This memory is allocated in host memory, presumably 12 bytes in total on a 32-bit platform, as a pointer occupies 4 bytes there (8 bytes on a 64-bit platform, regardless of the size of int).
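
A quick way to check this on your own platform (a plain host C sketch, nothing CUDA-specific):

#include <stdio.h>

int main( void ) {
   int *dev_a;
   printf( "%zu\n", sizeof dev_a );   // size of the pointer variable itself:
                                      // 4 bytes on a 32-bit host, 8 on a 64-bit host
   return 0;
}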

Now,

HANDLE_ERROR( cudaMalloc( (void**)&dev_a, N * sizeof(int) ) );

allocates memory on the device.

It takes two parameters. The second parameter is similar to the argument of standard malloc(): it specifies the amount of memory to be allocated in bytes, here N elements of 4 bytes each.
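
For comparison, the host-side analogue (host_a is just a made-up name; this continues the snippet above):

int *host_a = (int*)malloc( N * sizeof(int) );                  // allocate N ints on the host
HANDLE_ERROR( cudaMalloc( (void**)&dev_a, N * sizeof(int) ) );  // allocate N ints on the device

In both calls the size argument means the same thing; only the way the resulting address is handed back differs.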

But the first parameter is a double pointer, holding the address of one of the int pointers declared before. Technically, no question there.

But what the heck is the address of the host int pointers dev_a, dev_b, dev_c doing in device memory? Code running on the device has absolutely no direct access to host memory; data only moves between the two through functions like cudaMemcpy, for instance.

Furthermore, the kernel using these variables is declared as follows:

__global__ void add( int *a, int *b, int *c ) //....

Clearly these are simple pointers, not double pointers.

cudaMalloc() is declared as follows:

cudaError_t cudaMalloc (void ** devPtr, size_t size)

where the parameters are:

devPtr 	- Pointer to allocated device memory

size 	- Requested allocation size in bytes

Definition taken from:

(http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/online/group__CUDART__MEMORY_gc63ffd93e344b939d6399199d8b12fef.html)

devPtr points to allocated device memory, but the parameter is a double pointer holding the address of a pointer in host memory? Why is that? Thanks in advance for any help on this.

The double pointer construct is used because C functions can only return a single value, which is already used for the error code here. The standard technique to work around this is to pass a pointer into the function for each additional “return” value, indicating where in (host) memory the function should store that value.
Since the value to be returned is itself a (device) pointer, the argument becomes a double pointer: a host pointer to a device pointer. The device pointer can then be passed as an argument to a kernel to actually use it on the device.
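
To illustrate the out-parameter pattern in plain C (a minimal sketch; compute_answer is a made-up function):

#include <stdio.h>

// The return value carries the error code; the "real" result is
// written through the out-parameter, just as cudaMalloc does it.
int compute_answer( int *out ) {
    if (out == NULL)
        return 1;    // error
    *out = 42;       // the additional "return" value
    return 0;        // success
}

int main( void ) {
    int answer;
    if (compute_answer( &answer ) == 0)   // pass the address, just like &dev_a
        printf( "%d\n", answer );
    return 0;
}

Replace the int result with an int* and the out-parameter becomes an int**, which is exactly the shape of cudaMalloc's first argument.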

Thank you very much for the answer.

Clear.

Not clear.

A pointer is a memory location containing the address of another location.

A double pointer is a memory location containing the address of another location, which in turn contains a pointer.

You say dev_a (dev_b, dev_c) becomes a double pointer residing in host memory, containing an address that points to device memory, which again points to the device memory allocated by cudaMalloc().

This means you arguably have two different functional views of dev_a (dev_b, dev_c).

From the host view, there is still a location in host memory, dev_a (dev_b, dev_c), pointing to device memory.

From the device view, dev_a (dev_b, dev_c) is a normal pointer in device memory, pointing to the allocated space in device memory.

Which context applies is determined by where dev_a (dev_b, dev_c) is used: within the kernel, or within host code.

But what sense does dev_a (dev_b, dev_c) make on the host side? You can’t dereference it and access device memory directly, e.g. with *dev_a++; (which would move the pointer one element ahead).

Shouldn’t the pointer declarations be within the kernel (or in a separate part, distinct from host code), as they are never used within the scope of host code?

I think the main source of your confusion is where cudaMalloc is executed.

In fact, it is a host function, executed on the host only. What it does can be depicted roughly like this:

cudaError cudaMalloc( void** ptr, size_t size ) {
  GPUDriver.callSomeInternalStuff();
  GPUDriver.IWantSomeDeviceMemory(size);
  *ptr = GPUDriver.WhereIsMyMemory();
  error = GPUDriver.checkErrorCodes();
  return error;
}

The code is somewhere in the CUDA library files you link with your code.

Note: I am not an NVIDIA developer and I don’t really know what’s inside cudaMalloc; what I have written is pure imagination, but the function’s behaviour can be described like the sketch above.

The final result is that:

  • dev_a is a pointer on the host side, residing on the stack of your main function, exactly where you declared it

  • dev_a points to a piece of memory on the device which has been reserved for you by cudaMalloc

Now to your second problem:

Calling a kernel incurs a copy of the function parameters from host to device. So we take the value of the dev_a pointer, copy it to the device, and run our kernel. Since that value points to your piece of device memory, everything works fine.
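
For completeness, here is how those pointer values then flow to the device. This is a sketch based on the original snippet; I am assuming the usual element-wise add kernel body:

__global__ void add( int *a, int *b, int *c ) {
    int tid = blockIdx.x;            // one block per element in this sketch
    if (tid < N)
        c[tid] = a[tid] + b[tid];    // a, b, c hold device addresses here
}

// back in main(), after the cudaMalloc calls:
HANDLE_ERROR( cudaMemcpy( dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice ) );
HANDLE_ERROR( cudaMemcpy( dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice ) );

add<<<N,1>>>( dev_a, dev_b, dev_c );  // the pointer VALUES are copied to the device

HANDLE_ERROR( cudaMemcpy( c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost ) );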

A pointer itself is just an address in memory, so there is only one memory location involved.

If you store a pointer in a pointer variable, a second memory location (the pointer variable) becomes involved.

If i is a variable (declared by int i;), &i is the address where the value of i is stored in memory. If you declare a pointer int *p; and let p = &i;, the address of i gets stored to a second memory location.

So a pointer variable is a memory location containing the address of another location.

A double pointer is the address of a memory location, where a pointer to (i.e., the address of) another memory location is stored.
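
In code, the two levels look like this:

int i = 5;       // a value stored at some address
int *p = &i;     // p stores the address of i
int **pp = &p;   // pp stores the address of p: a pointer to a pointer
                 // *pp yields p, and **pp yields 5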

dev_a, dev_b, dev_c are not changed. They remain pointers. However, their address is taken, which itself is a pointer to a pointer.

Yes.

Indeed you cannot use (i.e., dereference) device pointers on the host. That is why they are passed as arguments to the kernel, where they can be used as normal pointers.

On a side note, dev_a++; moves the pointer one element forward, and would be allowed on the host side since pointer variables on the host and device side are of the same size. However, you would hardly ever use it (why would you move a pointer element by element if you can’t actually access the elements?). (*dev_a)++; would increment the variable that dev_a points to.
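
To spell out the precedence on ordinary host data (a plain C sketch):

int arr[3] = { 10, 20, 30 };
int *q = arr;
q++;             // q now points to arr[1]; no stored value changes
(*q)++;          // arr[1] becomes 21; q itself does not move
int v = *q++;    // v = 21, and q then advances to arr[2]: *q++ parses as *(q++)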

The pointer declaration is different from a memory allocation.

Memory allocation is, at least on compute capability 1.x devices, always done on the host, as the GPU does not run an operating system itself. I assume Nvidia's engineers went through quite some hassle to make device-side memory allocation possible on later GPUs.
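
Indeed, on devices of compute capability 2.0 and later, kernels can allocate from a device heap themselves. A minimal sketch (scratch is a made-up kernel name):

__global__ void scratch( void ) {
    // device-side heap allocation, available on compute capability 2.0 and later
    int *tmp = (int*)malloc( 4 * sizeof(int) );
    if (tmp != NULL) {
        tmp[0] = threadIdx.x;
        free( tmp );   // memory from device-side malloc must also be freed on the device
    }
}

If I remember correctly, the device heap is small by default and can be enlarged with cudaDeviceSetLimit( cudaLimitMallocHeapSize, bytes ).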

The answers cleared all questions for me at my level of knowledge. Thank you very much.
