Hello,
This is my first question on this forum; I'm just getting started with CUDA C.
Consider the following code snippet:
//...
#define N 10
int main( void ) {
int a[N], b[N], c[N];
int *dev_a, *dev_b, *dev_c;
// allocate the memory on the GPU
HANDLE_ERROR( cudaMalloc( (void**)&dev_a, N * sizeof(int) ) );
HANDLE_ERROR( cudaMalloc( (void**)&dev_b, N * sizeof(int) ) );
HANDLE_ERROR( cudaMalloc( (void**)&dev_c, N * sizeof(int) ) );
// ...
The three pointers
int *dev_a, *dev_b, *dev_c;
are declared. This means all three are given a memory location able to hold a pointer to the given type, here int.
This memory is allocated in host memory, presumably 12 bytes in total on a 32-bit platform (4 bytes per pointer), or 24 bytes on a 64-bit one (8 bytes per pointer).
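To check this part of my understanding, I would expect a small host-only test like the following to confirm that the pointer variables themselves are ordinary host variables (my own sketch, not from the book):

#include <stdio.h>

int main( void ) {
    int *dev_a, *dev_b, *dev_c;
    // sizeof reports the size of the pointer variable itself,
    // not of anything it points to
    printf( "sizeof(dev_a) = %zu\n", sizeof(dev_a) );   // typically 8 on 64-bit, 4 on 32-bit
    printf( "all three together = %zu bytes\n",
            sizeof(dev_a) + sizeof(dev_b) + sizeof(dev_c) );
    return 0;
}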
Now,
HANDLE_ERROR( cudaMalloc( (void**)&dev_a, N * sizeof(int) ) );
allocates memory on the device.
It takes two parameters. The second one is similar to the size argument of standard malloc(): it specifies the amount of memory to allocate in bytes, here N elements of sizeof(int) (4 bytes) each.
The first parameter, however, is a double pointer holding the address of one of the int pointers declared before. Technically that part is clear to me.
But what on earth is the address of the host int pointers dev_a, dev_b, dev_c doing in device memory? Code running on the device has no direct access to host memory at all; data only gets there through functions like cudaMemcpy(), for instance.
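To make the part I do understand concrete: in plain C the same out-parameter pattern would look like this, where the double pointer is only needed so the function can write the address of the new block back into the caller's pointer (my own comparison with a made-up helper, not from the book):

#include <stdlib.h>

// hypothetical helper mimicking the calling convention of cudaMalloc():
// it writes the address of the newly allocated block into the pointer
// whose address the caller passed in
void my_alloc( void **ptr, size_t size ) {
    *ptr = malloc( size );
}

int main( void ) {
    int *a;
    my_alloc( (void**)&a, 10 * sizeof(int) );   // afterwards a points to the new block
    free( a );
    return 0;
}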
Furthermore, the kernel using these variables is declared as follows:
__global__ void add( int *a, int *b, int *c ) // ...
Clearly these are simple pointers, not double pointers.
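Later in the example the data is copied over and the kernel is launched roughly like this (quoting from memory, so the exact launch configuration may differ), and dev_a, dev_b, dev_c are passed by value, as plain single pointers:

HANDLE_ERROR( cudaMemcpy( dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice ) );
HANDLE_ERROR( cudaMemcpy( dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice ) );
add<<<N,1>>>( dev_a, dev_b, dev_c );
HANDLE_ERROR( cudaMemcpy( c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost ) );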
cudaMalloc() is declared as follows:
cudaError_t cudaMalloc (void ** devPtr, size_t size)
where the parameters are:
devPtr - Pointer to allocated device memory
size - Requested allocation size in bytes
Definition taken from the CUDA Runtime API reference.
So devPtr points to allocated device memory, but the parameter is a double pointer holding the address of a pointer that lives in host memory? Why is that?
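Maybe this makes my confusion clearer. After cudaMalloc() returns, I would expect something like the following (again my own sketch), where I assume &dev_a is an address in host memory while the value stored in dev_a is an address in device memory:

HANDLE_ERROR( cudaMalloc( (void**)&dev_a, N * sizeof(int) ) );
printf( "&dev_a (host address of the pointer variable): %p\n", (void*)&dev_a );
printf( " dev_a (device address stored in it):          %p\n", (void*)dev_a );

Thanks in advance for any help on this.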