How to use a __device__ pointer on the host side (e.g. with cudaMalloc)?

I first define a device pointer and then try to allocate memory for it, but the allocation fails.

It seems that I cannot use a device pointer directly on the host side the way I can on the device side.

I don’t know how to do this, that is, how to allocate memory for a device pointer from the host side.

Someone suggested defining the pointer on the host side first, as follows:

float *Ad;

int main(void)
{
    /* size: number of bytes to allocate */
    cudaMalloc((void**)&Ad,size);
    return 0;
}

But I don’t like this approach, because I have to pass lots of pointer arguments to the GPU-side functions.

I would prefer to define all of the variables at global scope and use them directly in my GPU-side functions.

Can anyone help me? Thank you in advance.

The following is the simplest code:

[codebox]

#include "stdio.h"

__device__ int* Ad;
int *Ah;

__global__ void test_kernel(void)
{
    const int tid=threadIdx.x;
    Ad[tid]=tid;
    return;
}

int main(void)
{
    int size=10;
    Ah=(int*)malloc(size*sizeof(int));
    cudaMalloc(Ad,size*sizeof(int));
    test_kernel<<<1,size>>>();
    cudaMemcpyFromSymbol(Ah,"Ad",size*sizeof(int));
    for(int i=0;i<size;i++)
        printf("%d %d \n",i,Ah[i]);
    free(Ah);
    cudaFree(Ad);
    return 0;
}

[/codebox]

I also tried using a temporary pointer on the host side and then copying its value to the device pointer, but that failed too.

In theory, it should work. But …

The following is the source code.

[codebox]

#include "stdio.h"

__device__ int* Ad;
int *Ah;

__global__ void test_kernel(void)
{
    const int tid=threadIdx.x;
    Ad[tid]=tid;
    return;
}

int main(void)
{
    int size=10;
    int *tmp;
    Ah=(int*)malloc(size*sizeof(int));
    if(cudaMalloc((void**)&tmp,size*sizeof(int))!=cudaSuccess)
    {
        printf("cuda malloc error\n");
        return 1;
    }
    if(cudaMemcpyToSymbol("Ad",tmp,sizeof(void*))!=cudaSuccess)
    {
        printf("cuda mem copy error\n");
        return 1;
    }
    test_kernel<<<1,size>>>();
    cudaMemcpyFromSymbol(Ah,"Ad",size*sizeof(int));
    for(int i=0;i<size;i++)
        printf("%d %d \n",i,Ah[i]);
    free(Ah);
    cudaFree(Ad);
    return 0;
}

[/codebox]

I have now solved the problem using the second method, with a temporary host pointer.

But it is a little tedious. Is there a better way to use the device pointer more directly?

The following is my modified code:

[codebox]

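#include "stdio.h"
#include "stdlib.h"

__device__ int* Ad;   // pointer variable stored in device memory
int *Ah;              // host buffer for the results

__global__ void test_kernel(void)
{
    const int tid=threadIdx.x;
    Ad[tid]=tid;
    return;
}

int main(void)
{
    int size=100;
    int *tmp;         // host-side copy of the device address
    Ah=(int*)malloc(size*sizeof(int));
    if(cudaMalloc((void**)&tmp,size*sizeof(int))!=cudaSuccess)
    {
        printf("cuda malloc error\n");
        return 1;
    }
    // Copy the pointer value itself (hence &tmp) into the device variable Ad.
    // String symbol names such as "Ad" were accepted by early CUDA toolkits;
    // CUDA 5.0 and later require passing the symbol itself (Ad).
    if(cudaMemcpyToSymbol("Ad",&tmp,sizeof(void*))!=cudaSuccess)
    {
        printf("cuda mem copy error\n");
        return 1;
    }
    test_kernel<<<1,size>>>();
    // Read the device address back into tmp, then copy the data it points to.
    cudaMemcpyFromSymbol(&tmp,"Ad",sizeof(void*));
    cudaMemcpy(Ah,tmp,size*sizeof(int),cudaMemcpyDeviceToHost);
    for(int i=0;i<size;i++)
        printf("%d %d \n",i,Ah[i]);
    free(Ah);
    cudaMemcpyFromSymbol(&tmp,"Ad",sizeof(void*));
    cudaFree(tmp);
    return 0;
}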

[/codebox]

What you’ve done with cudaMemcpyToSymbol is the only way to initialize such a device variable.
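For readers on a current toolkit, here is a minimal sketch of the same cudaMemcpyToSymbol initialization. It assumes CUDA 5.0 or later, where the symbol itself (Ad) is passed instead of the string "Ad". The CHECK macro and the name buf are hypothetical helpers introduced only for this sketch; they are not from the posts above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper for this sketch: report any CUDA error and exit.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

__device__ int *Ad;  // pointer variable that itself lives in device memory

__global__ void test_kernel(void)
{
    Ad[threadIdx.x] = threadIdx.x;
}

int main(void)
{
    const int size = 100;
    int *buf = NULL;  // host-side copy of the device address (hypothetical name)
    int *Ah = (int *)malloc(size * sizeof(int));
    if (Ah == NULL) return 1;

    CHECK(cudaMalloc((void **)&buf, size * sizeof(int)));

    // Copy the pointer value (not the data it points to) into the symbol Ad.
    CHECK(cudaMemcpyToSymbol(Ad, &buf, sizeof(buf)));

    test_kernel<<<1, size>>>();
    CHECK(cudaGetLastError());  // catches launch errors

    // buf still holds the device address, so reading Ad back is not required.
    CHECK(cudaMemcpy(Ah, buf, size * sizeof(int), cudaMemcpyDeviceToHost));

    for (int i = 0; i < size; i++)
        printf("%d %d\n", i, Ah[i]);

    CHECK(cudaFree(buf));
    free(Ah);
    return 0;
}
```

Because the host keeps the address in buf, the cudaMemcpyFromSymbol round trip is only needed if device code changes Ad.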

Note that storing a pointer in device memory is a really bad idea. Every thread is going to have to pull that pointer out of slow device memory => uncoalesced memory reads => slow kernel. 1) You will find much better speed using constant memory (vs. device memory). 2) You will find much better convenience (and much more general purpose kernels) if you just allocate pointers on the host and pass them as arguments to the kernel. Oh, and there will be no speed penalty vs storing the pointer in constant memory.
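The two alternatives suggested above might look roughly like the following. This is only a sketch, assuming CUDA 5.0 or later (symbols passed directly rather than by string name); the names Ac, kernel_const, kernel_arg, buf and report are hypothetical and chosen just for this illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Option 1: keep the pointer in constant memory instead of plain device memory.
__constant__ int *Ac;

__global__ void kernel_const(void)
{
    Ac[threadIdx.x] = threadIdx.x;
}

// Option 2: pass the pointer as a kernel argument.
__global__ void kernel_arg(int *a)
{
    a[threadIdx.x] = 2 * threadIdx.x;
}

// Hypothetical helper: print a message and return false on any CUDA error.
static bool report(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        printf("%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main(void)
{
    const int size = 10;
    int Ah[size];
    int *buf = NULL;  // device buffer shared by both options

    if (!report(cudaMalloc((void **)&buf, sizeof(Ah)), "cudaMalloc"))
        return 1;

    // Option 1: copy the pointer value into the __constant__ symbol once.
    bool ok = report(cudaMemcpyToSymbol(Ac, &buf, sizeof(buf)), "cudaMemcpyToSymbol");
    if (ok) {
        kernel_const<<<1, size>>>();
        ok = report(cudaMemcpy(Ah, buf, sizeof(Ah), cudaMemcpyDeviceToHost), "cudaMemcpy");
    }
    if (ok)
        for (int i = 0; i < size; i++)
            printf("const %d %d\n", i, Ah[i]);

    // Option 2: no symbol copy at all, the address travels with the launch.
    if (ok) {
        kernel_arg<<<1, size>>>(buf);
        ok = report(cudaMemcpy(Ah, buf, sizeof(Ah), cudaMemcpyDeviceToHost), "cudaMemcpy");
    }
    if (ok)
        for (int i = 0; i < size; i++)
            printf("arg %d %d\n", i, Ah[i]);

    cudaFree(buf);
    return ok ? 0 : 1;
}
```

The kernel-argument version also keeps the kernel reusable with any buffer, which is the convenience point made above.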

Thank you for your good advice and clear explanation.

I will try the method.