How to use a __device__ pointer on the host side (e.g. cudaMalloc)?

mybiandou · February 2, 2009, 7:58am

I first define a __device__ pointer and then try to allocate memory for it, but the allocation fails.

It seems that I cannot use the __device__ pointer directly on the host side the way I can on the device side.

I don’t know how to do this, that is, how to allocate memory for a __device__ pointer from the host side.

Someone suggested defining the pointer on the host side first, as follows:

float *Ad;

int main(void)
{
    cudaMalloc((void**)&Ad,size);
    return 0;
}

But I don’t like this approach because I have to pass lots of pointer arguments to the GPU-side functions.

I would prefer to define all of the variables at global scope and use them directly in my GPU-side functions.

Can anyone help me? Thank you in advance.

The following is the simplest code:

[codebox]

#include "stdio.h"

__device__ int* Ad;
int *Ah;

__global__ void test_kernel(void)
{
    const int tid=threadIdx.x;
    Ad[tid]=tid;
    return;
}

int main(void)
{
    int size=10;
    Ah=(int*)malloc(size*sizeof(int));
    cudaMalloc(Ad,size*sizeof(int));
    test_kernel<<<1,size>>>();
    cudaMemcpyFromSymbol(Ah,"Ad",size*sizeof(int));
    for(int i=0;i<size;i++)
      printf("%d %d \n",i,Ah[i]);
    free(Ah);
    cudaFree(Ad);
    return 0;
}

[/codebox]

I also tried using a temporary pointer on the host side and then copying its value to the __device__ pointer, but that failed as well.

In theory, it should work. But …

The following is the source code.

[codebox]

#include "stdio.h"

__device__ int* Ad;
int *Ah;

__global__ void test_kernel(void)
{
    const int tid=threadIdx.x;
    Ad[tid]=tid;
    return;
}

int main(void)
{
    int size=10;
    int *tmp;
    Ah=(int*)malloc(size*sizeof(int));
    if(cudaMalloc((void**)&tmp,size*sizeof(int))!=cudaSuccess)
    {
        printf("cuda malloc error\n");
        return 1;
    }
    if(cudaMemcpyToSymbol("Ad",tmp,sizeof(void*))!=cudaSuccess)
    {
        printf("cuda mem copy error\n");
        return 1;
    }
    test_kernel<<<1,size>>>();
    cudaMemcpyFromSymbol(Ah,"Ad",size*sizeof(int));
    for(int i=0;i<size;i++)
      printf("%d %d \n",i,Ah[i]);
    free(Ah);
    cudaFree(Ad);
    return 0;
}

[/codebox]

mybiandou · February 2, 2009, 8:34am

I have now solved the problem using the second method, the temporary host pointer.

It is a little tedious, though. Is there a better way to use the __device__ pointer more directly?

The following is my modified code:

[codebox]

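#include "stdio.h"
#include "stdlib.h"

__device__ int* Ad;   // device-side global that holds the address of the buffer
int *Ah;              // host buffer for the results

__global__ void test_kernel(void)
{
    const int tid=threadIdx.x;
    Ad[tid]=tid;
    return;
}

int main(void)
{
    int size=100;
    int *tmp;         // host variable that holds the device address
    Ah=(int*)malloc(size*sizeof(int));
    if(cudaMalloc((void**)&tmp,size*sizeof(int))!=cudaSuccess)
    {
        printf("cuda malloc error\n");
        return 1;
    }
    // copy the pointer value itself (hence &tmp) into the device symbol;
    // toolkits from CUDA 5.0 on expect the symbol Ad here, not the string "Ad"
    if(cudaMemcpyToSymbol("Ad",&tmp,sizeof(void*))!=cudaSuccess)
    {
        printf("cuda mem copy error\n");
        return 1;
    }
    test_kernel<<<1,size>>>();
    // read the pointer value back into tmp (redundant here, tmp already holds it)
    cudaMemcpyFromSymbol(&tmp,"Ad",sizeof(void*));
    cudaMemcpy(Ah,tmp,size*sizeof(int),cudaMemcpyDeviceToHost);
    for(int i=0;i<size;i++)
      printf("%d %d \n",i,Ah[i]);
    free(Ah);
    cudaMemcpyFromSymbol(&tmp,"Ad",sizeof(void*));
    cudaFree(tmp);
    return 0;
}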

[/codebox]

MisterAnderson42 · February 2, 2009, 12:02pm

What you’ve done with cudaMemcpyToSymbol is the only way to initialize such a device variable.

Note that storing a pointer in device memory is a really bad idea. Every thread is going to have to pull that pointer out of slow device memory => uncoalesced memory reads => slow kernel. 1) You will find much better speed using constant memory (vs. device memory). 2) You will find much better convenience (and much more general purpose kernels) if you just allocate pointers on the host and pass them as arguments to the kernel. Oh, and there will be no speed penalty vs storing the pointer in constant memory.
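To make the two suggestions concrete, here is a minimal sketch that is not taken from the posts above. It assumes a toolkit from CUDA 5.0 or later, where cudaMemcpyToSymbol takes the symbol itself rather than a character string. The names c_data, d_data, h_data, fill_from_constant and fill_from_argument are made up for illustration.

```cuda
#include <stdio.h>
#include <stdlib.h>

// Option 1: the pointer lives in constant memory (hypothetical name c_data).
__constant__ int* c_data;

__global__ void fill_from_constant(void)
{
    c_data[threadIdx.x] = threadIdx.x;
}

// Option 2: the pointer is passed as an ordinary kernel argument.
__global__ void fill_from_argument(int* data)
{
    data[threadIdx.x] = 2 * threadIdx.x;
}

int main(void)
{
    const int size = 10;
    int* d_data = NULL;   // device address, stored in a host variable
    int* h_data = (int*)malloc(size * sizeof(int));
    if (h_data == NULL) return 1;

    if (cudaMalloc((void**)&d_data, size * sizeof(int)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        free(h_data);
        return 1;
    }

    // Option 1: copy the pointer value (not the data) into the constant symbol.
    if (cudaMemcpyToSymbol(c_data, &d_data, sizeof(d_data)) != cudaSuccess) {
        printf("cudaMemcpyToSymbol failed\n");
        cudaFree(d_data);
        free(h_data);
        return 1;
    }
    fill_from_constant<<<1, size>>>();
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_data, d_data, size * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < size; i++)
        printf("constant: %d %d\n", i, h_data[i]);

    // Option 2: no symbol copy at all, the pointer goes in the launch.
    fill_from_argument<<<1, size>>>(d_data);
    cudaMemcpy(h_data, d_data, size * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < size; i++)
        printf("argument: %d %d\n", i, h_data[i]);

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```

In both options the host keeps the address returned by cudaMalloc in d_data, so the final cudaMemcpy and cudaFree need no cudaMemcpyFromSymbol round trip. The argument version also lets the same kernel run on any buffer.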

mybiandou · February 3, 2009, 4:00am

Thank you for your good advice and clear explanation.

I will try the method.
