How can Iget the pointer to the device memory var

ASD8907 · October 29, 2007, 11:33am

How can I get the pointer to the device memory variable from the host function?

If I trace the following code, it works correctly, but if I start exe file (EmulDebug), I’ve got an error. Why?

__device__ unsigned int tmpp;

__global__ void fun_tmpp(unsigned int** tmp)

{

    int bx = blockIdx.x;

   int tx = threadIdx.x;

   if(bx==0 && tx==0)

    {

        *tmp = &tmpp;

    }

     __syncthreads();

};

extern "C" void runTest()

{

    unsigned int* tm;

   unsigned int tm1 = 90;

   fun_tmpp<<<1,1>>>(&tm);

//!!!BUG

    CUDA_SAFE_CALL(cudaMemcpy(tm, &tm1, sizeof(unsigned int),

                              cudaMemcpyHostToDevice) );

}

Thanks in advance.

prkipfer · October 29, 2007, 12:17pm

You cannot do that. By the time the kernel fun_tmpp has ended, the device variable tmpp is not there anymore. If you need permanent device memory, allocate it with cudaMalloc which writes the device mem pointer to a host variable you provide via a host mem pointer. Then memcpy to it.

Peter

wumpus · October 29, 2007, 2:02pm

That’s quite a weird way to get the address of a device variable, but I don’t see how it neccesarily wouldn’t work.

AFAIK, you generally use ‘cudaMemCpyToSymbol’ for this.

Prkipfer: He declares the device globally, not locally, so it should remain existant between kernel calls, right?

Quite interesting, if you do this, you get an relocation entry in the cubin file

reloc  {

        name = xx

        segname = reloc

        segnum = 14

        offset = 0

        bytes = 4

}

This is how it gives the symbol a (fixed) place in global memory and communicates the location to the kernel.

ASD8907 · October 29, 2007, 3:08pm

Ok. So in my example address of “device unsigned int tmpp” is valid only for device, global functions, isn’t it?

But here I allocate memory with cudaMalloc():

__device__ unsigned int* tmpp;

__global__ void fun_tmpp(unsigned int** tmp)

{

    int bx = blockIdx.x;

   int tx = threadIdx.x;

if(bx==0 && tx==0)

{

        *tmp = tmpp;

}

     __syncthreads();

};

__global__ void setfun_tmpp(unsigned int** tmp)

{

    int bx = blockIdx.x;

   int tx = threadIdx.x;

if(bx==0 && tx==0)

{

        tmpp = *tmp;

}

     __syncthreads();

};

extern "C" void runTest()

{

    unsigned int* tm;

    unsigned int tm1 = 90;

   CUDA_SAFE_CALL(cudaMalloc((void**)&tm, sizeof(unsigned int)));

   setfun_tmpp<<<1,1>>>(&tm);

   fun_tmpp<<<1,1>>>(&tm);

   CUDA_SAFE_CALL(cudaMemcpy(tm, &tm1, sizeof(unsigned int),

                              cudaMemcpyHostToDevice) );

};

And it slill doesn’t work. :(

prkipfer · October 29, 2007, 4:04pm

The setfun_tmpp overwrites the device mem pointer you got from cudaMalloc with the (uninitialized) pointer value of device tmpp. So the following kernel can only restore a crap pointer value and you have a memory leak of the cudaMalloc’ed mem.

@wumpus, yeah, the cleanup removes all used variables. You cannot use a global var declaration to transport data between kernels. You need to use cudaMalloc to get persistent storage.

Peter

asadafag · October 30, 2007, 4:22am

The trick is: You can’t use & to get the address of a device variable.
nvcc compiles a device memory into a device variable and a dummy host variable of the same type. Whenever you refer to it in host code, you refer to the dummy host variable (whose sole purpose may be getting correct sizeof). That’s why we have to use cudaMemCpyToSymbol: the true address has to be deduced from the dummy host variable’s address, not just a &.
You can get the correct pointer via cudaGetSymbolAddress(), or just convert to driver API like I did.

ASD8907 · October 30, 2007, 9:23am

Thanks, it works!

prkipfer · October 30, 2007, 10:56am

So you are saying that global device variables stay around after the kernel invocation? Last time I checked this, it didn’t work (that was CUDA 0.8 though). I only use it for constant stuff. Maybe it works now with r/w variables. asadafag, can you confirm that?

Peter

asadafag · October 30, 2007, 2:07pm

I used device var once in my kernel and it worked (in 1.0).

Just verified it.

#include <stdio.h>

__device__ int aaaa;

int main(){

	void *p=NULL;

	cudaGetSymbolAddress(&p,aaaa);

	printf("%p\n",p);

	return 0;

}

hakuna · October 31, 2007, 2:31am

I used device var once in my kernel and it worked (in 1.0).

Just verified it.
#include <stdio.h>

__device__ int aaaa;

int main(){

	void *p=NULL;

	cudaGetSymbolAddress(&p,aaaa);

	printf("%p\n",p);

	return 0;

}
[snapback]272316[/snapback]

I tried your example code, but it’s strange that, sometimes the value of printf is 0. and I think we should not print the address directly, because the pointer p is a device pointer, and we should first copy it to host and then printf it? am I right?