Set the address of a symbol in Driver API


Maybe this question has been asked before, but I couldn’t find any entry.

I’m working in C# with the driver API. For example, I have the following code:

__device__ float *A = NULL;

__global__ void someFunc (float Theta)
{
    // Some access to A
}


In C#, I get the symbol from the module via cuModuleGetGlobal(), which gives me the device address of A.

The problem is that I don’t know the size of the array at design time, so what I would need is something like this:

CUdeviceptr devptr;

cuMemAlloc(&devptr, 100 /* Example */);

cuSetGlobalAddress("A", devptr); /* Wished-for method (does not exist) to set the address of the symbol */

Of course I could pass the pointer as a kernel parameter instead, but for performance reasons I would prefer not to, since the parameter list would become much longer than necessary.

Is there a way to set the address of the symbol programmatically?



After calling cuMemAlloc(), you’ve got a pointer to device memory, which is all you need. You can pass that pointer as an argument to a kernel, or you can write it to a global variable on the device for later use.

What are you trying to achieve by setting the symbol address?

Hello tera,

the first solution you mentioned would work, but I was thinking of a method where I don’t have to pass the parameters through the kernel all the time.

I simply want to reduce the overhead of setting the parameters with cuSetParam, which would also reduce the complexity of my code. Currently I have to call the kernel about 4000 times, setting the kernel parameters each time. Most of the parameters I could simply set once as global variables.

What I’m searching for is a method where I can set the address of a global variable in the host code but there is currently (as far as I know) no such method.

As you have mentioned, how can I write the resulting device pointer from cuMemAlloc to a variable on the device?

Do you have an idea for this?



Just declare a pointer variable on the device, and copy into it the device pointer that you got from cudaMalloc() on the host:


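float *data;                    // host-side copy of the device pointer

__device__ float *device_data;  // pointer variable living on the device

// Allocate the buffer, then store its address in device_data.
cudaMalloc((void **)&data, 1234*sizeof(float));

cudaMemcpyToSymbol(device_data, &data, sizeof(device_data), 0, cudaMemcpyHostToDevice);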

You cannot set the address of a variable on the device, because that would require some kind of linker to be run afterwards to fix up every access to that variable in device code.
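To make the effect on the kernel side concrete, here is a minimal runtime API sketch of the whole pattern. The kernel name `scale`, the launch configuration, and the loop of 4000 launches are assumptions chosen for illustration; only `device_data`, `data`, and the element count come from the snippet above. Error checking is reduced to a single check at the end.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device-side global pointer. Its own address is fixed when the module is built;
// only the value stored in it changes.
__device__ float *device_data;

// Hypothetical kernel: reaches the buffer through the global pointer,
// so no pointer parameter has to be passed per launch.
__global__ void scale(float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        device_data[i] *= factor;
}

int main()
{
    const int n = 1234;
    float *data = NULL;

    cudaMalloc((void **)&data, n * sizeof(float));
    cudaMemset(data, 0, n * sizeof(float));

    // Store the pointer value (not the array contents) in the device variable, once.
    cudaMemcpyToSymbol(device_data, &data, sizeof(data), 0, cudaMemcpyHostToDevice);

    // Repeated launches reuse the pointer already stored on the device.
    for (int call = 0; call < 4000; ++call)
        scale<<<(n + 255) / 256, 256>>>(2.0f, n);

    cudaError_t err = cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(err));

    cudaFree(data);
    return 0;
}
```

The pointer is written once before the loop, so each launch only has to set the parameters that actually change.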

Hello Tera,

great. This is the method I was looking for. I completely forgot that this method exists.

Thanks for your help.

Can somebody tell me what the driver API equivalent of cudaMemcpyToSymbol is?



In the driver API you get the symbol address with cuModuleGetGlobal() and then do a normal cuMemcpyHtoD().
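As a rough sketch of that sequence, assuming a context is already current and the loaded module declares `__device__ float *A;` as in the first post: the helper name `publishBuffer` and its parameters are made up for illustration. Note that two separate copies are involved. The array contents go into the memory returned by cuMemAlloc(), while the copy to the symbol transfers only the pointer value, so its size is sizeof(CUdeviceptr) (assumed here to match the device pointer size) and not the array length.

```cuda
#include <cuda.h>
#include <stddef.h>

// Hypothetical helper. Assumes a CUDA context is current and that `module`
// contains a global `__device__ float *A;`.
CUresult publishBuffer(CUmodule module, const float *hostData, size_t n,
                       CUdeviceptr *buffer)
{
    CUdeviceptr symbol;   // address of the pointer variable A itself
    size_t symbolSize;    // size of A, i.e. the size of one device pointer
    CUresult rc;

    // 1. Allocate the array and fill it: n * sizeof(float) bytes.
    rc = cuMemAlloc(buffer, n * sizeof(float));
    if (rc != CUDA_SUCCESS) return rc;

    rc = cuMemcpyHtoD(*buffer, hostData, n * sizeof(float));
    if (rc != CUDA_SUCCESS) goto fail;

    // 2. Look up A and write the buffer's address into it.
    rc = cuModuleGetGlobal(&symbol, &symbolSize, module, "A");
    if (rc != CUDA_SUCCESS) goto fail;
    if (symbolSize != sizeof(CUdeviceptr)) {
        rc = CUDA_ERROR_INVALID_VALUE;
        goto fail;
    }

    // Only the pointer value is copied here, never the array.
    rc = cuMemcpyHtoD(symbol, buffer, sizeof(CUdeviceptr));
    if (rc != CUDA_SUCCESS) goto fail;
    return CUDA_SUCCESS;

fail:
    cuMemFree(*buffer);
    return rc;
}
```

After a successful call, kernels in the module can index `A` directly, and the caller keeps `*buffer` so it can release the memory later with cuMemFree().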

The problem is that it does NOT work this way. When you call cuModuleGetGlobal() with the name of a device pointer, you get only the address of the pointer itself (and the size of the pointer in bytes); no memory is allocated for it to point to. So when cuMemcpyHtoD() is then called with a data array of some length N, the call returns CUDA_ERROR_INVALID_VALUE. How is the driver supposed to know how much memory is needed at the call to cuMemcpyHtoD() (here N times the size of the pointed-to type, in bytes)? With no prior allocation, you are trying to copy host data of length N into device data of a fixed size, namely the size of the pointer (8 bytes on my computer). The question is how to allocate memory for such a pointer, or how to map memory allocated with cuMemAlloc() to a name declared in the kernel code as a device variable. I don’t want to use kernel parameters, of course.

My hardware is GeForce 9600 GT. Please correct me if I got something wrong here.