After calling cuMemAlloc(), you’ve got a pointer to device memory, which is all you need. You can pass that pointer as an argument to a kernel, or you can write it to a global variable on the device for later use.
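To make the two options concrete, here is a minimal device-side sketch. The file name `kernel.cu`, the variable `g_data`, and both kernel names are hypothetical, chosen only for illustration. The `extern "C"` block is an assumption that keeps the names unmangled so that the host can look them up by plain name.

```cuda
// kernel.cu (illustrative name): compile to PTX with `nvcc -ptx kernel.cu`
extern "C" {

// Hypothetical module-scope pointer. The host writes a device address
// into it once; kernels launched afterwards read it.
__device__ float *g_data;

// Option 1: the buffer address arrives as a kernel argument on every launch.
__global__ void scale_arg(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Option 2: the kernel takes the buffer address from the global variable,
// so only the values that really change need to be passed per launch.
__global__ void scale_global(float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        g_data[i] *= factor;
}

}  // extern "C"
```

With the second form, the pointer no longer has to be part of the per-launch parameter setup.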
What are you trying to achieve by setting the symbol address?
The first solution you mentioned would work, but I was hoping for a method where I don’t have to pass the parameters through the kernel every time.
I simply want to reduce the overhead of setting the parameters with the cuParamSet*() calls, and thereby also reduce the complexity of my code. Currently I have to call the kernel about 4000 times, setting the kernel parameters each time. Most of the parameters I could simply set once as global variables.
What I’m searching for is a method where I can set the address of a global variable in the host code but there is currently (as far as I know) no such method.
As you have mentioned, how can I write the resulting device pointer from cuMemAlloc to a variable on the device?
You cannot set the address of a variable on the device, because that would require some kind of linker to be run afterwards to fix up every access to that variable in device code.
The problem is that it does NOT work this way. When you call cuModuleGetGlobal() with the name of a device pointer, you get only the address of the pointer variable itself (and its size in bytes); nothing is allocated for it to point to. So when cuMemcpyHtoD() is then called with a data array of some length, say N, that second call returns CUDA_ERROR_INVALID_VALUE. How is the driver supposed to know how much memory is needed at the call to cuMemcpyHtoD() (here N times the size of the type the pointer points to, in bytes)? With no prior allocation, you are trying to copy N elements of host data into a device variable of fixed size, namely the size of the pointer (8 bytes on my computer). The question is how to allocate storage for such a pointer. Or, how can memory allocated with cuMemAlloc() be mapped to a particular name declared as a device variable in the kernel code? I don’t want to use kernel parameters, of course.
My hardware is a GeForce 9600 GT. Please correct me if I got something wrong here.
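One way to reconcile the points above, as a hedged sketch rather than a definitive answer: the symbol returned by cuModuleGetGlobal() holds only the pointer, so the N-element array goes into a buffer obtained from cuMemAlloc(), and only the buffer's address (pointer-sized) is copied into the symbol. The module file `kernel.ptx` and the variable name `g_data` are assumptions; the module is assumed to define `extern "C" { __device__ float *g_data; }`. The primary-context calls assume a reasonably recent toolkit.

```cuda
// host.cpp (illustrative name): build with e.g. `nvcc host.cpp -lcuda -o host`
#include <cuda.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort with a message when a driver API call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        CUresult res_ = (call);                                       \
        if (res_ != CUDA_SUCCESS) {                                   \
            std::fprintf(stderr, "%s failed (CUresult %d)\n", #call,  \
                         static_cast<int>(res_));                     \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

int main()
{
    const size_t n = 1024;             // element count, chosen arbitrarily
    std::vector<float> host(n, 1.0f);  // host data to upload

    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));
    // Assumption: kernel.ptx defines `extern "C" { __device__ float *g_data; }`
    CHECK(cuModuleLoad(&mod, "kernel.ptx"));

    // 1. Allocate the real storage (n * sizeof(float) bytes) and fill it.
    CUdeviceptr buffer;
    CHECK(cuMemAlloc(&buffer, n * sizeof(float)));
    CHECK(cuMemcpyHtoD(buffer, host.data(), n * sizeof(float)));

    // 2. Look up the pointer variable itself. The reported size is the
    //    size of the pointer, not of the data it will point to.
    CUdeviceptr symbol;
    size_t symbolBytes;
    CHECK(cuModuleGetGlobal(&symbol, &symbolBytes, mod, "g_data"));
    std::printf("g_data occupies %zu bytes\n", symbolBytes);

    // 3. Store the buffer's address in the variable: copy the pointer
    //    value, not the array.
    CHECK(cuMemcpyHtoD(symbol, &buffer, sizeof(buffer)));

    // Kernels from `mod` launched from here on can dereference g_data.

    CHECK(cuMemFree(buffer));
    CHECK(cuModuleUnload(mod));
    CHECK(cuDevicePrimaryCtxRelease(dev));
    return 0;
}
```

Step 3 is the only write that targets the symbol, and its byte count matches the symbol's size, which is why it avoids the size mismatch described above.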