Global function parameter size


I just read about the 256-byte limit on __global__ function parameters on page 29 of the CUDA Programming Guide.

What does that limit actually include?

I’ll give an example:

__global__ void PointtIteration(int* pic, struct wbet _wbet, int max_iter)

int* pic points to an array that was allocated on the device with cudaMalloc. That array is far bigger than 256 bytes, so I assume the limit doesn’t apply to global device memory.

Which parameters are limited?

Let’s imagine that the structure “wbet” contains an array of more than 256 bytes. Will that work?

If not, let’s imagine that I use two of these structures of, say, 192 bytes each. Does that work?

I’m sick of getting a sprinkled screen, so I’ll ask before I try :D


A pointer takes 4 bytes (8 bytes on a 64-bit target), not the size of the memory it points to. The pointer itself carries no information about the size of that memory.

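As for the structure: if it only contains pointers to device memory, that’s fine. If it contains 256 bytes of data itself, it isn’t, because a structure passed by value counts its full size toward the limit, together with all the other parameters.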
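To make that accounting concrete, here is a minimal sketch that simply adds up sizeof for a few parameter lists. The struct layouts (wbet_ptrs, wbet_array, block192) and the kernel body are hypothetical stand-ins for the poster’s “wbet”, and alignment padding between parameters is ignored, so real totals can be a few bytes larger. The 256-byte figure is the one from that edition of the guide (compute capability 1.x passed parameters through shared memory); later CUDA versions document a larger limit (4 KB).

```cuda
#include <cassert>  // assert(), for callers that check the budgets at runtime
#include <cstddef>  // std::size_t

// Parameter budget quoted from the (old) programming guide.
constexpr std::size_t kParamLimit = 256;

// Hypothetical "wbet" variant holding only a device pointer and a count.
struct wbet_ptrs {
    float* data;  // points into cudaMalloc'd memory; costs sizeof(float*) bytes
    int    count;
};

// Hypothetical "wbet" variant embedding 80 floats (320 bytes) by value.
struct wbet_array {
    float data[80];
};

// Hypothetical 192-byte structure from the follow-up question.
struct block192 {
    char data[192];
};

// By-value size of each parameter list (padding between parameters ignored).
constexpr std::size_t kPtrsList  = sizeof(int*) + sizeof(wbet_ptrs) + sizeof(int);
constexpr std::size_t kArrayList = sizeof(int*) + sizeof(wbet_array) + sizeof(int);
constexpr std::size_t kTwoBlocks = 2 * sizeof(block192);

constexpr bool fits(std::size_t bytes) { return bytes <= kParamLimit; }

// Only the pointer values travel as parameters, however large the arrays
// behind them are. Hypothetical body; the poster's real kernel is not shown.
__global__ void PointtIteration(int* pic, wbet_ptrs w, int max_iter) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < w.count) {
        pic[i] = (w.data[i] > 0.0f) ? max_iter : 0;
    }
}
```

On a typical 64-bit target the pointer-only list comes to 28 bytes, the embedded-array list to 332 bytes, and two 192-byte structures to 384 bytes, so only the first fits in 256 bytes.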

Let me get this straight. This means you can do:

__global__ void functionname(float *a1, float *a2, float *a3, …, float *an) for a large n?

Or does each pointer count for some portion (4 bytes?) of the 256 bytes?

If you need to pass a lot of arguments and they’re all of one data type, would it be better, the same, or worse to store all the data in one allocated space and keep the offsets into it in symbols, i.e. to manage the indexing yourself?
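One way to read that proposal, sketched under assumptions: three logical arrays with made-up lengths (10, 20 and 5 floats) share a single cudaMalloc allocation, and their start offsets live in a __constant__ array (c_offsets, a hypothetical name) filled with cudaMemcpyToSymbol. The kernel then takes only the base pointer as a parameter. The CHECK macro is a simple assert-based error check, not part of the CUDA API.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Assert that a CUDA runtime call succeeded (the call still runs under NDEBUG).
#define CHECK(call) do { cudaError_t e_ = (call); assert(e_ == cudaSuccess); (void)e_; } while (0)

constexpr int kNumArrays = 3;

// Start offset (in elements) of each logical array, plus the total at the end.
__constant__ int c_offsets[kNumArrays + 1];

// One block per logical array; element j of array a is set to a * 100 + j.
__global__ void fill(float* base) {
    const int a     = blockIdx.x;
    const int begin = c_offsets[a];
    const int end   = c_offsets[a + 1];
    for (int i = begin + threadIdx.x; i < end; i += blockDim.x) {
        base[i] = a * 100.0f + (i - begin);
    }
}

// Runs the kernel and returns element idx of logical array a.
float run_fill(int a, int idx) {
    const int h_offsets[kNumArrays + 1] = {0, 10, 30, 35};  // lengths 10, 20, 5
    CHECK(cudaMemcpyToSymbol(c_offsets, h_offsets, sizeof(h_offsets)));

    float* d_base = nullptr;  // the single allocation holding all three arrays
    CHECK(cudaMalloc(&d_base, h_offsets[kNumArrays] * sizeof(float)));
    fill<<<kNumArrays, 32>>>(d_base);
    CHECK(cudaGetLastError());

    float v = 0.0f;
    CHECK(cudaMemcpy(&v, d_base + h_offsets[a] + idx, sizeof(float),
                     cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_base));
    return v;
}
```

Here all threads of a block read the same c_offsets entry, which suits constant memory because such reads are broadcast from its cache; whether this beats separate pointer parameters overall depends on the access pattern.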


Each pointer counts for 4 bytes (or 8 bytes on a 64-bit target). If you want to pass in more parameters than fit, create a struct with your parameters, copy it to the GPU yourself with cudaMemcpy, and pass a pointer to it.
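A minimal sketch of that approach, assuming a hypothetical Params struct (the field names and the toy kernel are invented for illustration) and an assert-based CHECK macro for errors. Only the two pointers count against the parameter limit, however many fields Params grows to.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Assert that a CUDA runtime call succeeded (the call still runs under NDEBUG).
#define CHECK(call) do { cudaError_t e_ = (call); assert(e_ == cudaSuccess); (void)e_; } while (0)

// Hypothetical parameter bundle; a real one might hold ~30 fields.
struct Params {
    float scale;
    float offset;
    int   n;
};

// The kernel receives just two pointers, whatever the size of Params.
__global__ void apply(const Params* p, float* out) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < p->n) {
        out[i] = p->scale * i + p->offset;
    }
}

// Copies the bundle to device memory, launches, and returns out[k].
float run_apply(const Params& h_params, int k) {
    Params* d_params = nullptr;
    float*  d_out    = nullptr;
    CHECK(cudaMalloc(&d_params, sizeof(Params)));
    CHECK(cudaMalloc(&d_out, h_params.n * sizeof(float)));
    // One copy moves every field at once.
    CHECK(cudaMemcpy(d_params, &h_params, sizeof(Params), cudaMemcpyHostToDevice));

    const int threads = 128;
    const int blocks  = (h_params.n + threads - 1) / threads;
    apply<<<blocks, threads>>>(d_params, d_out);
    CHECK(cudaGetLastError());

    float value = 0.0f;
    CHECK(cudaMemcpy(&value, d_out + k, sizeof(float), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_out));
    CHECK(cudaFree(d_params));
    return value;
}
```

Copying the struct back to the host works the same way, with cudaMemcpyDeviceToHost and the arguments swapped.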

That’s how I ended up solving it.

It also has the advantage that you can put all the parameters in one structure and copy it to device memory and back with a minimum number of allocations etc., and you keep track of all of them. I had to port part of an old piece of software written in C to CUDA and needed to pass ~30 parameters to the CUDA function.

This would mean the pointers are now stored in global memory, right? So you would have to do a global memory read every time you want to find an address, instead of pulling it from a register? Or would it be cached? Could you instead copy that struct into a symbol and access it that way? Then you wouldn’t even have to pass a pointer to the struct.
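In CUDA terms, a “symbol” here would be a file-scope variable declared __constant__ (or __device__). A minimal sketch, assuming the same kind of hypothetical Params struct: the host fills the symbol with cudaMemcpyToSymbol and the kernel reads c_params (a hypothetical name) directly, so no pointer to it is passed. Constant memory is cached and broadcasts a read when all threads of a warp access the same address, which is the usual pattern for parameters like these; all __constant__ data together is limited to 64 KB.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Assert that a CUDA runtime call succeeded (the call still runs under NDEBUG).
#define CHECK(call) do { cudaError_t e_ = (call); assert(e_ == cudaSuccess); (void)e_; } while (0)

// Hypothetical parameter bundle.
struct Params {
    float scale;
    float offset;
    int   n;
};

// File-scope symbol in constant memory, visible to kernels in this file.
__constant__ Params c_params;

// No parameter struct or pointer is passed; threads read c_params directly.
__global__ void apply_const(float* out) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < c_params.n) {
        out[i] = c_params.scale * i + c_params.offset;
    }
}

// Uploads the struct into the symbol, launches, and returns out[k].
float run_apply_const(const Params& h_params, int k) {
    // No cudaMalloc needed for the parameters themselves.
    CHECK(cudaMemcpyToSymbol(c_params, &h_params, sizeof(Params)));

    float* d_out = nullptr;
    CHECK(cudaMalloc(&d_out, h_params.n * sizeof(float)));
    const int threads = 128;
    const int blocks  = (h_params.n + threads - 1) / threads;
    apply_const<<<blocks, threads>>>(d_out);
    CHECK(cudaGetLastError());

    float value = 0.0f;
    CHECK(cudaMemcpy(&value, d_out + k, sizeof(float), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_out));
    return value;
}
```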


What do you mean by “copying the struct into a symbol”??