Fixed size array of pointers

I have a question concerning passing arguments to CUDA kernels.

Suppose I have

a → points to CUDA-allocated memory
b → points to CUDA-allocated memory

then start a kernel

foo<<<…>>>(a, b); // with __global__ void foo(float *a, float *b);

Why can't CUDA do the following?

float *c[2];

c[0] → points to CUDA-allocated memory
c[1] → points to CUDA-allocated memory

then

foo2<<<…>>>(c);

where the signature of foo2 is
__global__ void foo2(float *c[2]);

In other words, it is known at compile time that c[0] and c[1] hold addresses in CUDA device memory.

I understand that the compiler can't handle foo3(float **a), but foo2 above is very similar to the original foo. This should simply tell the compiler to pick up the two CUDA addresses c[0] and c[1] at kernel launch.
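The reason foo2 behaves exactly like foo3 is a C/C++ rule rather than a CUDA limitation: a function parameter declared as an array, such as float *c[2], is adjusted to a pointer, float **c, and the bound 2 is discarded. The launch therefore passes only the address of the host array c, which the device cannot dereference (foo3 can be made to work, but only if the pointer array itself is first copied into device memory). The sketch below, which should compile with nvcc, checks that the two function types are identical; foo2 and foo3 are the names from the question, their empty bodies are placeholders, and the helper array_parameter_decays is a hypothetical name added for illustration.

```cuda
#include <type_traits>

// Kernels from the question; the bodies are placeholders.
__global__ void foo2(float *c[2]) { (void)c; }
__global__ void foo3(float **a) { (void)a; }

// A parameter of type "array of 2 float*" is adjusted to "float **",
// so both kernels have exactly the same function type.
static_assert(std::is_same<void(float *[2]), void(float **)>::value,
              "array parameters are adjusted to pointers");

// Consequently, foo2<<<...>>>(c) copies only the address of the host array c
// into the kernel parameters, not the two device pointers stored in it.
bool array_parameter_decays() {
    return std::is_same<void(float *[2]), void(float **)>::value;
}
```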

I have done what you want, but it involves wrapping the fixed-size array in a struct

struct Context
{
float *pointers[3][3];
};
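A minimal end-to-end sketch of the struct-by-value approach, assuming a two-pointer variant of Context so that it mirrors foo2 from the question. The kernel name add_kernel, the helper run_struct_by_value_example, the array size and the launch configuration are illustrative choices, not part of the original post. Because the struct is copied by value into the kernel parameters, the device addresses stored in it arrive on the device intact.

```cuda
#include <cuda_runtime.h>

// Two-pointer variant of the post's Context struct (mirrors foo2's float *c[2]).
struct Context {
    float *pointers[2];
};

// ctx is passed by value, so the two device addresses are copied into the
// kernel parameters just like the separate a and b arguments of foo.
__global__ void add_kernel(Context ctx, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Only compile-time-constant indices are used here.
        ctx.pointers[0][i] += ctx.pointers[1][i];
    }
}

// Hypothetical helper: computes 1.0f + 2.0f element-wise on the GPU and
// returns true if every result equals 3.0f.
bool run_struct_by_value_example() {
    const int n = 256;
    float host_a[n], host_b[n];
    for (int i = 0; i < n; ++i) { host_a[i] = 1.0f; host_b[i] = 2.0f; }

    Context ctx = {{nullptr, nullptr}};
    bool ok = cudaMalloc(&ctx.pointers[0], sizeof host_a) == cudaSuccess &&
              cudaMalloc(&ctx.pointers[1], sizeof host_b) == cudaSuccess;
    if (ok) {
        cudaMemcpy(ctx.pointers[0], host_a, sizeof host_a, cudaMemcpyHostToDevice);
        cudaMemcpy(ctx.pointers[1], host_b, sizeof host_b, cudaMemcpyHostToDevice);
        add_kernel<<<(n + 127) / 128, 128>>>(ctx, n);  // ctx copied by value
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host_a, ctx.pointers[0], sizeof host_a,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(ctx.pointers[0]);
    cudaFree(ctx.pointers[1]);
    for (int i = 0; i < n && ok; ++i) ok = host_a[i] == 3.0f;
    return ok;
}
```

The kernel only indexes pointers with the constants 0 and 1, which is the access pattern the warning below recommends.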

and then passing the struct by value. But beware: if you index pointers in your device code with a variable instead of a compile-time constant, the entire struct gets copied into local memory, which slows things down. This is probably not a big deal, but it will matter if you're trying to extract maximum performance, as I am.
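The two access patterns can be contrasted as follows. Whether the runtime-index version actually produces a local-memory copy depends on the compiler version and target, so this is something to measure rather than assume: compiling with nvcc -Xptxas -v prints each kernel's stack frame size, and a nonzero stack frame for scale_runtime_index but not for scale_constant_index would point to the copy described above. All kernel names, the helper run_selected_example and the ternary-select workaround in scale_selected are hypothetical illustrations, not taken from the post; the workaround only makes sense for a very small array.

```cuda
#include <cuda_runtime.h>

// Hypothetical two-pointer context, as in the question.
struct Context {
    float *pointers[2];
};

// Constant indices: each pointer can be loaded straight from the parameters.
__global__ void scale_constant_index(Context ctx, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        ctx.pointers[0][i] *= s;
        ctx.pointers[1][i] *= s;
    }
}

// Runtime index k: per the post, this is the pattern that can make the
// compiler copy the whole struct into local memory first.
__global__ void scale_runtime_index(Context ctx, int k, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        ctx.pointers[k][i] *= s;
    }
}

// Possible workaround for a tiny array: choose between two
// constant-index loads instead of indexing with k.
__global__ void scale_selected(Context ctx, int k, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float *p = (k == 0) ? ctx.pointers[0] : ctx.pointers[1];
    if (i < n) {
        p[i] *= s;
    }
}

// Hypothetical helper: doubles buffer 1 via scale_selected and returns true
// if buffer 0 is untouched and buffer 1 was doubled.
bool run_selected_example() {
    const int n = 64;
    float h0[n], h1[n];
    for (int i = 0; i < n; ++i) { h0[i] = 1.0f; h1[i] = 1.0f; }

    Context ctx = {{nullptr, nullptr}};
    bool ok = cudaMalloc(&ctx.pointers[0], sizeof h0) == cudaSuccess &&
              cudaMalloc(&ctx.pointers[1], sizeof h1) == cudaSuccess;
    if (ok) {
        cudaMemcpy(ctx.pointers[0], h0, sizeof h0, cudaMemcpyHostToDevice);
        cudaMemcpy(ctx.pointers[1], h1, sizeof h1, cudaMemcpyHostToDevice);
        scale_selected<<<1, n>>>(ctx, 1, n, 2.0f);  // scale buffer 1 only
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h0, ctx.pointers[0], sizeof h0, cudaMemcpyDeviceToHost) == cudaSuccess &&
             cudaMemcpy(h1, ctx.pointers[1], sizeof h1, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(ctx.pointers[0]);
    cudaFree(ctx.pointers[1]);
    for (int i = 0; i < n && ok; ++i) ok = h0[i] == 1.0f && h1[i] == 2.0f;
    return ok;
}
```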

I've asked why NVCC copies the parameters into local memory instead of reading them from constant memory, where they originally reside (even when the struct is declared const), and someone suggested it has to do with the new ABI requirements, but I'm not sure.
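If the table really has to be indexed with runtime values, one alternative (a different technique from the struct-by-value approach in this thread) is to keep the pointer table in a __constant__ array and fill it from the host with cudaMemcpyToSymbol. The kernel then reads the pointers from constant memory directly, so the kernel-parameter ABI question does not arise. This is a sketch under assumptions: d_table, fill_kernel and run_constant_table_example are hypothetical names, the table is a single global shared by every launch (so it does not suit concurrent launches that need different tables), and constant-memory reads are fastest when all threads of a warp use the same index.

```cuda
#include <cuda_runtime.h>

// Pointer table in constant memory, sized like the post's 3x3 Context.
__constant__ float *d_table[3][3];

// Runtime indices r and c select the buffer; the table itself is read
// from constant memory, not from the kernel parameters.
__global__ void fill_kernel(int r, int c, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        d_table[r][c][i] = value;
    }
}

// Hypothetical helper: carves nine buffers out of one allocation, publishes
// their addresses with cudaMemcpyToSymbol, fills buffer (r, c) and checks it.
bool run_constant_table_example(int r, int c) {
    const int n = 128;
    float *base = nullptr;
    if (cudaMalloc(&base, 9 * n * sizeof(float)) != cudaSuccess) return false;
    cudaMemset(base, 0, 9 * n * sizeof(float));  // all-zero bits == 0.0f

    float *table[3][3];
    for (int row = 0; row < 3; ++row)
        for (int col = 0; col < 3; ++col)
            table[row][col] = base + (row * 3 + col) * n;
    bool ok = cudaMemcpyToSymbol(d_table, table, sizeof(table)) == cudaSuccess;

    if (ok) {
        fill_kernel<<<1, n>>>(r, c, n, 5.0f);
        ok = cudaGetLastError() == cudaSuccess;
    }

    float host[9 * n];
    ok = ok && cudaMemcpy(host, base, sizeof(host), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(base);

    // Only buffer (r, c) should contain 5.0f; all others stay 0.0f.
    for (int k = 0; k < 9 * n && ok; ++k) {
        bool in_target = (k / n) == (r * 3 + c);
        ok = host[k] == (in_target ? 5.0f : 0.0f);
    }
    return ok;
}
```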