I’m trying to build a very specific kernel: it takes no parameters (they are passed via constant memory) and should return
only one int value, but it MUST use all 16K of available shared memory.
In C, I would write it like
int A (void)
{…
return (…);
}
I can write it in CUDA like
__global__ void A (int * value)
{…
*value = …;
}
but this way I can’t use ALL shared memory - the kernel parameter also uses it.
So, is there any way to return just one int value from the kernel to the host without using kernel
parameters? Maybe pinned memory can help? Or is there a simpler solution?
If you return several values from multiple threads you might want to use an array.
Check the Reference Manual on how to get values from global device variables to the host. The call you need is cudaMemcpyFromSymbol.
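As a rough sketch of the global device variable approach: the kernel takes no parameters, writes its result to a `__device__` variable, and the host reads it back with cudaMemcpyFromSymbol. The names `d_result` and `run_A`, the launch configuration, and the placeholder value 42 are assumptions for illustration only; the real computation goes where the comment indicates.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical name: a global device variable that holds the single result.
// It lives in global memory, so it takes no shared memory.
__device__ int d_result;

// The kernel takes no parameters at all.
__global__ void A(void)
{
    // ... the real computation would go here; 42 is only a placeholder ...
    if (blockIdx.x == 0 && threadIdx.x == 0)
        d_result = 42;   // a single thread writes the result
}

// Hypothetical host helper: launches A and copies the result back.
// Returns true on success and stores the value in *out.
bool run_A(int *out)
{
    A<<<1, 32>>>();      // launch configuration chosen arbitrarily

    // The copy is issued on the default stream, so it runs after the
    // kernel launched above has finished.
    cudaError_t err = cudaMemcpyFromSymbol(out, d_result, sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main(void)
{
    int value = 0;
    if (run_A(&value))
        printf("kernel returned %d\n", value);
    return 0;
}
```

Very old toolkits also accepted the symbol name as a string in this call; passing the variable itself, as here, is the form current toolkits expect.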
Nevertheless, you cannot use all of shared memory. There are always 16 bytes reserved for internal data such as block and grid dimensions. You could try to get around it, and some people have tried (search the forum), but it is an ugly undocumented hack which may or may not work…