Thread-local variable

Hi all,

What will happen if a large array is declared as a local array inside a thread, as in the following code?

#define NO_OF_BLOCK 32
#define NO_OF_THREAD 32

#define ARR_SIZE 800

__global__ void threadFn(…)
{
    int tempArr[ARR_SIZE];  // per-thread array of 800 ints
}

{
threadFn<<<NO_OF_BLOCK, NO_OF_THREAD>>>(…);
}

The thread-specific local memory seems too large. How will the GPU manage the memory, or will it return an error?

Thread-local arrays and other big structures end up in local memory, which is a slice of global memory dedicated to each thread. So it will be slow, but it should work as written. Your code allocates about 3 MB in total for the entire grid (32 × 32 threads × 800 ints × 4 bytes, assuming a 4-byte int), so there shouldn’t be a problem. If you launched more threads or had a bigger array, so that the sum of local storage was greater than the card’s global memory size, I guess you’d get a kernel launch failure at runtime (either “unspecified error” or “out of resources”), but I’ve never tested it.
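For anyone who wants to check the "about 3 MB" figure, here is a minimal host-side sketch of the arithmetic. The helper name totalLocalBytes is made up for illustration and is not part of CUDA; it simply multiplies the launch dimensions by the per-thread array size, and it assumes a 4-byte int. It says nothing about how the driver actually reserves local memory, only what the sum over the whole grid comes to.

```cpp
#include <cstddef>

// Hypothetical helper (not a CUDA API): bytes of per-thread array storage
// summed over every thread in the grid.
constexpr std::size_t totalLocalBytes(std::size_t blocks,
                                      std::size_t threadsPerBlock,
                                      std::size_t elemsPerThread,
                                      std::size_t elemSize)
{
    return blocks * threadsPerBlock * elemsPerThread * elemSize;
}

// Values from the question: 32 blocks x 32 threads, int tempArr[800].
// Assumption: sizeof(int) == 4, hence the literal 4 below.
static_assert(totalLocalBytes(32, 32, 800, 4) == 3276800,
              "1024 threads x 3200 bytes each");
```

3276800 bytes is a little over 3 MB. Plugging in your own launch configuration and comparing the result against the card's global memory size gives a rough idea of when the situation described above could become a problem.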