I have understood some of this section, but I don't know why we have warps or what they do. Can you help me?
Another question relates to the execution configuration:
I don't understand why I should use the Ns variable. Will CUDA allocate that for me dynamically, or what?
For example, if I wrote:
extern __shared__ int test[];
__global__ void mytest()
{
    int* testptr = test;
    // using testptr
    return;
}
…
// in main
mytest<<<1,1,1>>>();
Is that OK?
In other words, if I pass the kernel Ns=1, or some other value != 0,
what does that mean?
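On the Ns question, a minimal sketch may help. Ns is the number of bytes of shared memory that CUDA allocates dynamically per block for that launch, and an unsized `extern __shared__` array is what points at it. So `<<<1,1,1>>>` asks for 1 byte, which is too small to hold even one int. The sketch below assumes the intent was an int array with one slot per thread; the block size of 4, the `out` parameter, and the `run_mytest` helper are made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Unsized extern array: backed by the Ns bytes requested at launch.
extern __shared__ int test[];

__global__ void mytest(int *out)
{
    int *testptr = test;
    testptr[threadIdx.x] = threadIdx.x * 2;  // each thread writes its own slot
    __syncthreads();
    out[threadIdx.x] = testptr[threadIdx.x]; // copy back so the host can check
}

// Hypothetical helper: launches one block and copies the results to h_out.
bool run_mytest(int threads, int *h_out)
{
    const size_t Ns = threads * sizeof(int); // Ns is in bytes, not elements
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, Ns) != cudaSuccess) return false;
    mytest<<<1, threads, Ns>>>(d_out);
    bool ok = cudaMemcpy(h_out, d_out, Ns, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_out);
    return ok;
}

int main()
{
    const int threads = 4;                   // assumed block size
    int h_out[threads] = {0};
    if (!run_mytest(threads, h_out)) return 1;
    for (int i = 0; i < threads; ++i) printf("%d ", h_out[i]);
    printf("\n");
    return 0;
}
```

If the kernel needs no dynamically sized shared array, the third launch parameter can simply be left out; it defaults to 0.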
You have 8 processing units in a multiprocessor. But a multiprocessor decodes instructions 4 times as slowly as the processing units do their processing. Therefore each processing unit needs to run the same instruction 4 times in a row (on different data, i.e. for 4 different threads). 4 * 8 = 32, and that group of 32 threads executing the same instruction together is a warp.
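To make the 32 concrete, here is a rough sketch of how the threads of a one-dimensional block map onto warps: consecutive thread indices are grouped 32 at a time. It reads the warp size from the device instead of hard-coding it. The block size of 100 and the names `warps_per_block` and `record_warp` are just assumptions for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Number of warps needed to cover a block (rounds up for a partial last warp).
int warps_per_block(int threads_per_block, int warp_size)
{
    return (threads_per_block + warp_size - 1) / warp_size;
}

// Each thread records which warp of its block it belongs to.
__global__ void record_warp(int *warp_of_thread)
{
    int t = threadIdx.x;
    warp_of_thread[t] = t / warpSize;  // warpSize is a built-in device variable
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;

    const int threads = 100;           // assumed block size, not a multiple of 32
    printf("warp size: %d\n", prop.warpSize);
    printf("%d threads -> %d warps\n", threads,
           warps_per_block(threads, prop.warpSize));

    int *d_warp = nullptr;
    int h_warp[threads];
    if (cudaMalloc(&d_warp, sizeof(h_warp)) != cudaSuccess) return 1;
    record_warp<<<1, threads>>>(d_warp);
    if (cudaMemcpy(h_warp, d_warp, sizeof(h_warp),
                   cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cudaFree(d_warp);

    // Print the warp index for a few threads around the warp boundaries.
    const int samples[] = {0, 31, 32, 99};
    for (int i = 0; i < 4; ++i)
        printf("thread %d is in warp %d\n", samples[i], h_warp[samples[i]]);
    return 0;
}
```

With a block size that is not a multiple of the warp size, the last warp is only partly filled, so block sizes that are multiples of 32 tend to waste less.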