Question about the execution model (Section 3.2 of the guide)

I understand some of this section, but I don't know why we have warps or what they do. Can you help me?

Another question, related to the execution configuration:
I don't understand why I should use the Ns variable. Will CUDA allocate that memory for me dynamically, or what?

For example, if I wrote:
extern __shared__ int test[];

__global__ void mytest()
{
    int* testptr = test;
    // using testptr
    return;
}


// in main
mytest<<<1,1,1>>>();

Is that OK?
In other words, if I launch the kernel with Ns = 1, or some other value != 0,
what does that mean?

As I understand it, there are 8 SPs (scalar processors) in one multiprocessor (MP), so all the threads can't run at the same time.

Declaring a shared array as extern is the dynamic way to allocate shared memory: its size is not fixed at compile time but is set by Ns at launch.

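Ns = 1 means each block gets 1 byte of dynamically allocated shared memory, on top of whatever is allocated statically. That is too small to hold even one int, so a kernel that reads test through an int* needs at least sizeof(int) bytes. You can find the definition of Ns in Section 4.2.3.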
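To make the role of Ns concrete, here is a minimal sketch of a launch that sizes the extern shared array to fit its use. The helper sharedBytesFor, the out parameter and the choice of 4 threads are assumptions added for illustration and are not from the original post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Dynamically sized shared array: the size comes from the third
// launch parameter (Ns, in bytes), not from the declaration.
extern __shared__ int test[];

// Hypothetical helper: bytes of dynamic shared memory needed for
// one int per thread in the block.
inline size_t sharedBytesFor(int threadsPerBlock)
{
    return threadsPerBlock * sizeof(int);
}

__global__ void mytest(int *out)
{
    int *testptr = test;
    // Each thread writes its own slot in shared memory...
    testptr[threadIdx.x] = threadIdx.x * 2;
    __syncthreads();
    // ...then reads the slot written by its neighbour.
    out[threadIdx.x] = testptr[(threadIdx.x + 1) % blockDim.x];
}

int main()
{
    const int threads = 4;
    const size_t Ns = sharedBytesFor(threads);  // bytes per block

    int h_out[threads];
    int *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(h_out));

    // <<<blocks, threads per block, Ns>>>
    mytest<<<1, threads, Ns>>>(d_out);

    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < threads; ++i)
        printf("out[%d] = %d\n", i, h_out[i]);

    cudaFree(d_out);
    return 0;
}
```

With Ns = 1 the same kernel would index past the single byte it was given, which is why Ns is computed from the number of threads here.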

You have 8 processing units in a multiprocessor, but the multiprocessor decodes instructions at a quarter of the rate at which the processing units execute them. Therefore each processing unit needs to run the same instruction 4 times in a row (on different data). 4 * 8 = 32, which is the warp size.
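As a rough illustration of how the threads of a block are grouped into warps, the sketch below records each thread's warp index and its position within the warp using the built-in warpSize variable. The names warpsPerBlock and showWarps and the block size of 70 are assumptions chosen for the example; the 8 x 4 figure above is specific to the hardware discussed in this thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of warps needed for a block, rounding up
// because a partially filled warp still occupies a whole warp.
inline int warpsPerBlock(int threadsPerBlock, int warpSz)
{
    return (threadsPerBlock + warpSz - 1) / warpSz;
}

__global__ void showWarps(int *warpIdOut, int *laneOut)
{
    int tid = threadIdx.x;
    warpIdOut[tid] = tid / warpSize;  // which warp this thread belongs to
    laneOut[tid]   = tid % warpSize;  // position of the thread inside its warp
}

int main()
{
    const int threads = 70;  // deliberately not a multiple of 32
    int hWarp[threads], hLane[threads];
    int *dWarp = nullptr, *dLane = nullptr;
    cudaMalloc(&dWarp, sizeof(hWarp));
    cudaMalloc(&dLane, sizeof(hLane));

    showWarps<<<1, threads>>>(dWarp, dLane);

    cudaError_t e1 = cudaMemcpy(hWarp, dWarp, sizeof(hWarp), cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(hLane, dLane, sizeof(hLane), cudaMemcpyDeviceToHost);
    if (e1 != cudaSuccess || e2 != cudaSuccess) {
        printf("CUDA error while copying results\n");
        return 1;
    }

    printf("%d threads -> %d warps\n", threads, warpsPerBlock(threads, 32));
    const int samples[] = {0, 31, 32, 69};
    for (int i = 0; i < 4; ++i) {
        int t = samples[i];
        printf("thread %d: warp %d, lane %d\n", t, hWarp[t], hLane[t]);
    }

    cudaFree(dWarp);
    cudaFree(dLane);
    return 0;
}
```

A block of 70 threads occupies three warps, the last one only partly filled, so block sizes that are multiples of 32 avoid wasting execution slots.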