I have written a few simple CUDA programs, but I still do not clearly understand how threads work. :(
A block is split into warps, and each warp is split into 2 half-warps (the 16 threads of a half-warp can run concurrently).
When the program calls a kernel function, the first 16 threads (the first half-warp) become active and execute the kernel function. These threads stop when all instructions inside the kernel function have been executed.
After the first half-warp has finished, the next half-warp becomes active and executes the kernel function.
Although there is no real question in your topic, perhaps the following helps:
Every thread executes the code you give it, and you can decide what each thread does by using threadIdx (which differs per thread) and blockIdx (which is the same for all threads in a block).
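As a minimal sketch of this indexing (the kernel name writeIndex, the helper globalIndex, and the sizes are made up for illustration), each thread can combine blockIdx and threadIdx into one global index and work on its own array element:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: combines a block index and a thread index into one
// global index. Callable on host and device so it can be checked on the CPU.
__host__ __device__ inline int globalIndex(int block, int blockSize, int thread)
{
    return block * blockSize + thread;
}

// Hypothetical kernel: every thread runs this same code, but picks its own
// element through blockIdx and threadIdx.
__global__ void writeIndex(int *out, int n)
{
    int i = globalIndex(blockIdx.x, blockDim.x, threadIdx.x);
    if (i < n)          // the last block may contain threads past the end
        out[i] = i;
}

int main()
{
    const int n = 100;
    const int threadsPerBlock = 32;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    writeIndex<<<blocks, threadsPerBlock>>>(d_out, n);

    int h_out[n];
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("out[0] = %d, out[%d] = %d\n", h_out[0], n - 1, h_out[n - 1]);
    return 0;
}
```

The bounds check matters because the grid is rounded up to whole blocks, so some threads in the last block have no element to handle.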
The threads are arranged into warps. These warps are scheduled independently by the GPU (unless you use __syncthreads()) and may be executed in order or in any other order. To hide latency, a warp runs until it loads or stores something, then the next warp gets its turn, and so forth. Eventually all threads reach the end of the program and the kernel is finished.
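Because the warps of a block may run in any order, a thread can only safely read data written by a thread of another warp after a barrier. The following is a small sketch under assumptions: the kernel name reverseInBlock, the helper warpOf, and the block size of 64 are invented for this example. It reverses an array through shared memory and records which warp each thread belongs to:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kBlockSize = 64;  // assumed block size: more than one warp

// Hypothetical helper: index of the warp a thread belongs to within its block.
__host__ __device__ inline int warpOf(int threadInBlock, int warpWidth)
{
    return threadInBlock / warpWidth;
}

// Hypothetical kernel, meant to be launched with a single block of
// kBlockSize threads.
__global__ void reverseInBlock(const int *in, int *out, int *warpIds)
{
    __shared__ int tile[kBlockSize];
    int t = threadIdx.x;

    tile[t] = in[t];
    warpIds[t] = warpOf(t, warpSize);  // warpSize is a built-in device variable

    // Barrier: wait until every warp of the block has written its part of
    // tile. Without it, the element read below might not be written yet.
    __syncthreads();

    out[t] = tile[kBlockSize - 1 - t];  // usually written by another warp
}

int main()
{
    int h_in[kBlockSize], h_out[kBlockSize], h_warp[kBlockSize];
    for (int i = 0; i < kBlockSize; ++i)
        h_in[i] = i;

    const size_t bytes = kBlockSize * sizeof(int);
    int *d_in = nullptr, *d_out = nullptr, *d_warp = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess ||
        cudaMalloc(&d_out, bytes) != cudaSuccess ||
        cudaMalloc(&d_warp, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    reverseInBlock<<<1, kBlockSize>>>(d_in, d_out, d_warp);
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaError_t err = cudaMemcpy(h_warp, d_warp, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_warp);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("out[0] = %d, thread 40 is in warp %d\n", h_out[0], h_warp[40]);
    return 0;
}
```

Note that __syncthreads() only synchronizes the threads of one block; warps in different blocks remain independent of each other.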
When the program calls a kernel function, the first 16 threads (the first half-warp) become active and execute the kernel function?
Yes
And these threads stop when all instructions inside the kernel function have been executed.
Yes and no.
They pause whenever the scheduler suspends them, e.g. to hide load/store latency, and runs another warp in the meantime. And yes, they really stop once the kernel code is finished.
After the first half-warp has finished, the next half-warp becomes active and executes the kernel function.
Yes, but with unpredictable pausing and resuming.