Hi,
blockIdx and threadIdx are built-in variables that CUDA sets automatically inside a kernel to the block index and the thread index of the executing thread.
So if you call your kernel with:
mykernel<<<16,64>>>(someargs);
you launch 16 blocks, so blockIdx.x runs from 0 to 15.
Each block has 64 threads, so within every block threadIdx.x runs from 0 to 63.
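To see those index ranges for yourself, here is a minimal sketch. The kernel name record_indices and the two output arrays are made up for illustration; they are not from your code. Each thread writes its own blockIdx.x and threadIdx.x into a slot computed from both indices.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread stores its own block and thread index.
__global__ void record_indices(int *block_ids, int *thread_ids)
{
    // blockDim.x is the number of threads per block (64 in this launch).
    int slot = blockIdx.x * blockDim.x + threadIdx.x;
    block_ids[slot]  = blockIdx.x;
    thread_ids[slot] = threadIdx.x;
}

int main()
{
    const int num_blocks = 16;
    const int threads_per_block = 64;
    const int total = num_blocks * threads_per_block;   // 1024 threads
    const size_t bytes = total * sizeof(int);

    int host_block_ids[total];
    int host_thread_ids[total];
    int *dev_block_ids = nullptr;
    int *dev_thread_ids = nullptr;

    if (cudaMalloc(&dev_block_ids, bytes) != cudaSuccess) return 1;
    if (cudaMalloc(&dev_thread_ids, bytes) != cudaSuccess) return 1;

    // Same launch configuration as in the post: 16 blocks of 64 threads.
    record_indices<<<num_blocks, threads_per_block>>>(dev_block_ids, dev_thread_ids);

    // cudaMemcpy waits for the kernel to finish before copying.
    if (cudaMemcpy(host_block_ids, dev_block_ids, bytes,
                   cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    if (cudaMemcpy(host_thread_ids, dev_thread_ids, bytes,
                   cudaMemcpyDeviceToHost) != cudaSuccess) return 1;

    printf("first slot: block %d, thread %d\n",
           host_block_ids[0], host_thread_ids[0]);
    printf("last slot:  block %d, thread %d\n",
           host_block_ids[total - 1], host_thread_ids[total - 1]);

    cudaFree(dev_block_ids);
    cudaFree(dev_thread_ids);
    return 0;
}
```

With this launch the first slot should report block 0, thread 0 and the last slot block 15, thread 63.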
Based on these indices you can decide which part of the work each individual thread handles.
Every thread executes the same kernel code; the threads run in parallel and differ only in their index values.
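As a rough sketch of how the indices divide up the work: the usual pattern is to compute one global index per thread and let each thread handle one array element. The kernel add_one, the array size of 1000, and the variable names below are assumptions for illustration only. Since 16 * 64 = 1024 threads are launched for 1000 elements, the bounds check makes the 24 surplus threads do nothing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread increments at most one element.
__global__ void add_one(float *data, int n)
{
    // Unique index across the whole grid, built from block and thread index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {              // threads past the end of the array skip the work
        data[i] += 1.0f;
    }
}

int main()
{
    const int n = 1000;                       // assumed element count
    const size_t bytes = n * sizeof(float);

    float host_data[n];
    for (int i = 0; i < n; ++i) {
        host_data[i] = static_cast<float>(i);
    }

    float *dev_data = nullptr;
    if (cudaMalloc(&dev_data, bytes) != cudaSuccess) return 1;
    if (cudaMemcpy(dev_data, host_data, bytes,
                   cudaMemcpyHostToDevice) != cudaSuccess) return 1;

    // 16 blocks of 64 threads = 1024 threads, enough to cover 1000 elements.
    add_one<<<16, 64>>>(dev_data, n);

    if (cudaMemcpy(host_data, dev_data, bytes,
                   cudaMemcpyDeviceToHost) != cudaSuccess) return 1;

    // Check on the host that every element was incremented exactly once.
    int errors = 0;
    for (int i = 0; i < n; ++i) {
        if (host_data[i] != static_cast<float>(i) + 1.0f) {
            ++errors;
        }
    }
    printf("mismatches: %d\n", errors);

    cudaFree(dev_data);
    return errors == 0 ? 0 : 1;
}
```

The same code runs in all 1024 threads; only the value of i differs, and that value picks the element a thread works on.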
That and more is covered pretty well in the CUDA Programming Guide!