CUDA threads and warps

Hello all,

I am new to GPU programming. I have some questions about basic CUDA concepts and GPU hardware.

When we launch threads, they are divided into warps at runtime. I am quite confused about how the GPU executes instructions. Does each thread execute an instruction on its own, or does a group of threads (a warp) execute one instruction together? And how are the CUDA cores involved when an instruction is executed?
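To make the question concrete, here is a minimal sketch of how I understand the thread-to-warp mapping. The kernel name whoAmI and the launch of 2 blocks with 64 threads each are just my choices for illustration, and I am assuming that consecutive threadIdx.x values are grouped into warps of warpSize threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread works out which warp and lane it is in.
// Assumption: threads of a 1D block are grouped into warps in order of threadIdx.x.
__global__ void whoAmI()
{
    int warpInBlock = threadIdx.x / warpSize;  // warp index within the block
    int lane        = threadIdx.x % warpSize;  // position of the thread within its warp

    // Only the first lane of each warp prints, to keep the output short.
    if (lane == 0) {
        printf("block %d, warp %d starts at thread %d\n",
               blockIdx.x, warpInBlock, threadIdx.x);
    }
}

int main()
{
    // 2 blocks of 64 threads each (illustrative values).
    whoAmI<<<2, 64>>>();

    // Wait for the kernel to finish and flush the device-side printf output.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

If my understanding is right, this should print one line per warp, but I do not know in what order, and I still do not see how the threads of one warp are spread across the CUDA cores.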

I am using a Jetson TK1, and I have read that it has only one SM. How many blocks can that SM hold at a time?
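I guess I could check what the board itself reports with something like the sketch below. It assumes the GPU is device 0 and only uses cudaGetDeviceProperties from the CUDA runtime. As far as I can tell, these fields give the SM count and the thread limits, but not directly the number of blocks per SM.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;

    // Assumption: the GPU of interest is device 0.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("Device name:           %s\n", prop.name);
    printf("Compute capability:    %d.%d\n", prop.major, prop.minor);
    printf("Number of SMs:         %d\n", prop.multiProcessorCount);
    printf("Warp size:             %d\n", prop.warpSize);
    printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("Max threads per SM:    %d\n", prop.maxThreadsPerMultiProcessor);
    return 0;
}
```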

Thank you