I launch just one block with 12 threads (my_kernel<<<1, 12>>>), but when I look at "active threads per multiprocessor", the number of active threads shown is 255! What is the difference between these two terms? Is it the same as the difference between a hardware thread and a software thread?