This may be an old topic for you, but it is still not very clear to me. CUDA-enabled GPUs contain multiple multiprocessors. Do all of these multiprocessors run concurrently, or are they scheduled one after another?

I am aware that at most 768 threads can be resident on one multiprocessor. So if I launch more than 768 threads, will they be split across several multiprocessors? And if I assign exactly 768 threads to a kernel, is it guaranteed that all 768 run concurrently?

Now suppose I limit a kernel to at most 768 threads. Is there a way to run this kernel on several multiprocessors simultaneously? If so, how should I specify which multiprocessor a kernel runs on? If that is not possible directly, is there another way to achieve it?

Thank you so much!
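For concreteness, here is roughly the kind of launch I have in mind. The kernel name `myKernel` and the block/thread sizes are just placeholders for illustration, not my actual code:

```cuda
#include <cstdio>

// Placeholder kernel: each thread just records its global index.
__global__ void myKernel(int *out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    out[idx] = idx;
}

int main()
{
    const int threadsPerBlock = 256;  // three such blocks = 768 threads total
    const int numBlocks = 3;

    int *d_out;
    cudaMalloc(&d_out, numBlocks * threadsPerBlock * sizeof(int));

    // My question in code form: does this single launch of 768 threads
    // stay on one multiprocessor, or get spread across several?
    myKernel<<<numBlocks, threadsPerBlock>>>(d_out);
    cudaDeviceSynchronize();

    cudaFree(d_out);
    return 0;
}
```

In other words, when I pick `numBlocks` and `threadsPerBlock` like this, I don't see any parameter that names a particular multiprocessor, which is what prompted my question.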