A question about parallelization

Hi all,

This may be an old topic for you, but it is still not very clear to me.

CUDA-enabled GPUs contain several multiprocessors. Do all of these multiprocessors run concurrently, or are they scheduled? I am aware that at most 768 threads can be resident on one multiprocessor. So if I have more than 768 threads, will they be split across several multiprocessors? And if I assign exactly 768 threads to a kernel, is it guaranteed that all 768 run concurrently?

Let's say I limit a kernel to at most 768 threads. Is there a way to run this kernel on several multiprocessors simultaneously? If so, how do I specify which multiprocessor a kernel runs on? If that is not possible, is there another way to achieve the same effect?

Thank you so much!

Really, it is not necessary to post your question in more than one forum. I answered it in the other one.