Performance

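The central idea for good performance is to keep as many warps as possible resident on each multiprocessor at the same time, so that the scheduler always has another warp ready to run while others wait on memory or on long-latency instructions.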
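To make the idea concrete, here is a rough sketch of how one might estimate the number of resident warps per multiprocessor from a kernel's resource usage. The per-multiprocessor limits in `SMLimits`, and the names `SMLimits`, `resident_warps` and `occupancy`, are illustrative assumptions and not taken from the text; real devices differ, and the hardware also rounds register and shared-memory allocations in ways this sketch ignores.

```python
import math
from dataclasses import dataclass

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


@dataclass(frozen=True)
class SMLimits:
    """Assumed per-multiprocessor limits (example values only)."""
    max_warps: int = 64            # resident warps per multiprocessor
    max_blocks: int = 32           # resident blocks per multiprocessor
    registers: int = 65536         # 32-bit registers per multiprocessor
    shared_mem_bytes: int = 49152  # shared memory per multiprocessor


def resident_warps(threads_per_block, regs_per_thread, smem_per_block,
                   lim=SMLimits()):
    """Estimate resident warps per multiprocessor (simplified model).

    Assumes regs_per_thread > 0 and ignores allocation granularity.
    """
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    # Each resource caps how many blocks fit on one multiprocessor.
    by_warps = lim.max_warps // warps_per_block
    by_regs = lim.registers // (regs_per_thread * warps_per_block * WARP_SIZE)
    by_smem = (lim.shared_mem_bytes // smem_per_block
               if smem_per_block else lim.max_blocks)
    blocks = min(lim.max_blocks, by_warps, by_regs, by_smem)
    return blocks * warps_per_block


def occupancy(threads_per_block, regs_per_thread, smem_per_block,
              lim=SMLimits()):
    """Fraction of the multiprocessor's warp slots that are filled."""
    return resident_warps(threads_per_block, regs_per_thread,
                          smem_per_block, lim) / lim.max_warps
```

Under these assumed limits, a 256-thread block using 32 registers per thread and no shared memory fills all 64 warp slots, while doubling register use to 64 per thread halves the resident warps to 32. The point of the sketch is only that whichever resource runs out first (registers, shared memory, or block slots) determines how many warps can be resident at once.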