How to force all thread blocks to be alive at the same time?

I have 42 thread blocks and I want to be sure that they all run at the same time. How can I force them to be live? If one of them is delayed, it may cause a deadlock.

One idea could be to force each of them to run on a dedicated SM, but I think that would waste resources, and I don't even know how to keep them on a dedicated SM.

There is no way to “force” this in the general case. For example, if you happen to run your code on a GPU with only 1 SM, then there is not a way to have all 42 thread blocks be resident and active. Other factors that you haven’t specified, such as desired threads per block, may also preclude your goal on other GPUs.
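To make the SM-count point concrete, here is a minimal sketch that computes an upper bound on resident blocks that holds for any kernel on a device. It assumes CUDA 11.0 or later, which added the `cudaDeviceProp::maxBlocksPerMultiProcessor` field. The helper name `hardwareBlockCeiling` is hypothetical. On a 1-SM GPU this ceiling is just the per-SM block limit, which has been 32 or fewer on NVIDIA architectures so far, so 42 blocks can never all be resident there.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: kernel-independent ceiling on how many thread blocks
// can be resident on `device` at once (number of SMs times the architectural
// per-SM block limit). Assumes CUDA 11.0+ for maxBlocksPerMultiProcessor.
// A real kernel is usually limited further by its registers, shared memory,
// and threads per block; this is only the hardware upper bound.
// Returns -1 if the device properties cannot be queried.
int hardwareBlockCeiling(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return -1;
    return prop.multiProcessorCount * prop.maxBlocksPerMultiProcessor;
}
```

If this number is already below 42, no choice of block size or resource usage will make all 42 blocks co-resident on that GPU.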

Since we have now dispensed with the general case, this becomes a matter of occupancy. You might want to learn about:

  1. What occupancy is
  2. How to use the occupancy calculator spreadsheet
  3. How to use the occupancy API (a good example is given in the CUDA C++ Programming Guide, in the cooperative groups section)
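As a sketch of the occupancy API approach (similar in spirit to the Programming Guide example), the snippet below asks the runtime how many blocks of a given kernel fit on one SM and multiplies by the SM count. The kernel `myKernel` is a hypothetical placeholder for your real kernel, and `maxCoResidentBlocks` is a hypothetical helper name. Note that the occupancy calculation applies to the current device.

```cuda
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; substitute your own. Its register and
// shared-memory usage are what the occupancy API actually measures.
__global__ void myKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

// Returns how many blocks of myKernel can be resident on the current device
// at the same time for the given block size and dynamic shared memory,
// or -1 if a runtime query fails.
int maxCoResidentBlocks(int threadsPerBlock, size_t dynamicSmemBytes)
{
    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess)
        return -1;

    int numSMs = 0;
    if (cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, device) != cudaSuccess)
        return -1;

    // Active blocks per SM, accounting for this kernel's resource usage.
    int blocksPerSM = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, myKernel,
                                                      threadsPerBlock,
                                                      dynamicSmemBytes) != cudaSuccess)
        return -1;

    return numSMs * blocksPerSM;
}
```

For example, `maxCoResidentBlocks(128, 0) >= 42` tells you whether a 42-block launch of this kernel with 128 threads per block can be fully resident on the current GPU.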

Occupancy is discussed in many places; a web search will turn up more references.

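Your goal would be to use these tools (static: the occupancy calculator spreadsheet; dynamic: the occupancy API) to construct a kernel launch that does not exceed 100% occupancy, that is, one whose grid needs no more blocks than the number of SMs times the maximum number of resident blocks per SM.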
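Once you know how many blocks fit per SM, the final check is simple arithmetic. The sketch below is host-only and needs no GPU; the function name `fitsInOneWave` is hypothetical, and `blocksPerSM` is assumed to come from the occupancy calculator or the occupancy API.

```cuda
// Hypothetical host-side check: a grid of numBlocks can be fully co-resident
// only if it fits into a single "wave" of numSMs * blocksPerSM block slots.
// The product is computed in 64 bits to avoid int overflow.
__host__ bool fitsInOneWave(int numBlocks, int numSMs, int blocksPerSM)
{
    long long slots = static_cast<long long>(numSMs) * blocksPerSM;
    return numBlocks > 0 && numBlocks <= slots;
}
```

For the 42 blocks in the question: a GPU with 20 SMs that can hold 2 blocks of your kernel per SM offers only 40 slots, so `fitsInOneWave(42, 20, 2)` is false; if a smaller block size or lower register usage raises that to 3 blocks per SM, there are 60 slots and the check passes.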

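The function cudaLaunchCooperativeKernel can be used to guarantee that all thread blocks of a launch are co-resident on the GPU: if the requested grid cannot be co-resident, the launch returns an error instead of running. The device must also support cooperative launches. The documentation lists the requirements for gridDim and blockDim.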
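A minimal sketch of a cooperative launch, assuming a device that reports `cudaDevAttrCooperativeLaunch`: every block writes a value, all blocks wait at a grid-wide barrier (`grid.sync()` from cooperative groups), and then each block reads a value written by a different block. That barrier is exactly the pattern that can deadlock if blocks are not co-resident, which is why it is only used with a cooperative launch. The names `exchangeKernel` and `launchCooperative` are hypothetical. If the grid is too large to be co-resident, `cudaLaunchCooperativeKernel` returns `cudaErrorCooperativeLaunchTooLarge`. Toolkits before CUDA 11.0 also required compiling with `-rdc=true` to use `grid.sync()`.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

// Hypothetical kernel: each block publishes its index, then all blocks meet
// at a grid-wide barrier before reading the value published by the next block.
__global__ void exchangeKernel(int *data, int *out)
{
    cg::grid_group grid = cg::this_grid();

    if (threadIdx.x == 0)
        data[blockIdx.x] = blockIdx.x;

    // Grid-wide barrier; safe only because the cooperative launch guarantees
    // that every block of the grid is resident at the same time.
    grid.sync();

    if (threadIdx.x == 0) {
        unsigned int next = (blockIdx.x + 1) % gridDim.x;
        out[blockIdx.x] = data[next];
    }
}

// Hypothetical helper: launches exchangeKernel cooperatively on the current
// device and returns the launch status (cudaErrorNotSupported if the device
// cannot do cooperative launches).
cudaError_t launchCooperative(int numBlocks, int threadsPerBlock, int *dData, int *dOut)
{
    int device = 0;
    cudaError_t err = cudaGetDevice(&device);
    if (err != cudaSuccess)
        return err;

    int supported = 0;
    err = cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, device);
    if (err != cudaSuccess)
        return err;
    if (!supported)
        return cudaErrorNotSupported;

    // Kernel arguments are passed as an array of pointers to each argument.
    void *args[] = { &dData, &dOut };
    return cudaLaunchCooperativeKernel((void *)exchangeKernel,
                                       dim3(numBlocks), dim3(threadsPerBlock),
                                       args, 0, 0);
}
```

With 42 blocks of 128 threads, a successful launch leaves `out[i] == (i + 1) % 42` for every block `i`. If you get `cudaErrorCooperativeLaunchTooLarge` instead, reduce the grid or block resources until the occupancy check passes.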