Blocks with varying thread size?

Hey all,

My code would parallelize best if I could set up M blocks with N(m) threads in block m, i.e., each block would have its own number of threads. Is this possible?

My other option, I suppose, would be to give every block as many threads as the largest block needs, and leave the extra threads idle (i.e., they do nothing for me).

Thanks!

All blocks in a single kernel launch share the same dimensions, so you cannot give each block its own thread count directly. Your second option works, though: launch every block with the maximum number of threads and have the unneeded threads finish early, provided that you do not use __syncthreads() in the kernel. If you do, you need to make sure that the unneeded threads still follow a code path on which they encounter the same __syncthreads() calls as the working threads. So you might have something similar to

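    // number_of_active_threads is whatever this block needs (N for this block).
    bool active = (threadIdx.x < number_of_active_threads);

    // work_to_do must evaluate identically for every thread in the block.
    while (work_to_do) {
        if (active) {
            do_some_work();
        }
        __syncthreads();  // reached by active and idle threads alike
        if (active) {
            do_some_more_work();
        }
        __syncthreads();
    }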

Of course, this will be somewhat inefficient if some warps end up only partly filled with active threads, just as with a static block size that is not a multiple of 32 (the warp size).
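
For a concrete picture, here is a minimal sketch of the pattern above as a complete kernel plus a host helper. Everything in it is made up for illustration and is not from the original question: the names per_block_sum, run_per_block_sum, active_count and MAX_THREADS, and the toy workload (each block sums the entries owned by its active threads). It assumes every per-block count is between 0 and MAX_THREADS and that at least one block is requested; error checking is left out to keep it short.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Uniform block size used for the launch (hypothetical choice; must be a power
// of two for the reduction below and >= the largest per-block count).
constexpr int MAX_THREADS = 128;

// Block b uses only its first active_count[b] threads; the rest idle but still
// reach every __syncthreads().
__global__ void per_block_sum(const int* data, const int* active_count, int* out)
{
    __shared__ int partial[MAX_THREADS];

    const int  n_active = active_count[blockIdx.x];
    const bool active   = (threadIdx.x < n_active);

    // Active threads load their element; idle threads contribute 0.
    partial[threadIdx.x] = active ? data[blockIdx.x * MAX_THREADS + threadIdx.x] : 0;
    __syncthreads();

    // Tree reduction. The loop bounds do not depend on threadIdx or n_active,
    // so every thread in the block executes the same number of barriers.
    for (int stride = MAX_THREADS / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        }
        __syncthreads();
    }

    if (threadIdx.x == 0) {
        out[blockIdx.x] = partial[0];
    }
}

// Host helper: one block per entry of counts. The input is all ones, so the
// sum returned for each block equals that block's active thread count.
std::vector<int> run_per_block_sum(const std::vector<int>& counts)
{
    const int num_blocks = static_cast<int>(counts.size());
    std::vector<int> h_data(static_cast<size_t>(num_blocks) * MAX_THREADS, 1);
    std::vector<int> h_out(num_blocks, -1);

    int *d_data = nullptr, *d_counts = nullptr, *d_out = nullptr;
    cudaMalloc(&d_data,   h_data.size() * sizeof(int));
    cudaMalloc(&d_counts, counts.size() * sizeof(int));
    cudaMalloc(&d_out,    h_out.size()  * sizeof(int));

    cudaMemcpy(d_data,   h_data.data(), h_data.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_counts, counts.data(), counts.size() * sizeof(int), cudaMemcpyHostToDevice);

    // Every block is launched with the same (maximum) number of threads.
    per_block_sum<<<num_blocks, MAX_THREADS>>>(d_data, d_counts, d_out);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    cudaFree(d_counts);
    cudaFree(d_out);
    return h_out;
}
```

Calling run_per_block_sum({128, 37, 1, 64}) launches four blocks of 128 threads each, of which 128, 37, 1 and 64 threads respectively do real work. In the second block, for instance, one warp is fully active, one has only 5 active threads, and two are entirely idle, which is the inefficiency mentioned above.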