What (blocks per grid) & (threads per block) settings are used when cuBLAS or cuDNN APIs are called?

Hi:

Are there any references about what (blocks per grid) & (threads per block) settings are used when cuBLAS or cuDNN APIs are called?

I mean, for example:
when cublasGemmEx is called, what (blocks per grid) & (threads per block) values will be set?
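For concreteness, here is a minimal sketch of the kind of call in question (the dimensions, data types, and algorithm choice are all illustrative):

```cpp
// Sketch only, not a full program. The point is that cuBLAS picks the
// kernel -- and therefore the grid/block dimensions -- internally, based
// on the problem sizes, data types, algo, and the GPU it runs on.
#include <cublas_v2.h>
#include <cuda_runtime.h>

void gemm_example(const __half *dA, const __half *dB, float *dC,
                  int m, int n, int k) {
    cublasHandle_t handle;
    cublasCreate(&handle);

    float alpha = 1.0f, beta = 0.0f;
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                 &alpha, dA, CUDA_R_16F, m,
                         dB, CUDA_R_16F, k,
                 &beta,  dC, CUDA_R_32F, m,
                 CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT);

    cublasDestroy(handle);
}
```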

Moreover, if:

  1. multiple streams are created and each of them calls, e.g., cublasGemmEx()

  2. multiple threads are launched and each of them calls, e.g., cublasGemmEx()

  3. multiple processes are launched and each of them calls, e.g., cublasGemmEx()
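Case 1 might look like the following sketch (illustrative only; two streams, one handle per stream):

```cpp
// cublasSetStream binds subsequent calls on a handle to that stream,
// so the GEMM kernels from the two handles can run concurrently.
cudaStream_t streams[2];
cublasHandle_t handles[2];
for (int i = 0; i < 2; ++i) {
    cudaStreamCreate(&streams[i]);
    cublasCreate(&handles[i]);
    cublasSetStream(handles[i], streams[i]);
    // ... cublasGemmEx(handles[i], ...) is then enqueued on streams[i]
}
```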

For each case above, is there any difference in the computational-resource allocation policy?

Thanks~

To some degree these will vary by problem size.

There is no documentation, and anything you discover may change from one CUDA version to the next, or even when running on a different GPU type.

You can discover the blocks and threads for any kernel call using a profiler.
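For example, assuming an executable named `myapp` (a placeholder; the tools themselves are real):

```shell
# Nsight Systems: summarize the kernels launched by the library calls.
nsys profile --stats=true ./myapp

# Nsight Compute: report grid and block dimensions per kernel.
ncu --metrics launch__grid_size,launch__block_size ./myapp
```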

Hi Robert!

Thanks for this insight! Can we expect this behavior from CUB APIs as well? For instance, using NVIDIA Nsight, I’m able to find the block size (#threads per block) and grid size (#blocks per grid) of the CUB APIs launched. However, I’m curious about the criteria used to determine these parameters. I’m also interested in whether there’s a way to actively manage them. I would greatly appreciate your guidance on this matter. Thank you so much!

When using CUB directly, you specify the grid dimensions for warp and block-level primitives. For device-wide primitives, you do not. When CUB gets called as part of thrust, you also do not.

You generally cannot actively manage the grid dimensions when there is an intervening library, like cublas or cusolver or cufft or thrust.

I doubt that grid dimension sizing heuristics for these libraries are in any way out of the ordinary when compared to conventional wisdom about grid sizing. However such internal details are generally not documented to my knowledge. Both CUB and Thrust are open source, so you can inspect those yourself.

This helps tons, thank you so much for the clarifications!