Query on 64-bit Integer Support for dim3 Parameters in CUDA

I am working on a CUDA project and encountered an issue when using dim3 to configure kernel launch parameters. My code looks like this:

dim3 grid(kernelConfig.GridSize(width), batchSize);  

In this context:

  • kernelConfig.GridSize(width) and batchSize are now 64-bit integers (int64_t).
  • Previously, these were 32-bit integers (int), and the code compiled and worked as expected.

However, after updating these variables to 64-bit integers, I am encountering a compilation error stating that dim3 expects int parameters.

My Questions:

  1. Does dim3 in CUDA support 64-bit integers for grid and block dimensions?
  2. If not, what is the recommended way to handle scenarios where grid or block dimensions might exceed the range of a 32-bit integer?
  3. Are there any plans to support 64-bit integers for dim3 parameters in future CUDA releases?
  • CUDA version: 11.0

Any guidance or recommendations would be greatly appreciated.

Thank you for your time and support!

Best regards,
Apoorva

Personally, I would use a grid-stride loop in the kernel, where each thread processes multiple elements instead of one.
There is also the possibility of launching multiple kernels instead of one.

For the grid dimension: you can split a large number of blocks across the three dimensions; the x dimension supports up to 2^31 − 1 blocks, and the y and z dimensions up to 65535 each.

For the block dimension: the number of threads per block is far below anything that would need 64 bits on current SM architectures. Not sure how fast Nvidia will catch up, but as a hint, the maximum number of threads per block increased from 512 (early CUDA versions) to 1024. So it has doubled (an exponential increase, but a slow one).

I would not expect any future expansions to the maximum grid dimensions supported right now. The runtime of a kernel launched with a maximally-dimensioned grid under current limits would exceed the physical life of the GPU, and with a tiny bit of math a multi-dimensional grid can be effectively re-shaped into a virtual single-dimensional one with almost 2^63 blocks (as Curefab already pointed out).

That is a bit puzzling. Use of int64_t seems to compile fine.

CUDA does have hardware-imposed limits on grid and block dimensions. These are documented and can be retrieved immediately with the deviceQuery sample code.

Yes, and I would say the godbolt link I provided proves it.

None of those hardware limits exceed the range of positive values representable in the C++ int type.
If you have a number larger than that range, it is currently illegal to use it as a grid or block dimension in CUDA.