Personally, I would use a strided for-loop in the kernel, where each thread processes multiple elements instead of just one (a sketch follows below). There is also the possibility of launching multiple kernels instead of one.
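For illustration, a minimal sketch of such a grid-stride loop (the kernel name `scale`, the element type, and the launch configuration are placeholders, not anything mandated by CUDA):

```cpp
// Grid-stride loop: each thread starts at its global index and then
// advances by the total number of threads in the grid, so a fixed-size
// grid can cover an arbitrarily large number of elements.
__global__ void scale(float *data, size_t n, float factor)
{
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] *= factor;
}

// Launch with a grid sized to the device, not to the problem size, e.g.:
// scale<<<1024, 256>>>(d_data, n, 2.0f);
```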
For the grid dimension: You can spread a large number of blocks across the three dimensions: up to 2^31 - 1 blocks (31 bits) in the x dimension and up to 65535 (16 bits) each in the y and z dimensions.
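A rough host-side sketch of that decomposition (the helper name `makeGrid` is made up; it assumes numBlocks >= 1 and the current limits of 2^31 - 1 and 65535):

```cpp
#include <algorithm>
#include <cstddef>
#include <cuda_runtime.h> // for dim3

// Fill x first (up to 2^31 - 1 blocks), then y, then z (up to 65535 each).
// Rounding may over-provision blocks, so the kernel must compare its
// flattened block index against the true count.
dim3 makeGrid(std::size_t numBlocks)
{
    const std::size_t maxX = 2147483647, maxYZ = 65535;
    dim3 grid(1, 1, 1);
    grid.x = static_cast<unsigned>(std::min(numBlocks, maxX));
    std::size_t rest = (numBlocks + grid.x - 1) / grid.x; // blocks still needed
    grid.y = static_cast<unsigned>(std::min(rest, maxYZ));
    grid.z = static_cast<unsigned>((rest + grid.y - 1) / grid.y);
    return grid;
}
```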
For the block dimension: The number of threads per block is far below anything that would need 64 bits on current SM architectures. Not sure how fast Nvidia will catch up, but to give a hint: the maximum number of threads per block increased from 512 (early CUDA versions) to 1024. So it doubled (exponential increase), but slowly.
I would not expect any future expansions to the maximum grid dimensions supported right now. The runtime of a kernel launched with a maximally-dimensioned grid using current limits will exceed the physical life of the GPU, and with a tiny bit of math a multi-dimensional grid can be effectively re-shaped into a virtual single-dimensional one with almost 2^63 blocks (as Curefab already pointed out).
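For reference, a sketch of what that re-shaping looks like inside the kernel (the kernel name, the payload, and the `totalBlocks` parameter are placeholders; the host passes the true block count because the launched grid may be slightly over-provisioned):

```cpp
__global__ void kernel1D(float *data, size_t n, size_t totalBlocks)
{
    // Flatten the 3D block index into one virtual 1D block index,
    // good for almost 2^63 blocks at the current limits.
    size_t block = blockIdx.x
                 + (size_t)gridDim.x * blockIdx.y
                 + (size_t)gridDim.x * gridDim.y * blockIdx.z;
    if (block >= totalBlocks) return; // drop over-provisioned blocks

    size_t i = block * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1.0f; // placeholder work
}
```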
That is a bit puzzling. Use of int64_t seems to compile fine.
CUDA does have hardware-imposed limits on grid and block dimensions. These are documented and can be retrieved directly with the deviceQuery sample code.
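The same limits can also be queried programmatically through the runtime API; this is essentially what deviceQuery prints (error checking omitted for brevity):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0); // properties of device 0

    std::printf("max grid size:         %d x %d x %d\n",
                prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    std::printf("max block dims:        %d x %d x %d\n",
                prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    std::printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    return 0;
}
```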
Yes, and I would say the godbolt link I provided proves it.
None of those limits exceeds the range of positive values representable in a C++ int. If you have a number larger than that range, it is currently illegal to use it as a grid or block dimension in CUDA.