'setmaxnreg' ignored; unable to determine register count at entry

Hi, I'm trying to change the allocation of registers between my producer and consumer threads, and I'm running into an issue with setmaxnreg. After removing pieces of the kernel one by one, I found that the compiler ignores it even in an empty kernel:

root@szymon-dev-388orw-0:~/# nvcc main.cu -arch=sm_90a
ptxas info    : (C7508) Potential Performance Loss: 'setmaxnreg' ignored; unable to determine register count at entry.
root@szymon-dev-388orw-0:~/# cat main.cu
global launch_bounds(32) void test()
{
asm volatile(“setmaxnreg.inc.sync.aligned.u32 %0;\n” : : “n”(32));
}

int main()
{
}

nvidia-smi:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08             Driver Version: 580.105.08     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H200                    On  |   00000000:53:00.0 Off |                    0 |
| N/A   28C    P0            102W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H200                    On  |   00000000:64:00.0 Off |                    0 |
| N/A   28C    P0             73W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H200                    On  |   00000000:75:00.0 Off |                    0 |
| N/A   26C    P0             75W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H200                    On  |   00000000:86:00.0 Off |                    0 |
| N/A   29C    P0             73W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H200                    On  |   00000000:97:00.0 Off |                    0 |
| N/A   27C    P0             75W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H200                    On  |   00000000:A8:00.0 Off |                    0 |
| N/A   25C    P0             73W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H200                    On  |   00000000:B9:00.0 Off |                    0 |
| N/A   27C    P0             73W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H200                    On  |   00000000:CA:00.0 Off |                    0 |
| N/A   25C    P0             75W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Your usage of __launch_bounds__ doesn’t appear to be giving the compiler enough information.

Try a complete specification such as __launch_bounds__(512,4), which gives both the maximum threads per block and the minimum blocks per multiprocessor. Alternatively, use the -maxrregcount switch.
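For illustration, here is a minimal sketch of what a complete specification might look like for a producer/consumer split. It is not the original poster's final code: the block size of 384 threads (three warpgroups of 128), the minimum of 1 block per multiprocessor, and the register counts 40 and 232 are all assumptions chosen for the example. The register counts are multiples of 8 within the 24 to 256 range that setmaxnreg accepts, and the branch is uniform across each warpgroup because the instruction has to be executed by every thread in the warpgroup.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed configuration: 3 warpgroups (3 x 128 threads), 1 block per SM.
// The second __launch_bounds__ argument is the extra information the
// compiler needs to work out the register count at kernel entry.
__global__ void __launch_bounds__(384, 1) test()
{
    // Warpgroup index is uniform across all 128 threads of a warpgroup.
    const unsigned warpgroup = threadIdx.x / 128;

    if (warpgroup == 0) {
        // Hypothetical producer warpgroup: release registers.
        asm volatile("setmaxnreg.dec.sync.aligned.u32 %0;\n" : : "n"(40));
    } else {
        // Hypothetical consumer warpgroups: request more registers.
        asm volatile("setmaxnreg.inc.sync.aligned.u32 %0;\n" : : "n"(232));
    }
}

int main()
{
    test<<<1, 384>>>();
    cudaError_t err = cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

Build it the same way as in the question (nvcc main.cu -arch=sm_90a) and check whether the C7508 message still appears. If you would rather not change the launch bounds, passing -maxrregcount with a value suited to your kernel on the nvcc command line is the other route mentioned above.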

This may also be of interest.

The code you have posted also has various syntax errors; I’m ignoring those for this discussion.

Thanks, adding the additional parameter to __launch_bounds__ fixed the issue!