Change the number of registers per kernel with OpenACC?

Dear all,

I am trying to optimize the performance of a code written in Fortran + OpenACC. Changing maxregcount at compile time changes the performance of the code, and the optimal maxregcount value can differ from one kernel to another.

I was wondering whether it is possible to set the number of registers available to each kernel individually. I saw in another code (written in CUDA Fortran) that they used something like launch_bounds before a kernel call, but I couldn't find any reference to launch_bounds in either the CUDA Fortran documentation or the OpenACC documentation.
Would it be possible to use this to tune the number of registers per kernel in my OpenACC code? How does it work exactly?

Thank you for your answer!

Hi mbr.joos,

Unfortunately, no. The maxregcount setting can only be applied per file, so every kernel in a given source file gets the same register limit.
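Because the limit applies per compilation unit, one possible workaround is to group kernels with similar register needs into separate source files and compile each file with its own limit. The sketch below assumes the NVHPC compiler's -gpu=maxregcount:n option (older PGI releases spelled it -ta=tesla:maxregcount:n). The file names heavy_kernels.f90 and main.f90, the module heavy_kernels, and the routine update_heavy are hypothetical, and everything is kept in one file only so that the example compiles on its own.

```fortran
! Hypothetical build for the per-file workaround: in a real project the module
! below would live in its own file (heavy_kernels.f90) so that it can get a
! different register limit from the rest of the application (main.f90):
!
!   nvfortran -acc -gpu=maxregcount:128 -c heavy_kernels.f90
!   nvfortran -acc -gpu=maxregcount:64  -c main.f90
!   nvfortran -acc heavy_kernels.o main.o -o app
!
! Without -acc the !$acc directives are treated as comments and the loop runs
! serially on the CPU.
module heavy_kernels
  implicit none
contains
  ! Hypothetical stand-in for a register-hungry kernel.
  subroutine update_heavy(n, alpha, x, y)
    integer, intent(in) :: n
    real, intent(in) :: alpha
    real, intent(in) :: x(n)
    real, intent(inout) :: y(n)
    integer :: i
    ! The GPU kernel generated for this loop uses the register limit that was
    ! given when this file was compiled.
    !$acc parallel loop copyin(x) copy(y)
    do i = 1, n
      y(i) = alpha * x(i) + y(i)
    end do
  end subroutine update_heavy
end module heavy_kernels

program main
  use heavy_kernels
  implicit none
  integer, parameter :: n = 1000
  real :: x(n), y(n)

  x = 1.0
  y = 2.0
  call update_heavy(n, 3.0, x, y)
  ! 3.0 * 1.0 + 2.0 = 5.0 for every element
  if (any(abs(y - 5.0) > 1.0e-6)) error stop "unexpected result"
  print *, "ok"
end program main
```

The trade-off is that the granularity is still one limit per file, so kernels that need different limits have to be moved into different files.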

We do have an open RFE for adding the ability for users to set the launch bounds (TPR#19302), and I have added your request to it.


Hi Mat,

Thank you for your answer. I look forward to this new functionality!

Thanks again,

Basic support for launch bounds has been available in our compilers since early 2020. Here is the syntax:

attributes(global) launch_bounds(256,8) subroutine test(a,b,c,n)
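To show where the attribute sits in a complete program, here is a minimal CUDA Fortran sketch built around the kernel signature above. The kernel body (a vector add), the module name vecadd_m, and the launch configuration are illustrative assumptions. The two arguments are assumed to mirror CUDA C's __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor), which means the kernel should never be launched with more than 256 threads per block. Building it presumably requires nvfortran with CUDA Fortran enabled (for example the -cuda flag, or a .cuf file extension) and a CUDA-capable GPU.

```fortran
module vecadd_m
  use cudafor
  implicit none
contains
  ! launch_bounds(256,8) (assumed to mirror CUDA C): at most 256 threads per
  ! block, and the compiler is asked to limit register use so that 8 blocks
  ! can be resident per multiprocessor.
  attributes(global) launch_bounds(256,8) subroutine test(a,b,c,n)
    integer, value :: n
    real :: a(n), b(n), c(n)   ! dummy arrays of a global routine reside on the device
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) c(i) = a(i) + b(i)
  end subroutine test
end module vecadd_m

program main
  use cudafor
  use vecadd_m
  implicit none
  integer, parameter :: n = 100000
  integer, parameter :: threads = 256   ! must not exceed the first launch_bounds value
  real :: a(n), b(n), c(n)
  real, device :: a_d(n), b_d(n), c_d(n)
  integer :: istat

  a = 1.0
  b = 2.0
  a_d = a            ! host-to-device copies
  b_d = b
  call test<<<(n + threads - 1) / threads, threads>>>(a_d, b_d, c_d, n)
  istat = cudaDeviceSynchronize()
  c = c_d            ! device-to-host copy
  if (any(abs(c - 3.0) > 1.0e-6)) error stop "unexpected result"
  print *, "ok"
end program main
```

Unlike maxregcount, this constraint is attached to each kernel individually, so kernels in the same file can carry different bounds.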