I am trying to optimize the performance of a code written in Fortran + OpenACC. Changing maxregcount at compile time changes the code's performance, and the optimal value varies from kernel to kernel.
I was wondering whether it is possible to set the number of available registers separately for each kernel. I saw in another code (in CUDA Fortran) that they were using something like launch before a kernel call, but I couldn’t find a reference to launch in either the CUDA Fortran documentation or the OpenACC documentation.
Would it be possible to use this to tune the number of registers per kernel in my OpenACC code? How does it work exactly?
Thank you for your answer!