limit number of registers

Hi Mat,

In my code I have several kernels. According to NVVP, performance would be better if I limited the number of registers for one of the kernels.
I can set this limitation for the entire file. Is it possible to set this limit for the particular kernel without splitting the code?

Thanks,
Alexey

Hi Alexey,

Is it possible to set this limit for the particular kernel without splitting the code?

No, since the maxregcount option applies only at the file level.

Though, I’m in the process of developing a heuristic that the compiler can use to set the minimum block size parameter in the “launch_bounds” on a per-kernel basis (which in turn adjusts the register usage). It’s a bit difficult since the compiler won’t have run-time profile information, but the hope is that it can make an educated guess.

  • Mat
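For readers wondering why launch bounds affect register usage, here is a rough sketch of the arithmetic. It is not how the PGI compiler or nvcc actually computes the limit. The function name `register_cap` is made up for illustration, and the hardware numbers (65536 registers per SM, at most 255 registers per thread) are assumptions that match Kepler-class GPUs of compute capability 3.5. Register allocation granularity is ignored.

```python
def register_cap(max_threads_per_block, min_blocks_per_sm,
                 regs_per_sm=65536, hw_max_regs_per_thread=255):
    """Approximate per-thread register budget implied by launch bounds.

    Hypothetical helper: the defaults are assumed Kepler-class values,
    and allocation granularity is deliberately ignored.
    """
    # All requested blocks must be resident on one SM at the same time.
    resident_threads = max_threads_per_block * min_blocks_per_sm
    # Split the register file evenly among the resident threads.
    cap = regs_per_sm // resident_threads
    # A thread can never use more than the hardware per-thread maximum.
    return min(cap, hw_max_regs_per_thread)


print(register_cap(256, 4))   # 65536 // 1024
print(register_cap(128, 1))   # clamped to the per-thread maximum
```

Under these assumptions, asking for 4 resident blocks of 256 threads leaves 64 registers per thread, so a tighter launch bound acts like a per-kernel register limit.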

I see. Thank you Mat.

In that case I have a suggestion: compile each kernel independently (currently you generate several kernels per file/module), and add an in-code compiler directive like “!$pgi -???:” that applies to the kernel(s) that follow.

Setting a launch bounds option per kernel is also a good idea.

Alexey

Hi Alexey,

We may need to add a new directive for this, but let me first see if I can figure out a way to have the compiler solve the issue. The move to CUDA 6.0, and its tendency to use more registers, has negatively impacted several of the SPEC ACCEL OpenACC benchmark codes. I can get the performance back by using maxregcount, but each benchmark needs a different value. Given that I can’t change the source (it’s against the SPEC run rules), I really need to have the compiler adjust as necessary.

Plus, I’m seeing it in customer codes as well and I really don’t like having to have end users tune their code for particular targets. One of the main benefits of using OpenACC is so that the compiler takes care of the tuning for you.

  • Mat

One of the main benefits of using OpenACC is so that the compiler takes care of the tuning for you.

That’s true.

In that case: compile the kernel → get the number of registers/smem/… → estimate occupancy and limiters → recompile the kernel or adjust the launch parameters

Alexey
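The loop suggested above could be sketched roughly as follows. This is only an illustration of the idea, not anything the PGI compiler does. The names `estimate_occupancy` and `pick_register_cap` are hypothetical, the per-SM limits are assumed Kepler-class values, and allocation granularity is ignored. The "recompile" step is reduced to choosing the register cap that would be passed to the next compile.

```python
def estimate_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                       max_threads_per_sm=2048, max_blocks_per_sm=16,
                       regs_per_sm=65536, smem_per_sm=49152):
    """Return (occupancy, limiter) for one kernel under assumed SM limits."""
    # How many blocks fit on one SM according to each resource.
    limits = {
        "threads": max_threads_per_sm // threads_per_block,
        "blocks": max_blocks_per_sm,
        "registers": regs_per_sm // (regs_per_thread * threads_per_block),
    }
    if smem_per_block > 0:
        limits["shared memory"] = smem_per_sm // smem_per_block
    # The scarcest resource decides how many blocks are resident.
    limiter = min(limits, key=limits.get)
    resident_blocks = limits[limiter]
    occupancy = resident_blocks * threads_per_block / max_threads_per_sm
    return occupancy, limiter


def pick_register_cap(threads_per_block, smem_per_block, current_regs,
                      target=0.5, min_regs=16):
    """Largest register count (<= current_regs) that reaches the target."""
    for regs in range(current_regs, min_regs - 1, -1):
        occupancy, _ = estimate_occupancy(threads_per_block, regs,
                                          smem_per_block)
        if occupancy >= target:
            return regs
    return None  # target not reachable by limiting registers alone


# Step 1: figures reported by a first compile (made-up numbers).
print(estimate_occupancy(256, 128, 0))
# Step 2: cap to use when recompiling for at least 50% occupancy.
print(pick_register_cap(256, 0, 128, target=0.5))
```

With the made-up numbers above, the kernel is limited by registers at 25% occupancy, and a cap of 64 registers per thread would be the loosest limit that reaches 50%. Whether higher occupancy actually helps still depends on how much spilling the lower cap causes, which this sketch does not model.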