Improving performance when calling subroutines inside of openmp teams regions with nvfortran

utheriwagura · January 29, 2026, 6:13pm

Thanks for getting back to me Mat!

I was able to get the faster runtimes in my MRE with the -Minline flag, which is nice. However, I have not been able to get the subroutine call in my main code base to inline, even when I pass -Minline, raise the optimization level to -O4, and simplify the logic of the subroutine. The strange part is that -Minfo=inline gives no reason for why it is not inlining this call, as it does for some other subroutines in my code base. I haven't been able to replicate this behavior with my MRE yet, but do you have any ideas as to what may be causing it?

Regarding register usage: this does seem to be an issue with the subroutine in our main code base, at least based on some of the output from Nsight Compute. The report suggests that in the lead-up to the subroutine call we use significantly more registers, and the average number of threads we run per warp goes down to 1. Is it fair to assume the decrease in threads is due to register pressure, or could there be other causes?
