Occupancy with pgc++ 13.2?

Hello,

I have several questions related to pgc++ 13.2. Suggestions are very welcome!

  1. I found that pgc++ 13.2 changes the resource configuration (e.g. number of blocks, number of threads) of OpenACC-annotated code at runtime.
    Feedback from the compiler shows that it agreed with my requested resource configuration of grid(200,100) block(32,16,1):
26, #pragma acc loop gang(100), vector(16) /* blockIdx.y threadIdx.y */
28, #pragma acc loop gang(200), vector(32) /* blockIdx.x threadIdx.x */

But when I profiled the program with nvprof, the resource configuration had been changed to grid(256,512) block(32,16,1). Does pgc++ 13.2 notify users of this change?

9e+09s       0ns         (256 512 1)      (32 16 1)        39        0B        0B         -           -         0         1         2  _Z8t1_f_acciiiPPdS0_S0_S0_S0_S0__30_      gpu
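For readers following along, a loop nest that would draw this kind of compiler feedback might look like the sketch below. The source for this thread was not posted, so the function `scale_2d`, its arguments, and the loop body are hypothetical; only the `gang`/`vector` clauses mirror the compiler messages quoted above. The `independent` clause is an added assumption so the compiler does not have to prove that the two pointers do not alias.

```cpp
// Hypothetical kernel: the actual source from this thread was not posted.
// Only the gang/vector sizes mirror the quoted compiler feedback.
void scale_2d(int rows, int cols, const double* in, double* out, double factor)
{
    #pragma acc kernels copyin(in[0:rows*cols]) copyout(out[0:rows*cols])
    {
        // Outer loop: requests 100 gangs (blockIdx.y), 16 vector lanes (threadIdx.y).
        #pragma acc loop independent gang(100), vector(16)
        for (int i = 0; i < rows; ++i) {
            // Inner loop: requests 200 gangs (blockIdx.x), 32 vector lanes (threadIdx.x).
            #pragma acc loop independent gang(200), vector(32)
            for (int j = 0; j < cols; ++j) {
                out[i * cols + j] = factor * in[i * cols + j];
            }
        }
    }
}
```

When built without OpenACC support the pragmas are ignored and the loops run serially on the host, which is handy for checking results before profiling the accelerated build.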
  2. Based on which criteria does the compiler decide the appropriate resource configuration?

  3. Which flags produce the following feedback from the compiler? Please tell me how the occupancy is calculated at compile time.

CC 2.0 : 26 registers; 8 shared, 92 constant, 0 local memory bytes; 33% occupancy

Thank you very much,
Best regards,
Minh

Hi Minh,

  1. I found that pgc++ 13.2 changes the resource configuration (e.g. number of blocks, number of threads) of OpenACC-annotated code at runtime.

This is a known issue and is being tracked as TPR#19149. We currently expect to have it fixed in the 13.5 release.

  1. Which flags produce the following feedback from the compiler? Please tell me how the occupancy is calculated at compile time.

We stopped printing the occupancy for various implementation reasons, though we could probably bring it back. Was it something you found useful?

You can still get the ptxinfo via the flag “-Mcuda=ptxinfo”. With the register and shared memory info, you can then calculate the occupancy via the CUDA Occupancy Calculator.
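For anyone who wants the arithmetic without opening the spreadsheet, here is a simplified sketch of what the Occupancy Calculator does with those numbers. The struct `SmLimits` and the function `estimate_occupancy` are hypothetical names; the per-SM limits are the commonly published compute capability 2.0 figures and are hard-coded as assumptions rather than queried from a device. It ignores details the real calculator handles, such as the L1/shared memory split.

```cpp
#include <algorithm>

// Assumed per-SM limits for compute capability 2.0 (not queried from hardware).
struct SmLimits {
    int warp_size         = 32;
    int max_warps         = 48;     // resident warps per SM
    int max_blocks        = 8;      // resident thread blocks per SM
    int registers         = 32768;  // 32-bit registers per SM
    int reg_alloc_unit    = 64;     // registers are allocated per warp, in units of 64
    int warp_alloc_gran   = 2;      // warp allocation granularity
    int shared_bytes      = 49152;  // shared memory per SM (48 KB configuration)
    int shared_alloc_unit = 128;    // shared memory allocation granularity in bytes
};

inline int round_up(int value, int unit) { return (value + unit - 1) / unit * unit; }

// Returns the estimated occupancy as a fraction in [0, 1].
double estimate_occupancy(int threads_per_block, int regs_per_thread,
                          int shared_bytes_per_block, const SmLimits& sm = SmLimits{})
{
    if (threads_per_block <= 0) return 0.0;
    const int warps_per_block = (threads_per_block + sm.warp_size - 1) / sm.warp_size;

    // Limit 1: hardware caps on resident warps and blocks.
    int blocks = std::min(sm.max_blocks, sm.max_warps / warps_per_block);

    // Limit 2: register file.
    if (regs_per_thread > 0) {
        const int regs_per_warp = round_up(regs_per_thread * sm.warp_size, sm.reg_alloc_unit);
        int warps_by_regs = sm.registers / regs_per_warp;
        warps_by_regs -= warps_by_regs % sm.warp_alloc_gran;
        blocks = std::min(blocks, warps_by_regs / warps_per_block);
    }

    // Limit 3: shared memory.
    if (shared_bytes_per_block > 0) {
        const int smem = round_up(shared_bytes_per_block, sm.shared_alloc_unit);
        blocks = std::min(blocks, sm.shared_bytes / smem);
    }

    return static_cast<double>(blocks * warps_per_block) / sm.max_warps;
}
```

With the figures from this thread, a (32,16,1) block is 16 warps. If the 39 in the nvprof line is the registers-per-thread column, `estimate_occupancy(512, 39, 8)` allows one resident block per SM, i.e. 16/48, or about 33%. At the 26 registers from the compiler feedback, `estimate_occupancy(512, 26, 8)` allows two blocks, i.e. about 67%. Which block size the compiler assumed for its own 33% figure is not stated in the thread.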

  • Mat

Hi Mat,

I think occupancy information is still useful when the user knows that the achieved memory bandwidth is below the peak bandwidth and wants to increase the occupancy.

I use pgc++ 13.2, and haven’t found the flag -Mcuda in the pgc++ manual page yet. Is this a new flag in the newest version of pgc++?

Best regards,
Minh

I use pgc++ 13.2, and haven’t found the flag -Mcuda in the pgc++ manual page yet. Is this a new flag in the newest version of pgc++?

Sorry, I missed that you’re using C++. -Mcuda is for CUDA Fortran and CUDA x86, not C++.

I think occupancy information is still useful when the user knows that the achieved memory bandwidth is below the peak bandwidth and wants to increase the occupancy.

Ok, I’ll pass that along. It will be a lower priority item, but it should be put back in at some point.

  • Mat

Hi Mat,
I was also looking for the ptxinfo output (using pgcc 13.6 and OpenACC) and found this thread. It would be nice to have the information (enabled by a switch) about the number of registers and the amount of shared memory used. I can see the shared memory used with ACC_NOTIFY, but not the number of registers.

Would be nice to have it back. Thanks.

A follow up:

I have just tried to minimize the number of registers used with -ta=maxregcount:10. However, according to the NVIDIA Visual Profiler, nothing changed. Is the flag still working?

Sandra

I have just tried to minimize the number of registers used with -ta=maxregcount:10. However, according to the NVIDIA Visual Profiler, nothing changed. Is the flag still working?

I’m assuming you mean “-ta=nvidia,maxregcount:10”. All the “maxregcount” sub-option does is set the ptxas “-maxrregcount” option and that does appear to be working. Why ptxas wouldn’t honor this value, I’m not sure.

  • Mat