OpenACC SPEEDUP of tesla:cuda8.0 versus tesla:cc50

Hi,

I don’t have a problem, but I wanted to share an interesting result.

I am running an OpenACC Fortran code on a GeForce GTX 970 (cc 5.0).

When I compile with -ta=tesla:cc50, the code consistently runs in 27.2 seconds.

When I compile with -ta=tesla:cuda8.0, the code consistently runs in only 22.3 seconds (an 18% improvement!).
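For reference, the 18% figure is the reduction in run time relative to the cc50 build; expressed as a speedup ratio it is about 1.22x:

```latex
\text{improvement} = \frac{27.2 - 22.3}{27.2} = \frac{4.9}{27.2} \approx 0.180 = 18\%,
\qquad
\text{speedup} = \frac{27.2}{22.3} \approx 1.22
```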

Since my card is cc 5.0, why is the cuda8.0 flag so much faster (not that I am complaining!)?

Does the compiler not use cuda8.0 by default?

Is it safe to use -ta=tesla:cuda8.0 for all GPUs, even those with an older compute capability?

  • Ron

Hi Ron,

> Does the compiler not use cuda8.0 by default?

With PGI 16.10, the default is CUDA 7.0. You can see which CUDA version is the default by running any of the PGI drivers with “-help -ta”, for example “pgfortran -help -ta”.

> Since my card is cc 5.0, why is the cuda8.0 flag so much faster (not that I am complaining!)?

My best guess is that the CUDA 8.0 build gets better register allocation, which leads to better GPU utilization. However, you’ll want to compare profiles of the two builds (e.g., with pgprof or nvprof) to determine the exact reason.
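To illustrate why register allocation can affect utilization, here is a simplified occupancy estimate in Python. It is a sketch under stated assumptions: the per-SM limits of compute capability 5.x (64K registers, 64 resident warps, 32 resident blocks, registers allocated per warp in units of 256), 128-thread blocks, and hypothetical register counts of 32, 40 and 64 per thread. Shared-memory limits are ignored, and the real per-kernel register counts of each build would have to come from the profiles.

```python
import math

# Assumed per-SM limits for compute capability 5.x (Maxwell).
# Shared-memory limits are deliberately ignored in this sketch.
REGS_PER_SM = 65536        # 64K 32-bit registers per SM
MAX_WARPS_PER_SM = 64      # 2048 resident threads
MAX_BLOCKS_PER_SM = 32
REG_ALLOC_UNIT = 256       # registers are allocated per warp in chunks of 256
WARP_SIZE = 32


def occupancy(regs_per_thread: int, threads_per_block: int) -> float:
    """Fraction of the SM's warp slots that can be resident, limited only
    by the register file, the warp-slot limit, and the block-count cap."""
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    # Registers one warp actually consumes, rounded up to the allocation unit.
    regs_per_warp = (
        math.ceil(regs_per_thread * WARP_SIZE / REG_ALLOC_UNIT) * REG_ALLOC_UNIT
    )
    warps_by_regs = REGS_PER_SM // regs_per_warp
    # Whole blocks only: the tightest of the three limits wins.
    blocks = min(
        MAX_BLOCKS_PER_SM,
        MAX_WARPS_PER_SM // warps_per_block,
        warps_by_regs // warps_per_block,
    )
    return blocks * warps_per_block / MAX_WARPS_PER_SM


if __name__ == "__main__":
    # Hypothetical register counts for 128-thread blocks.
    for regs in (32, 40, 64):
        print(f"{regs} regs/thread -> {occupancy(regs, 128):.0%} occupancy")
```

Under these assumptions, going from 40 to 32 registers per thread raises the register-limited occupancy from 75% to 100%. Higher occupancy does not always mean faster code, so this is only one candidate explanation to check against the profiles.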

  • Mat