PGI CUDA FORTRAN performance issue


When I compile my CUDA Fortran programs with PGI 10.3, they run about 2.5 times slower than when compiled with version 10.2.
Has anyone else seen the same issue?


Hi Matt,

Does your program perform many divides? In 10.2 we were getting reports of wrong answers when using divides. The default hardware divide is simply not precise enough for many customers. Hence, in 10.3 we started using a more precise divide. Unfortunately, this precise divide is much slower than the default and began causing slow-downs. In 10.4, we updated the “-Mcuda=fastmath” flag to revert to the less precise but faster divide.

If you do use many divides, then I would recommend upgrading to 10.4 (or 10.5 in a day or two) and using the “-Mcuda=fastmath” flag.
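
As a rough sketch of the two builds, assuming the pgfortran driver and a source file called kernel.cuf (the file and executable names are hypothetical, substitute your own):

```shell
# Default build: 10.3 and later use the slower, more precise divide.
pgfortran -Mcuda -o app_precise kernel.cuf

# Fast-math build: 10.4 and later revert to the faster, less precise divide.
pgfortran -Mcuda=fastmath -o app_fast kernel.cuf
```

Timing both executables and comparing their results should show whether the divide accounts for the slow-down and whether the reduced precision is acceptable for your code.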

If you don’t use many divides, please send a report to PGI Customer Service, including a reproducing example, since we will need to investigate the problem.


Hi Mat,

Thanks for your answer; this has been driving me nuts.
About one tenth of my floating-point operations are divisions (according to the PTX). I did not notice any accuracy problems with version 10.2.
So I will stay with 10.2 until the admin installs the newer versions.