-Mcuda=3.1 enables fastmath?

In a previous topic, I noted with surprise that my PGI 10.8 install seemed to be using CUDA 2.3 by default even though I have 3.1 available:

> pgaccelinfo 
CUDA Driver Version:           3010

Device Number:                 0
Device Name:                   Tesla T10 Processor
Device Revision Number:        1.3
<snip>

and am using the latest driver:

> cat /proc/driver/nvidia/version 
NVRM version: NVIDIA UNIX x86_64 Kernel Module  256.44  Thu Jul 29 01:22:44 PDT 2010
GCC version:  gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)

So, I decided to do some investigating and found that when I use -Mcuda=3.1,… I seem to get fastmath no matter what. For example, if I compile using -Mcuda=ptxinfo,keepgpu,keepbin,keepptx,maxregcount:64,nofma -Kieee with and without fastmath, I get timings like:

> grep Kernel Without31-*/cudafor-flxy-SPvDPorig.out
Without31-fastmath/cudafor-flxy-SPvDPorig.out:   Kernel :     67.512 +/-      1.289
Without31-Nofastmath/cudafor-flxy-SPvDPorig.out:   Kernel :    177.938 +/-      2.823

where the fastmath version is faster. But when I add 3.1 to the list (-Mcuda=3.1,ptxinfo,keepgpu,keepbin,keepptx,maxregcount:64,nofma -Kieee):

> grep Kernel With31-*/cudafor-flxy-SPvDPorig.out
With31-fastmath/cudafor-flxy-SPvDPorig.out:   Kernel :     67.215 +/-      1.344
With31-Nofastmath/cudafor-flxy-SPvDPorig.out:   Kernel :     72.521 +/-      1.173

Now, I know timings aren’t proof, but when I count the array elements whose difference from the CPU value fails a criterion, I get the following without 3.1:

Nofastmath: Num fail:            89  out of:          1782
fastmath: Num fail:           743  out of:          1782

With 3.1 in the -Mcuda list:

Nofastmath: Num fail:           743  out of:          1782
fastmath: Num fail:           743  out of:          1782
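
To make the comparison concrete, here is a minimal Python sketch of the kind of check behind these counts. The relative-tolerance criterion, the tolerance value, the function name, and the sample arrays are all assumptions for illustration; the real test uses the application's own criterion and its 1782-element array.

```python
import math

def failing_indices(gpu, cpu, rel_tol=1e-6):
    """Indices where the GPU value differs from the CPU value beyond rel_tol.

    The relative-tolerance criterion and its default are hypothetical.
    """
    return [i for i, (g, c) in enumerate(zip(gpu, cpu))
            if not math.isclose(g, c, rel_tol=rel_tol, abs_tol=0.0)]

# Made-up sample data standing in for the real CPU and GPU results.
cpu = [1.0, 2.0, 3.0, 4.0]
runs = {
    "Nofastmath":      [1.0, 2.0,   3.0, 4.0001],
    "fastmath":        [1.0, 2.001, 3.0, 4.0001],
    "3.1, Nofastmath": [1.0, 2.001, 3.0, 4.0001],
}

fails = {name: failing_indices(gpu, cpu) for name, gpu in runs.items()}
for name, idx in fails.items():
    print(f"{name}: Num fail: {len(idx)} out of: {len(cpu)}")

# Equal counts alone are weak evidence; matching failure locations is stronger.
same_places = fails["fastmath"] == fails["3.1, Nofastmath"]
print("3.1 Nofastmath fails in the same places as fastmath:", same_places)
```

Comparing the index lists rather than only the counts is what distinguishes "same number of failures" from "same failures in the same place".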

This suggests that -Mcuda=3.1 is enabling fastmath by default, since I’m getting the same differences in the same places (not shown, but confirmed). Is this true? And if so, is there a “nofastmath” option for use with 3.1?

Thanks,
Matt

Hi Matt,

I asked Michael about this. We haven’t changed anything on our end, but it’s possible that the CUDA 3.1 header files have changed. I’ve added TPR#17203 and asked Michael to investigate.

Thanks,
Mat