I compiled a program for three target architectures: 1.0, 1.2 and 2.0. I checked the PTX output, and the Fermi variant was full of instructions with the .ftz suffix. I did not set any of the new 3.0 compiler flags, yet the programming manual states that the default mode is full precision. How is this possible?
Yes, we changed the default for compute 2.0 targets.
See the programming guide, p.142:
“For devices of compute capability 2.0 and higher, code must be compiled with -ftz=false, -prec-div=true, and -prec-sqrt=true to ensure IEEE compliance (this is the default setting; see the nvcc user manual for description of these compilation flags); code compiled with -ftz=true, -prec-div=false, and -prec-sqrt=false comes closest to the code generated for devices of compute capability 1.x.”
The guide states that the default is -ftz=false, if I understand that sentence correctly.
Simon, doesn’t this:
Directly contradict this (emphasis added)?
Does Fermi handle denormal doubles at full speed? I seem to remember something about that (which would be a big feature over CPUs which drop to microcode emulation at a severe speed penalty.)
If denormals are indeed full speed, why not make ftz=false the default? Just to match the common rounding mode on CPUs?
I think it is better to maintain compatibility with existing programs. Without flush-to-zero mode they may behave differently.
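To illustrate how results can differ between the two modes, here is a minimal host-side C++ sketch. It is only an emulation under stated assumptions: `flush_to_zero` is a hypothetical helper, not part of CUDA or nvcc, and it mimics what a single-precision .ftz instruction does to a denormal value (replace it with a zero of the same sign). Real device behavior depends on the flags and the target architecture discussed in this thread.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper (not a CUDA API): emulates flush-to-zero on the host.
// A denormal (subnormal) input becomes a zero with the same sign;
// every other value is returned unchanged.
inline float flush_to_zero(float x) {
    if (std::fpclassify(x) == FP_SUBNORMAL) {
        return std::copysign(0.0f, x);
    }
    return x;
}

// Smallest positive denormal float, used as a test input.
inline float smallest_denormal() {
    return std::numeric_limits<float>::denorm_min();
}
```

With denormals kept, `smallest_denormal() > 0.0f` is true, so a check such as `if (x > 0.0f)` takes the branch. After `flush_to_zero`, the same value compares equal to zero and the branch is skipped, which is the kind of difference a program written for one mode could see in the other.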
Sorry, my bad, I meant “-ftz=false”, i.e. denormals are enabled by default. I modified the original post to avoid further confusion.
I’m not sure why you are seeing .ftz in the PTX.
I specified three targets, 1.0, 1.2 and 2.0, and the PTX for 2.0 is full of instructions with the .ftz suffix. Maybe it is because I specified several targets? Somebody else needs to check.