Default flush to zero mode on Fermi?

Lev · May 6, 2010, 11:47am

I compiled a program for three architectures: 1.0, 1.2 and 2.0. I checked the PTX output, and the Fermi variant was full of instructions with the .ftz suffix. I did not set any of the new 3.0 compiler flags, yet the programming manual states that the default mode is full precision. How is this possible?

Simon_Green · May 7, 2010, 1:41pm

Yes, we changed the default for compute 2.0 targets.

See the programming guide, p.142:

“For devices of compute capability 2.0 and higher, code must be compiled with -ftz=false, -prec-div=true, and -prec-sqrt=true to ensure IEEE compliance (this is the default setting; see the nvcc user manual for description of these compilation flags); code compiled with -ftz=true, -prec-div=false, and -prec-sqrt=false comes closest to the code generated for devices of compute capability 1.x.”
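To make the difference between the two settings concrete, here is a minimal host-side C++ sketch of what flushing to zero does to a single-precision value. It is only an emulation under stated assumptions: `flush_to_zero`, `add_ftz` and `check_ftz_examples` are hypothetical names introduced for illustration, not part of CUDA or nvcc, and the sketch assumes the host compiler is not itself running in a flush-to-zero or fast-math mode.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper (not a CUDA or nvcc API): replace a subnormal
// float with a zero of the same sign and leave every other value alone.
float flush_to_zero(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// Hypothetical helper: emulate an add that flushes subnormal inputs and
// a subnormal result, in the spirit of a PTX add.ftz.f32 instruction.
float add_ftz(float a, float b) {
    return flush_to_zero(flush_to_zero(a) + flush_to_zero(b));
}

// Small self-check contrasting the two behaviours on the host.
void check_ftz_examples() {
    const float tiny = std::numeric_limits<float>::denorm_min();      // smallest subnormal
    const float smallest_normal = std::numeric_limits<float>::min();  // smallest normal
    assert(tiny + tiny > 0.0f);           // denormals kept (like -ftz=false)
    assert(add_ftz(tiny, tiny) == 0.0f);  // flushed to zero (like -ftz=true)
    assert(add_ftz(smallest_normal, 0.0f) == smallest_normal);  // normals untouched
}
```

Calling `check_ftz_examples()` should pass silently on a host with ordinary IEEE denormal handling. It says nothing about performance or about what a particular GPU does.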

Lev · May 7, 2010, 1:46pm

The guide states that the default is ftz=false, if I understand that sentence correctly.

avidday · May 7, 2010, 2:08pm

Simon, doesn’t this:

Directly contradict this (emphasis added)?

SPWorley · May 7, 2010, 11:22pm

Does Fermi handle denormal doubles at full speed? I seem to remember something about that (which would be a big feature over CPUs which drop to microcode emulation at a severe speed penalty.)

If denormals are indeed full speed, why not make ftz=false the default? Just to match the common rounding mode on CPUs?

Lev · May 9, 2010, 4:48pm

I think it is better to maintain compatibility with existing programs. Without flush-to-zero mode they may behave differently.

Simon_Green · May 10, 2010, 10:53am

Sorry, my bad, I meant “-ftz=false”, i.e. denormals are enabled by default. I modified the original post to avoid further confusion.

I’m not sure why you are seeing .ftz in the PTX.

Lev · May 10, 2010, 8:48pm

I specified three targets, 1.0, 1.2 and 2.0, and the 2.0 output is full of .ftz suffixes. Maybe it is because I specified several targets? Somebody else needs to check.
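For anyone who wants to repeat the check, one rough way to compare builds is to count how often the `.ftz` modifier appears in the generated PTX text (for example, PTX emitted with nvcc's `-ptx` option). The sketch below is an assumption-laden illustration: `count_ftz` and `check_count_ftz` are hypothetical names, and the two PTX lines in the self-check are hand-written samples, not output from the program discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count occurrences of the ".ftz" modifier in PTX text.
std::size_t count_ftz(const std::string& ptx) {
    const std::string needle = ".ftz";
    std::size_t count = 0;
    for (std::size_t pos = ptx.find(needle); pos != std::string::npos;
         pos = ptx.find(needle, pos + needle.size())) {
        ++count;
    }
    return count;
}

// Self-check on a hand-written PTX fragment (illustrative only).
void check_count_ftz() {
    const std::string sample =
        "add.ftz.f32 %f3, %f1, %f2;\n"   // flush-to-zero add
        "mul.f32 %f4, %f3, %f3;\n";      // add-free multiply without .ftz
    assert(count_ftz(sample) == 1);
    assert(count_ftz("") == 0);
}
```

A count of zero for the 2.0 target would be consistent with -ftz=false being in effect. A large count would suggest flush-to-zero was enabled somewhere in the build.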
