A small one-line kernel shows strange behaviour when calculating pow(-1.0f, 1.0f).
The result is correct (-1.0) when one writes pow(-1.0, 1.0) instead.
Driver version is 182.08.
Using -use_fast_math is necessary to reproduce the situation on our system.
Here is a sample program demonstrating this. Can you reproduce this behaviour, or is it a problem on our system?
[codebox]/* Problem.cu
The one-line kernel Problem() returns 1.#QNAN0 when executed. Emulation mode works fine (returns -1.0).
However
pow(-1.0, 1.0)
works normally.
Compiler invocation:
"C:\CUDA\bin\nvcc.exe" -arch sm_13 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -use_fast_math -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -maxrregcount=32 --compile -o x64\Release\Problem.cu.obj Problem.cu
Copyright Anatoly Vershinin, Denis Sabitov.
*/
#include <stdio.h>
__global__ void Problem(float* val)
{
    *val = pow(-1.0f, 1.0f);   /* returns NaN on the device when built with -use_fast_math */
}
int main()
{
    float *result;
    cudaMalloc((void**)&result, sizeof(float));
    Problem<<<1, 1>>>(result);
    cudaThreadSynchronize();
    float host_val;
    cudaMemcpy(&host_val, result, sizeof(float), cudaMemcpyDeviceToHost);
    printf("Cuda returned %f\n", host_val);
    cudaFree(result);
    return 0;
}
[/codebox]
diddum
March 3, 2009, 7:58pm
I’m a newbie so I may be wrong,
but I read that the -use_fast_math switch forces the use of the
fast-math routines, so instead of pow() you are really executing __powf(), which is
the CUDA fast version and may have some limitations (to make it run faster…).
According to Appendix B of the CUDA 2.1 Programming Guide, the function
__powf(x,y) is implemented as __exp2f(y * __log2f(x)),
and if that is the case, taking the log2 of a negative number causes the error.
So maybe what you are experiencing is a “feature” of the fast-math version of pow(),
not a bug…
giovanni
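To see this directly, here is a minimal sketch (my own, not from the original report) that calls __powf and its documented expansion side by side; both should come back as NaN, because __log2f(-1.0f) is undefined:
[codebox]/* FastMathPow.cu -- minimal sketch, not part of the original report.
   Compares __powf(-1.0f, 1.0f) with its documented expansion
   __exp2f(y * __log2f(x)); both should be NaN since __log2f(-1.0f) is NaN. */
#include <stdio.h>
__global__ void FastMathPow(float* out)
{
    out[0] = __powf(-1.0f, 1.0f);              // fast-math intrinsic
    out[1] = __exp2f(1.0f * __log2f(-1.0f));   // its documented expansion
}
int main()
{
    float host[2];
    float* dev;
    cudaMalloc((void**)&dev, 2 * sizeof(float));
    FastMathPow<<<1, 1>>>(dev);
    cudaThreadSynchronize();
    cudaMemcpy(host, dev, 2 * sizeof(float), cudaMemcpyDeviceToHost);
    printf("__powf(-1.0f, 1.0f)          = %f\n", host[0]);
    printf("__exp2f(1.0f*__log2f(-1.0f)) = %f\n", host[1]);
    cudaFree(dev);
    return 0;
}
[/codebox]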
diddum is correct in everything he/she wrote above. This isn’t a bug; it is expected behavior. You’re sacrificing accuracy for speed if you choose to use -use_fast_math.
Thank you guys,
according to the Programming Guide this behaviour is indeed expected.
Two things, however, still puzzle me:
the fact that the program worked with the previous driver version, and
that the simple way around it is to drop the f suffixes from the “-1.0f” and “1.0f” literals on arch 1.3.
Losing accuracy is not supposed to mean getting a completely wrong result in the end, is it?
The compiler is getting better over time, though: on arch 1.1 one got QNAN both with and without the f’s. Good news! :)
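For reference, a sketch of that workaround (my own wording, not from the thread): dropping the f suffixes makes the call resolve to double-precision pow, which -use_fast_math does not remap. It is a drop-in replacement for the Problem() kernel above and needs -arch sm_13 for double support:
[codebox]/* Workaround sketch: double-precision pow is not remapped by -use_fast_math,
   so the result stays -1.0. Compile with -arch sm_13 and launch exactly like
   the Problem() kernel above. */
__global__ void ProblemWorkaround(float* val)
{
    *val = (float)pow(-1.0, 1.0);   // pow(double,double), then narrow to float
}
[/codebox]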
The King of the CUDA Math Library sent me an email this morning…
(1) -use_fast_math only affects certain single-precision math functions. In particular it causes powf(x,y) to be mapped to __powf(x,y), which is implemented as __exp2f(y * __log2f(x)), as noted in the documentation.
(2) In general, generic function names in the CUDA math library are overloaded; thus pow(float,float) is equivalent to powf(float,float), which is different from pow(double,double).
Therefore, when -use_fast_math is used, pow(float,float) maps to powf(float,float), which in turn maps to __powf(float,float). No such remapping takes place for pow(double,double). Since pow(double,double) and powf(float,float) adhere to C99 semantics, pow(-1.0,1.0) always delivers -1.0, while powf(-1.0f,1.0f) delivers -1.0f in the absence of -use_fast_math, but a NaN
when -use_fast_math is used.
CUDA 1.1 did not support double precision, and all double-precision operands and operations were mapped to single-precision equivalents, which probably explains the difference in behavior you observed.
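To make the mapping concrete, here is a small sketch of my own (not from the email). Compiled with -arch sm_13, the single-precision result should flip between -1.0 and NaN depending on whether -use_fast_math is passed, while the double-precision result stays -1.0:
[codebox]/* Mapping sketch: with -use_fast_math, powf(float,float) -> __powf(float,float);
   pow(double,double) is never remapped. Compile with -arch sm_13,
   once with and once without -use_fast_math. */
#include <stdio.h>
__global__ void MappingDemo(float* f, double* d)
{
    *f = powf(-1.0f, 1.0f);   // remapped to __powf under -use_fast_math -> NaN
    *d = pow(-1.0, 1.0);      // double-precision pow -> always -1.0
}
int main()
{
    float host_f; float* dev_f;
    double host_d; double* dev_d;
    cudaMalloc((void**)&dev_f, sizeof(float));
    cudaMalloc((void**)&dev_d, sizeof(double));
    MappingDemo<<<1, 1>>>(dev_f, dev_d);
    cudaThreadSynchronize();
    cudaMemcpy(&host_f, dev_f, sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(&host_d, dev_d, sizeof(double), cudaMemcpyDeviceToHost);
    printf("powf(-1.0f, 1.0f) = %f\n", host_f);
    printf("pow(-1.0, 1.0)    = %f\n", host_d);
    cudaFree(dev_f);
    cudaFree(dev_d);
    return 0;
}
[/codebox]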