Unexpected behaviour of pow(): the pow() device function works incorrectly with negative numbers

A small one-line kernel shows strange behaviour when calculating pow(-1.0f, 1.0f).

The result is normal (-1.0) when one writes pow(-1.0, 1.0).

Driver version is 182.08

Using -use_fast_math is necessary to reproduce the situation on our system.

Here is a sample program demonstrating this. Can you reproduce this situation, or is it a problem with our system?

[codebox]/* Problem.cu

The one-line kernel Problem() returns 1.#QNAN0 as a result of execution.
Emulation mode works OK (returns -1.0). pow(-1.0, 1.0) works normally.

Compiler command line:

"C:\CUDA\bin\nvcc.exe" -arch sm_13 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -use_fast_math -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -maxrregcount=32 --compile -o x64\Release\Problem.cu.obj Problem.cu

Copyright Anatoly Vershinin, Denis Sabitov.
*/

#include <stdio.h>

__global__ void Problem(float* val)
{
	*val = pow(-1.0f, 1.0f);
}

int main()
{
	float *result;
	cudaMalloc((void**)&result, sizeof(float));

	Problem<<<1, 1>>>(result);

	float host_val;
	cudaMemcpy(&host_val, result, sizeof(float), cudaMemcpyDeviceToHost);
	printf("Cuda returned %f\n", host_val);

	cudaFree(result);
	return 0;
}[/codebox]


I’m a newbie so I may be wrong, but I read that the -use_fast_math switch
forces the use of the fast-math routines, so instead of pow() you are really executing __powf(),
the CUDA fast version, which can have some limitations (to make it run faster…).

According to Appendix B of the CUDA 2.1 Programming Guide, the function
__powf(x, y) is implemented as __exp2f(y * __log2f(x)),
and if this is the case, taking the log2 of a negative number produces a NaN.

So maybe what you are experiencing is a “feature” of the fast-math version of pow(),
not a bug…


diddum is correct in everything he/she wrote above. This isn’t a bug; it is expected behavior. You’re sacrificing accuracy for speed if you choose to use -use_fast_math.

Thank you guys,

according to the Programming Guide, this behaviour is indeed expected.

Two things still puzzle me, however:

  1. the fact that the program worked with the previous driver version, and

  2. that a simple way around it is to drop the f suffixes from the “-1.0f” and “1.0f” literals on arch 1.3.

A loss of accuracy isn’t supposed to mean a completely wrong answer in the end, is it?

The compiler is getting better over time, though: on arch 1.1 one got QNAN both with and without the f suffixes. Good news! :)

The King of the CUDA Math Library sent me an email this morning…