A small one-line kernel shows strange behaviour when calculating pow(-1.0f, 1.0f).
The result is correct (-1.0) when one writes pow(-1.0, 1.0) instead.
Driver version is 182.08.
Using -use_fast_math is necessary to reproduce the situation on our system.
Here is a sample program demonstrating this. Can you reproduce this behaviour, or is it a problem on our system?
[codebox]/* Problem.cu
The one-line kernel Problem() returns 1.#QNAN0 when executed. Emulation mode works fine (returns -1.0).
However
pow(-1.0, 1.0)
works normally.
Compiler invocation:
"C:\CUDA\bin\nvcc.exe" -arch sm_13 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -use_fast_math -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -maxrregcount=32 --compile -o x64\Release\Problem.cu.obj Problem.cu
Copyright Anatoly Vershinin, Denis Sabitov.
*/
#include <stdio.h>
__global__ void Problem(float* val)
{
    *val = pow(-1.0f, 1.0f);   /* returns NaN on the device when built with -use_fast_math */
}
int main()
{
    float *result;
    cudaMalloc((void**)&result, sizeof(float));
    Problem<<<1, 1>>>(result);
    cudaThreadSynchronize();
    float host_val;
    cudaMemcpy(&host_val, result, sizeof(float), cudaMemcpyDeviceToHost);
    printf("Cuda returned %f\n", host_val);
    cudaFree(result);
    return 0;
}
[/codebox]
diddum
March 3, 2009, 7:58pm
I’m a newbie so I may be wrong,
but I read that the -use_fast_math switch forces the use of the
fast-math routines, so instead of pow() you are really executing __powf(), which is
the CUDA fast version and may have some limitations (to make it run faster…).
According to Appendix B of the CUDA 2.1 Programming Guide, the function
__powf(x,y) is implemented as __exp2f(y * __log2f(x)),
and if that is the case, taking the log2 of a negative number causes the error.
So maybe what you are experiencing is a “feature” of the fast-math version of pow(),
not a bug…
giovanni
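To see this directly, here is a minimal sketch (my own, not from the original report) that calls __powf and its documented expansion side by side; both should come back as NaN, because __log2f(-1.0f) is undefined:
[codebox]/* FastMathPow.cu -- minimal sketch, not part of the original report.
   Compares __powf(-1.0f, 1.0f) with its documented expansion
   __exp2f(y * __log2f(x)); both should be NaN since __log2f(-1.0f) is NaN. */
#include <stdio.h>
__global__ void FastMathPow(float* out)
{
    out[0] = __powf(-1.0f, 1.0f);              // fast-math intrinsic
    out[1] = __exp2f(1.0f * __log2f(-1.0f));   // its documented expansion
}
int main()
{
    float host[2];
    float* dev;
    cudaMalloc((void**)&dev, 2 * sizeof(float));
    FastMathPow<<<1, 1>>>(dev);
    cudaThreadSynchronize();
    cudaMemcpy(host, dev, 2 * sizeof(float), cudaMemcpyDeviceToHost);
    printf("__powf(-1.0f, 1.0f)          = %f\n", host[0]);
    printf("__exp2f(1.0f*__log2f(-1.0f)) = %f\n", host[1]);
    cudaFree(dev);
    return 0;
}
[/codebox]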
diddum is correct in everything he/she wrote above. This isn’t a bug; it is expected behavior. You’re sacrificing accuracy for speed if you choose to use -use_fast_math.
Thank you guys,
according to the Programming Guide this behaviour is indeed expected.
Two things, however, still puzzle me:
the fact that the program worked with the previous driver version, and
that the simple way around it is to drop the f suffixes from the “-1.0f” and “1.0f” literals on arch 1.3.
Losing accuracy is not supposed to mean getting a completely wrong result in the end, is it?
The compiler is getting better over time, though: on arch 1.1 one got QNAN both with and without the f’s. Good news! :)
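For reference, a sketch of that workaround (my own wording, not from the thread): dropping the f suffixes makes the call resolve to double-precision pow, which -use_fast_math does not remap. It is a drop-in replacement for the Problem() kernel above and needs -arch sm_13 for double support:
[codebox]/* Workaround sketch: double-precision pow is not remapped by -use_fast_math,
   so the result stays -1.0. Compile with -arch sm_13 and launch exactly like
   the Problem() kernel above. */
__global__ void ProblemWorkaround(float* val)
{
    *val = (float)pow(-1.0, 1.0);   // pow(double,double), then narrow to float
}
[/codebox]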
The King of the CUDA Math Library sent me an email this morning…
(1) -use_fast_math only affects certain single-precision math functions. In particular it causes powf(x,y) to be mapped to __powf(x,y), which is implemented as __exp2f(y * __log2f(x)), as noted in the documentation.
(2) In general, generic function names in the CUDA math library are overloaded; thus pow(float,float) is equivalent to powf(float,float), which is different from pow(double,double).
Therefore, when -use_fast_math is used, pow(float,float) maps to powf(float,float), which in turn maps to __powf(float,float). No such remapping takes place for pow(double,double). Since pow(double,double) and powf(float,float) adhere to C99 semantics, pow(-1.0,1.0) always delivers -1.0, while powf(-1.0f,1.0f) delivers -1.0f in the absence of -use_fast_math, but a NaN
when -use_fast_math is used.
CUDA 1.1 did not support double precision, and all double-precision operands and operations were mapped to single-precision equivalents, which probably explains the difference in behavior you observed.
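To make the mapping concrete, here is a small sketch of my own (not from the email). Compiled with -arch sm_13, the single-precision result should flip between -1.0 and NaN depending on whether -use_fast_math is passed, while the double-precision result stays -1.0:
[codebox]/* Mapping sketch: with -use_fast_math, powf(float,float) -> __powf(float,float);
   pow(double,double) is never remapped. Compile with -arch sm_13,
   once with and once without -use_fast_math. */
#include <stdio.h>
__global__ void MappingDemo(float* f, double* d)
{
    *f = powf(-1.0f, 1.0f);   // remapped to __powf under -use_fast_math -> NaN
    *d = pow(-1.0, 1.0);      // double-precision pow -> always -1.0
}
int main()
{
    float host_f; float* dev_f;
    double host_d; double* dev_d;
    cudaMalloc((void**)&dev_f, sizeof(float));
    cudaMalloc((void**)&dev_d, sizeof(double));
    MappingDemo<<<1, 1>>>(dev_f, dev_d);
    cudaThreadSynchronize();
    cudaMemcpy(&host_f, dev_f, sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(&host_d, dev_d, sizeof(double), cudaMemcpyDeviceToHost);
    printf("powf(-1.0f, 1.0f) = %f\n", host_f);
    printf("pow(-1.0, 1.0)    = %f\n", host_d);
    cudaFree(dev_f);
    cudaFree(dev_d);
    return 0;
}
[/codebox]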