Why does powf(-0.07346, 2) return NaN?

I want to use OptiX to accelerate ray tracing. When I use the math library, I find that powf(-0.07346, 2) returns NaN, but powf(2, 2) returns 4.0000.

Looking forward to your reply, sincerely

I don’t find that to be true:

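$ cat t490.cu
#include <math.h>
#include <stdio.h>

__global__ void k(float x, float y){

  printf("%f\n", powf(x, y));
}

int main(){

  k<<<1,1>>>(-0.07346f, 2.0f);  // values from the question
  cudaDeviceSynchronize();      // wait so the device printf is flushed
}
$ nvcc -o t490 t490.cu
$ cuda-memcheck ./t490
========= ERROR SUMMARY: 0 errors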

Hi 346221593,

Are you using powf() inside an OptiX program, or a CUDA kernel? If after seeing Robert_Crovella’s response you can still reproduce your issue, please let us know some details: what is your system type, driver version, optix version. And if you have a very small code snippet that can reproduce the issue, that will be exceptionally helpful.


I use powf() in OptiX 6.0.0 on Ubuntu 16.04; my CUDA version is 10.1.
Here is my test code:

  RT_PROGRAM void pinhole_camera()
  {
    if (launch_index.x == 261 && launch_index.y == 0) {
      float x = -0.45;
      int y = 2;
      rtPrintf("powf(%f,%d) = %f \n\n", -x, y, powf(-x, y));
      rtPrintf("powf(%f,%d) = %f \n\n", x, y, powf(x, y));
    }
  }

But its output is:

  powf(0.450000,2) = 0.202500

  powf(-0.450000,2) = nan

my card info:


Tue Jan 7 10:47:15 2020
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| 0 GeForce GTX 970 Off | 00000000:01:00.0 On | N/A |
| 47% 26C P2 70W / 250W | 824MiB / 4039MiB | 0% Default |

| Processes: GPU Memory |
| GPU PID Type Process name Usage |
| 0 1109 G /usr/lib/xorg/Xorg 305MiB |
| 0 1935 G compiz 167MiB |
| 0 2244 G /usr/lib/firefox/firefox 3MiB |
| 0 3833 C ./burgercpp 334MiB |

I used pow(double, double) instead, and it worked. Thanks.

Hi, it’s unfortunate to use double precision if you don’t need it. I assume your exponent is not always 2, correct?

I was able to reproduce this behavior, and then I noticed that it is caused by the “fast math” option and documented in the CUDA programming guide here: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#intrinsic-functions Table 9. With fast math enabled, powf(x, y) is compiled to the intrinsic __powf(x, y), which is evaluated through a base-2 logarithm of x and therefore returns NaN when x is negative. You can reproduce this behavior in the nvcc example above if you add the nvcc command line option “--use_fast_math”.
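To see why a negative base turns into NaN, here is a minimal host-side model of that fast path. It assumes the formulation given in the programming guide, where the fast power is derived from exp2f(y * log2f(x)). The name fast_pow_model is hypothetical, and the code uses the ordinary C++ math library rather than the device intrinsics, so it mirrors the structure of the computation and not its exact rounding.

```cpp
#include <cmath>

// Hypothetical host-side model of the fast-math power: x^y computed as
// 2^(y * log2(x)). This is NOT the device intrinsic, only the same formula
// written with standard library calls.
float fast_pow_model(float x, float y) {
    float l = std::log2(x);   // NaN for any negative x
    return std::exp2(y * l);  // NaN propagates through the multiply and exp2
}
```

Under this model, fast_pow_model(-0.07346f, 2.0f) is NaN, while fast_pow_model(2.0f, 2.0f) is 4, which matches the two results reported in the question.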

You have a few alternative options, if you want. Fast math is something you can turn on per compilation unit, so one option is to leave it turned on, but compile a 2nd unit that has fast math disabled, and put a pow() wrapper function in there. A second option is to turn off fast math, and use the fast float device intrinsics everywhere you need them. That way you can freely mix the fast versions with the robust versions. This might be a little painful to manage, and runs the risk of slowing things down, or later accidentally introducing slow math.