Can't compile fast math functions


I am trying to use the fast-math version of pow, __pow, but when I compile, nvcc complains that __pow is a host function:

error: calling a host function from a __device__/__global__ function is only allowed in device emulation mode

Out of curiosity I tried __cos and got the same result. I am casting the arguments and the return value to float, and I have tried compiling both with and without --use_fast_math. It still doesn’t compile, and I have run out of ideas.

This is how nvcc is called:

/usr/local/cuda/bin/nvcc --save-temps  -O3 -D_TEST  --ptxas-options=-v   -I/usr/local/cuda/include -Icommon/inc -L/usr/local/cuda/lib64  -L/home/sjf/cuda//lib64  -lcutil_x86_64D  -lcudart -o obj64/ -c

P.S. I had a look at the PTX ISA and saw the cos, sin, etc. instructions, but I couldn’t find the instruction that __pow would supposedly compile to.

Look at Appendix B7 in the CUDA 3.0 programming guide. The fast-math function is named __powf() (and likewise __cosf() rather than __cos()). You are probably getting the error because __pow() is defined internally somewhere in the host math library.
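For illustration, here is a minimal sketch of calling the intrinsic from device code. The kernel name powKernel, the array names, and the input values are made up for this example and are not from your code; error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (name and signature are assumptions for this example).
__global__ void powKernel(const float *x, const float *y, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // __powf is the single-precision fast-math intrinsic; it is device-only.
        out[i] = __powf(x[i], y[i]);
    }
}

int main()
{
    const int n = 4;
    float hx[n] = {2.0f, 3.0f, 4.0f, 10.0f};
    float hy[n] = {3.0f, 2.0f, 0.5f, 2.0f};
    float hout[n];

    float *dx, *dy, *dout;
    cudaMalloc((void **)&dx, n * sizeof(float));
    cudaMalloc((void **)&dy, n * sizeof(float));
    cudaMalloc((void **)&dout, n * sizeof(float));

    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

    powKernel<<<1, n>>>(dx, dy, dout, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(hout, dout, n * sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) {
        printf("__powf(%g, %g) = %g\n", hx[i], hy[i], hout[i]);
    }

    cudaFree(dx);
    cudaFree(dy);
    cudaFree(dout);
    return 0;
}
```

If you would rather keep writing powf() in the kernel, compiling with --use_fast_math should map it to the __powf() intrinsic for you; the results are approximate either way.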

That works, thanks a lot.

It turns out there is no actual pow instruction; __powf compiles to a log2 and an exp2 (lg2 and ex2 in PTX).
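In case it helps anyone else, here is a rough sketch of that decomposition, based on the programming guide's description of __powf(x, y) as exp2f(y * __log2f(x)). The kernel name comparePow and the test values are my own for this example, and I am not claiming the two results match bit for bit, only that they should be close.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (name is an assumption) that evaluates the intrinsic
// and the hand-written log2/exp2 form side by side.
__global__ void comparePow(const float *x, const float *y,
                           float *viaIntrinsic, float *viaLogExp, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        viaIntrinsic[i] = __powf(x[i], y[i]);
        // x^y = 2^(y * log2(x)); this form is only meaningful for x > 0.
        viaLogExp[i] = exp2f(y[i] * __log2f(x[i]));
    }
}

int main()
{
    const int n = 3;
    float hx[n] = {2.0f, 3.0f, 10.0f};
    float hy[n] = {3.0f, 2.0f, 0.5f};
    float ha[n], hb[n];

    float *dx, *dy, *da, *db;
    cudaMalloc((void **)&dx, n * sizeof(float));
    cudaMalloc((void **)&dy, n * sizeof(float));
    cudaMalloc((void **)&da, n * sizeof(float));
    cudaMalloc((void **)&db, n * sizeof(float));

    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

    comparePow<<<1, n>>>(dx, dy, da, db, n);

    cudaMemcpy(ha, da, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(hb, db, n * sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) {
        printf("x=%g y=%g  __powf=%g  exp2f(y*__log2f(x))=%g\n",
               hx[i], hy[i], ha[i], hb[i]);
    }

    cudaFree(dx);
    cudaFree(dy);
    cudaFree(da);
    cudaFree(db);
    return 0;
}
```

Since the result goes through log2, the fast version loses accuracy compared with powf() and does not handle negative bases the way the full library function does.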