I am trying to use the fast math version of pow, __pow, but when I compile, nvcc complains that __pow is a host function:
colorspace.cu(74): error: calling a host function from a __device__/__global__ function is only allowed in device emulation mode
Out of curiosity I tried __cos and got the same error. I am casting the arguments and the return value to float, and I have tried compiling both with and without --use_fast_math. It still doesn’t compile, and I have run out of ideas.
This is how nvcc is called:
/usr/local/cuda/bin/nvcc --save-temps -O3 -D_TEST --ptxas-options=-v -I/usr/local/cuda/include -Icommon/inc -L/usr/local/cuda/lib64 -L/home/sjf/cuda//lib64 -lcutil_x86_64D -lcudart -o obj64/ahd.cu.o -c ahd.cu
PS. I had a look at the PTX ISA and saw the cos, sin, etc. instructions, but I couldn’t find the instruction that __pow is supposed to compile to.
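For comparison, here is a minimal sketch of a kernel that calls the single-precision device intrinsics as the CUDA programming guide names them, with an f suffix: __powf and __cosf. The kernel name fastMathDemo, the input values, and the exponent are made up for illustration and are not taken from colorspace.cu. The guide also describes __powf(x, y) as being built from exp2f(y * __log2f(x)), which may be why the PTX ISA has no dedicated pow instruction to find.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical kernel (not the code from colorspace.cu): applies the
// fast-math intrinsics __powf and __cosf to each input element.
__global__ void fastMathDemo(const float *in, float *out, float exponent, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Note the f suffix: these are the single-precision device intrinsics.
        out[i] = __powf(in[i], exponent) + __cosf(in[i]);
    }
}

int main()
{
    const int n = 4;
    const float exponent = 2.2f;              // arbitrary illustrative value
    float hIn[n]  = {0.5f, 1.0f, 1.5f, 2.0f}; // arbitrary positive inputs
    float hOut[n] = {0.0f, 0.0f, 0.0f, 0.0f};

    float *dIn = NULL;
    float *dOut = NULL;
    if (cudaMalloc((void **)&dIn, n * sizeof(float)) != cudaSuccess ||
        cudaMalloc((void **)&dOut, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);

    // One block of n threads is enough for this tiny example.
    fastMathDemo<<<1, n>>>(dIn, dOut, exponent, n);

    cudaError_t err = cudaMemcpy(hOut, dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print the device result next to a host reference computed with the
    // standard library; the intrinsics trade some accuracy for speed, so
    // small differences are expected.
    for (int i = 0; i < n; ++i) {
        float ref = powf(hIn[i], exponent) + cosf(hIn[i]);
        printf("x=%f device=%f host=%f\n", hIn[i], hOut[i], ref);
    }

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

Assuming the file is saved as fastmath_demo.cu (a made-up name), it should build with a plain nvcc -o fastmath_demo fastmath_demo.cu; the intrinsics are called explicitly here, so --use_fast_math is not needed for this sketch.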