Let me use the saxpy demo to illustrate my question.
Here is the demo from the CUDA documentation:
const char *saxpy = " \n\
extern \"C\" __global__ \n\
void saxpy(float a, float *x, float *y, float *out, size_t n) \n\
{ \n\
  size_t tid = blockIdx.x * blockDim.x + threadIdx.x; \n\
  if (tid < n) { \n\
    out[tid] = a * x[tid] + y[tid]; \n\
  } \n\
} \n";
In my program, I have defined a device function:
__device__ float f(float a, float x, float y)
{
  return a * x + y;
}
to replace the line
out[tid] = a * x[tid] + y[tid];
So I modified the demo as follows:
const char *saxpy = " \n\
extern \"C\" __global__ \n\
void saxpy(float a, float *x, float *y, float *out, size_t n) \n\
{ \n\
  size_t tid = blockIdx.x * blockDim.x + threadIdx.x; \n\
  if (tid < n) { \n\
    out[tid] = f(a, x[tid], y[tid]); \n\
  } \n\
} \n";
However, I get an error when I run the program:
saxpy.cu(7): error: identifier "f" is undefined
How can I invoke my device function f from this kernel? Any ideas are welcome.
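One possible explanation, assuming the string is compiled at run time with NVRTC as in the documentation's demo: the runtime compiler only sees the text of the string, so a __device__ function defined elsewhere in the host program is not visible to it. A minimal sketch of one workaround, which places the definition of f inside the same string ahead of the kernel (the layout here is only an illustration, not something taken from the CUDA doc):

```cpp
#include <cstring>

// Sketch: f is defined inside the source string itself, before the kernel
// that calls it, so the runtime compiler sees both in one translation unit.
const char *saxpy = " \n\
__device__ float f(float a, float x, float y) \n\
{ \n\
  return a * x + y; \n\
} \n\
 \n\
extern \"C\" __global__ \n\
void saxpy(float a, float *x, float *y, float *out, size_t n) \n\
{ \n\
  size_t tid = blockIdx.x * blockDim.x + threadIdx.x; \n\
  if (tid < n) { \n\
    out[tid] = f(a, x[tid], y[tid]); \n\
  } \n\
} \n";
```

The string would then be handed to nvrtcCreateProgram in the same way as the unmodified demo. If f has to stay in a separate file, another option might be to supply that file's text through the headers and includeNames arguments of nvrtcCreateProgram and #include it from the kernel string.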