Unable to invoke a device function from code compiled at runtime by NVRTC

Let me use the SAXPY demo to illustrate my question.
Here is the demo from the CUDA documentation:

const char *saxpy = "                                           \n\
extern \"C\" __global__                                         \n\
void saxpy(float a, float *x, float *y, float *out, size_t n)   \n\
{                                                               \n\
  size_t tid = blockIdx.x * blockDim.x + threadIdx.x;           \n\
  if (tid < n) {                                                \n\
    out[tid] = a * x[tid] + y[tid];                             \n\
  }                                                             \n\
}                                                               \n";

In my program, I have defined a device function:

__device__ float f(float a, float x, float y)
{
  return a * x + y;
}

to replace

out[tid] = a * x[tid] + y[tid]

So I modified the demo as follows:

const char *saxpy = "                                           \n\
extern \"C\" __global__                                         \n\
void saxpy(float a, float *x, float *y, float *out, size_t n)   \n\
{                                                               \n\
  size_t tid = blockIdx.x * blockDim.x + threadIdx.x;           \n\
  if (tid < n) {                                                \n\
    out[tid] = f(a, x[tid], y[tid]);                           \n\
  }                                                             \n\
}                                                               \n";

However, I get an error when I run the program:
saxpy.cu(7): error: identifier "f" is undefined

How can I invoke my device function f from the runtime-compiled kernel? Any ideas are welcome.
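
A possible direction, offered as a minimal sketch rather than a confirmed fix: NVRTC compiles only the source string passed to it, so a function defined elsewhere in the host program is not visible to the kernel. Assuming f is currently defined only in the host program's own source, its definition could be placed in the same string, ahead of the kernel. The names kDeviceFnSrc, kKernelSrc, and build_saxpy_source are hypothetical and introduced only for this illustration.

```cpp
#include <string>

// Device helper written as source text so that NVRTC can see it.
// Assumption: f was previously defined only in the host program.
static const char *kDeviceFnSrc =
    "__device__ float f(float a, float x, float y)\n"
    "{\n"
    "  return a * x + y;\n"
    "}\n";

// The modified kernel from the question, unchanged apart from layout.
static const char *kKernelSrc =
    "extern \"C\" __global__\n"
    "void saxpy(float a, float *x, float *y, float *out, size_t n)\n"
    "{\n"
    "  size_t tid = blockIdx.x * blockDim.x + threadIdx.x;\n"
    "  if (tid < n) {\n"
    "    out[tid] = f(a, x[tid], y[tid]);\n"
    "  }\n"
    "}\n";

// Hypothetical helper (not part of NVRTC): joins the two pieces so the
// definition of f precedes its first use in the kernel.
std::string build_saxpy_source()
{
    return std::string(kDeviceFnSrc) + kKernelSrc;
}
```

The resulting string would then be passed to nvrtcCreateProgram in place of the original saxpy string, for example as build_saxpy_source().c_str().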