I am new to CUDA programming.
I have a __global__ kernel function:
extern __global__ void cube(float * d_out, float * d_in){
    // Global thread index; the block offset must be scaled by blockDim.x
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    float f = d_in[id];
    d_out[id] = f*f*f;
}
I am trying to create a std::function object from this __global__ cube function:
std::function<void(float*, float*)> kernel = cube;
When I try to launch this kernel on the GPU by calling
kernel<<<1, NUM_THREADS>>>(d_out, d_in);
I get the error: a host function call cannot be configured
If I use nvstd::function instead, I get the error: a device function call cannot be configured
What is the difference between nvstd::function and std::function?
Also, how can I store a global function as a function object? Why does it work if I store it in a function pointer instead?
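For comparison, here is a minimal sketch of the function-pointer variant mentioned above. The value of NUM_THREADS, the KernelPtr alias, and the run_cube_via_pointer helper are assumptions made for illustration; none of them come from the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int NUM_THREADS = 64;  // assumed; the question does not give a value

__global__ void cube(float* d_out, float* d_in) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    float f = d_in[id];
    d_out[id] = f * f * f;
}

// Hypothetical alias: an ordinary host-side pointer to a __global__ function.
using KernelPtr = void (*)(float*, float*);

// Hypothetical helper: launches cube through a function pointer.
// Returns the number of mismatching elements, or -1 on a CUDA error.
int run_cube_via_pointer() {
    float h_in[NUM_THREADS], h_out[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; ++i) h_in[i] = static_cast<float>(i);

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    // The callee expression still refers to a __global__ function,
    // so the <<<...>>> launch configuration is accepted.
    KernelPtr kernel = cube;
    kernel<<<1, NUM_THREADS>>>(d_out, d_in);

    // This copy runs after the kernel on the default stream.
    cudaError_t err =
        cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < NUM_THREADS; ++i) {
        if (h_out[i] != h_in[i] * h_in[i] * h_in[i]) ++mismatches;
    }
    return mismatches;
}

int main() {
    int result = run_cube_via_pointer();
    std::printf("mismatches (or -1 on CUDA error): %d\n", result);
    return result == 0 ? 0 : 1;
}
```

This should build with nvcc and needs a CUDA-capable device to run. As far as I can tell from the error messages, the difference is that calling a std::function or nvstd::function object goes through its operator(), which is an ordinary host or device member function rather than a __global__ one, whereas the raw pointer still designates the kernel itself.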