I apologize for the simple nature of this question. Appreciate any help.
While going through some CUDA examples, I came across some code that I thought was not possible, but for some reason it seems to run. I've looked for possible explanations, but couldn't find one.
My (lack of) understanding was that if I have a variable in host memory and want to use it inside a CUDA kernel, I have to create a copy of that variable in device memory using cudaMalloc and cudaMemcpy.
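To make clear what I mean, here is a minimal sketch of the pattern I assumed was always required. The names add_ptr and add_with_explicit_copies are hypothetical (they are not from the example I was reading), and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical variant of the kernel: every operand is read through a device pointer.
__global__ void add_ptr(const int *a, const int *b, int *c) {
    *c = *a + *b;
}

// Hypothetical helper: copies a and b to the device, runs the kernel,
// and copies the result back to the host.
int add_with_explicit_copies(int a, int b) {
    int c = 0;
    int *d_a, *d_b, *d_c;  // device copies of a, b, c

    // Allocate device memory for each value.
    cudaMalloc((void **)&d_a, sizeof(int));
    cudaMalloc((void **)&d_b, sizeof(int));
    cudaMalloc((void **)&d_c, sizeof(int));

    // Copy the inputs from host memory to device memory.
    cudaMemcpy(d_a, &a, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, sizeof(int), cudaMemcpyHostToDevice);

    add_ptr<<<1, 1>>>(d_a, d_b, d_c);

    // Copy the result back; this cudaMemcpy also waits for the kernel to finish.
    cudaMemcpy(&c, d_c, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return c;
}

int main() {
    printf("%d\n", add_with_explicit_copies(4, 5));
    return 0;
}
```

In this version nothing reaches the kernel except device pointers, which is what I expected every kernel to need.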
However, the following code works properly (i.e. the addition works correctly). I’m curious how?
__global__ void add(int a, int b, int *c) {
    *c = a + b;
}

int a = 4, b = 5, c = 0;  // values on the host
int *d_c;                 // device copy of c
cudaMalloc((void **)&d_c, sizeof(int));
add<<<1,1>>>(a, b, d_c);
My question is, how do a and b get from the host to the device?