It seems common to use cudaMemcpyToSymbol and cudaMemcpyFromSymbol to link global device variables to a device memory block. However, I never saw any clear description about these functions. To make my questions clear, a little program is given below.
#include <stdio.h>
__device__ int *a;
__global__ void kernel()
{
*a=3;
}
int main()
{
int *da;
cudaMalloc(&da,1*sizeof(int));
cudaMemcpyToSymbol(a,&da,sizeof(int*));
kernel<<<1,1>>>();
cudaDeviceSynchronize();
int ha[1];
cudaMemcpy(ha,da,sizeof(int),cudaMemcpyDeviceToHost);
printf("%d\n",ha[0]); // this output 3
cudaFree(da);
exit(0);
}
My questions are:
-
da itself is a pointer, so why &da?
-
cudaMemcpyToSymbol’s default copy direction is cudaMemcpyHostToDevice. So da is treated as a host memory block? Obviously it’s not in host memory. So why NVIDIA do such way?
-
cudaMemcpyFromSymbol cannot replace cudaMemcpyToSymbol in this example no matter using what argument combination, am I right? why I can’t do this?
-
From reference manual, symbol here may mean variable name in some cases. Is it the only meaning of symbol here?