Understanding cudaMemcpyToSymbol and cudaMemcpyFromSymbol

It seems common to use cudaMemcpyToSymbol and cudaMemcpyFromSymbol to link global device variables to a device memory block. However, I never saw any clear description about these functions. To make my questions clear, a little program is given below.

#include <stdio.h>

__device__ int *a;

__global__ void kernel()

{

  *a=3;

}

int main()

{

  int *da;

  cudaMalloc(&da,1*sizeof(int));

  cudaMemcpyToSymbol(a,&da,sizeof(int*));

kernel<<<1,1>>>();

  cudaDeviceSynchronize();

int ha[1];

  cudaMemcpy(ha,da,sizeof(int),cudaMemcpyDeviceToHost);

  printf("%d\n",ha[0]); // this output 3

cudaFree(da);

  exit(0);

}

My questions are:

  1. da itself is a pointer, so why &da?

  2. cudaMemcpyToSymbol’s default copy direction is cudaMemcpyHostToDevice. So da is treated as a host memory block? Obviously it’s not in host memory. So why NVIDIA do such way?

  3. cudaMemcpyFromSymbol cannot replace cudaMemcpyToSymbol in this example no matter using what argument combination, am I right? why I can’t do this?

  4. From reference manual, symbol here may mean variable name in some cases. Is it the only meaning of symbol here?