I’m trying to use
CUDA_SAFE_CALL( cudaMemcpyToSymbol(d_A, h_A, mem_size_A) );
to copy data from the host to constant memory, but the time I measure is much larger than when I use
CUDA_SAFE_CALL( cudaMemcpy(d_A, h_A, mem_size_A, cudaMemcpyHostToDevice) );
to copy the same data from the host to the device.
Am I doing something wrong here, or is this the expected behavior?
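
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two transfers could be timed side by side. It is not the original code: the constant-memory symbol c_A, the element count N, and the CHECK macro are hypothetical names introduced for illustration, and CHECK stands in for CUDA_SAFE_CALL. The sketch assumes the data fits in constant memory and runs one untimed warm-up copy of each kind first, in case one-time setup cost on the first call is part of the difference being measured.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical element count: 4096 floats = 16 KB, small enough for constant memory.
#define N 4096

// Hypothetical constant-memory symbol used as the cudaMemcpyToSymbol target.
__constant__ float c_A[N];

// Hypothetical error-checking macro standing in for CUDA_SAFE_CALL.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    const size_t mem_size_A = N * sizeof(float);

    // Host buffer with some data to transfer.
    float* h_A = (float*)malloc(mem_size_A);
    if (h_A == NULL) return EXIT_FAILURE;
    for (int i = 0; i < N; ++i) h_A[i] = (float)i;

    // Global-memory buffer used as the cudaMemcpy target.
    float* d_A = NULL;
    CHECK(cudaMalloc((void**)&d_A, mem_size_A));

    // Untimed warm-up copies so one-time initialization is not measured.
    CHECK(cudaMemcpyToSymbol(c_A, h_A, mem_size_A));
    CHECK(cudaMemcpy(d_A, h_A, mem_size_A, cudaMemcpyHostToDevice));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    float ms_symbol = 0.0f, ms_memcpy = 0.0f;

    // Time the host-to-constant-memory copy.
    CHECK(cudaEventRecord(start, 0));
    CHECK(cudaMemcpyToSymbol(c_A, h_A, mem_size_A));
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&ms_symbol, start, stop));

    // Time the host-to-global-memory copy of the same data.
    CHECK(cudaEventRecord(start, 0));
    CHECK(cudaMemcpy(d_A, h_A, mem_size_A, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&ms_memcpy, start, stop));

    printf("cudaMemcpyToSymbol: %f ms\n", ms_symbol);
    printf("cudaMemcpy:         %f ms\n", ms_memcpy);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d_A));
    free(h_A);
    return 0;
}
```

A single timed copy of each kind is noisy, so repeating each measurement in a loop and averaging would give a fairer comparison.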