cudaMemcpyToSymbol

I’m trying to use

CUDA_SAFE_CALL( cudaMemcpyToSymbol(d_A, h_A, mem_size_A) );

to load data from host to constant memory. But the time I measure is much larger than when I use

CUDA_SAFE_CALL( cudaMemcpy(d_A, h_A, mem_size_A, cudaMemcpyHostToDevice) );

to load the same data from host to device.

Am I doing anything wrong here, or is it supposed to be this way?

Thanks.

My main question is: is writing to constant memory slower than writing to global memory?
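
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two copies could be timed side by side with CUDA events. It is not the original code: the names c_A, N and CHECK are hypothetical, the array size is an arbitrary choice below the 64 KB constant memory limit, and it assumes d_A is a __constant__ symbol in the first call and a cudaMalloc'd pointer in the second. Each copy is run once before timing so that one-time setup cost does not land in either measurement.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for CUDA_SAFE_CALL: abort on any CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

#define N 4096                       // assumed size: 16 KB of floats

__constant__ float c_A[N];           // hypothetical constant-memory symbol

int main() {
    static float h_A[N];             // host source buffer (zero-initialized)
    const size_t mem_size_A = sizeof(h_A);

    float *d_A = nullptr;            // global-memory destination
    CHECK(cudaMalloc(&d_A, mem_size_A));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Untimed warm-up copies, so initialization is excluded from the timings.
    CHECK(cudaMemcpyToSymbol(c_A, h_A, mem_size_A));
    CHECK(cudaMemcpy(d_A, h_A, mem_size_A, cudaMemcpyHostToDevice));

    // Time the copy to constant memory.
    float ms_symbol = 0.0f;
    CHECK(cudaEventRecord(start));
    CHECK(cudaMemcpyToSymbol(c_A, h_A, mem_size_A));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&ms_symbol, start, stop));

    // Time the copy to global memory.
    float ms_global = 0.0f;
    CHECK(cudaEventRecord(start));
    CHECK(cudaMemcpy(d_A, h_A, mem_size_A, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&ms_global, start, stop));

    printf("cudaMemcpyToSymbol: %f ms\n", ms_symbol);
    printf("cudaMemcpy:         %f ms\n", ms_global);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d_A));
    return 0;
}
```

Averaging over many repetitions instead of a single timed call would probably give steadier numbers, since a copy this small finishes very quickly.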