Constant memory is read-only from the perspective of device code. You cannot move data from global memory to constant memory in CUDA device code; the copy has to be issued from host code. The standard way to populate constant memory at run time is cudaMemcpyToSymbol.
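As a rough illustration, here is a minimal sketch of populating constant memory from host code. The names coeffs, scale, run_demo and the size N are made up for this example and are not taken from your code. It shows two copies: one from a host array, and one from a global memory buffer using the cudaMemcpyDeviceToDevice kind. In both cases the copy is issued by the host; the kernel only reads the data.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 16  // hypothetical table size, chosen only for illustration

// Constant memory is declared at file scope; kernels can read it but not write it.
__constant__ float coeffs[N];

// Each thread multiplies one input element by the matching constant coefficient.
__global__ void scale(const float *in, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        out[i] = in[i] * coeffs[i];
}

// Report a CUDA error and make the enclosing function return false.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(e));                           \
            return false;                                             \
        }                                                             \
    } while (0)

bool run_demo()
{
    float h_coeffs[N], h_in[N], h_out[N];
    for (int i = 0; i < N; ++i) {
        h_coeffs[i] = 2.0f;
        h_in[i] = (float)i;
    }

    float *d_in = nullptr, *d_out = nullptr, *d_coeffs = nullptr;
    CHECK(cudaMalloc(&d_in, sizeof(h_in)));
    CHECK(cudaMalloc(&d_out, sizeof(h_out)));
    CHECK(cudaMalloc(&d_coeffs, sizeof(h_coeffs)));
    CHECK(cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice));

    // Case 1: host memory -> constant memory (default kind is host-to-device).
    CHECK(cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(h_coeffs)));

    // Case 2: global memory -> constant memory. The data already lives on
    // the device, but the copy is still issued from host code.
    CHECK(cudaMemcpy(d_coeffs, h_coeffs, sizeof(h_coeffs), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpyToSymbol(coeffs, d_coeffs, sizeof(h_coeffs), 0,
                             cudaMemcpyDeviceToDevice));

    scale<<<1, N>>>(d_in, d_out);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost));

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_coeffs);

    // Every output should be the input scaled by the constant coefficient.
    for (int i = 0; i < N; ++i)
        if (h_out[i] != h_in[i] * h_coeffs[i])
            return false;
    return true;
}

int main()
{
    bool ok = run_demo();
    printf("%s\n", ok ? "constant memory demo passed" : "constant memory demo failed");
    return ok ? 0 : 1;
}
```

If the data you want in constant memory is produced by a kernel, the pattern in case 2 applies: let the kernel write to a global memory buffer, then copy that buffer to the symbol from the host before launching the kernel that reads it.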
There are numerous examples of constant memory usage on various forums.
The CUDA sample codes volumeRender, volumeFilter, convolutionSeparable, dxtc, bilateralFilter, convolutionTexture, binomialOptions, quasirandomGenerator, nbody, particles, and smokeParticles all demonstrate usage of constant memory.
I suggest asking questions about the profiler on the profiler forums.