Constant memory allocation: when, where, and how?

I understand how to implement and use constant memory. I’ve always declared the variables and used cudaMemcpyToSymbol to initialize the values at runtime; maybe there is also another way, e.g., static initialization at the declaration, as in the sketch below.
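Something like this is what I have in mind, assuming static initializers are allowed on __constant__ variables (c_table and its values are made up for illustration):

__constant__ float c_table[4] = { 0.0f, 1.0f, 2.0f, 3.0f };    // initialized at declaration, no cudaMemcpyToSymbol needed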

Right now, though, I’m trying to understand when and how it is allocated. I looked at this thread, which was somewhat confusing. My primary question is: what happens (a) at compile time, (b) at run time, and (c) at access time (in a kernel) when I use code like this:

float host_data[SIZE];                 // source buffer on the host

__constant__ float c_data[SIZE];       // file-scope declaration; lives in device constant memory

cudaMemcpyToSymbol(c_data, host_data, SIZE * sizeof(float), 0, cudaMemcpyHostToDevice);
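And by “access time” I mean reads from a kernel, something like this hypothetical one (scale and its arguments are made up for illustration):

__global__ void scale(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] *= c_data[i % SIZE];    // read directly from the constant symbol
}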

Naively, I imagine that when you declare constant variables, memory is “allocated” on the device at runtime. Thus, you couldn’t declare more constant data than the total constant memory available on the device, and the memory is already in place when you copy to the symbol from the host or access it from a kernel.
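That would at least be consistent with what the runtime reports; a sketch (device 0 assumed, printf from <cstdio>):

cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);                                  // device 0 assumed
printf("total constant memory: %zu bytes\n", prop.totalConstMem);   // typically 64 KB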

Some leading sub-questions:

  1. Is constant memory “allocated”? I.e., does it make sense to use that term?

  2. If my naive understanding is correct, when during runtime? My best guess is at cudaSetDevice (see the probe sketch after this list).

  3. What happens at compile time? In the linked thread, Paulius comments, “constants are embedded in the executable”. What does this mean?

  4. Lastly, someone suggested to me that constant memory was only allocated when a kernel used the memory. That seems implausible to me… can anyone confirm or deny? (resolved, this was just a misunderstanding)
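The probe I have in mind for questions 1–3: query the symbol right after context creation, before any copy or kernel launch (a sketch, error checking omitted; I’m assuming these runtime calls behave as documented):

void  *dev_ptr  = NULL;
size_t sym_size = 0;
cudaGetSymbolAddress(&dev_ptr, c_data);    // does the symbol already have a device address here?
cudaGetSymbolSize(&sym_size, c_data);      // how much space is reserved for it?
printf("c_data lives at %p, %zu bytes\n", dev_ptr, sym_size);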

Jeff

P.S. Of course, a link to a reference is welcome, as I was unable to find one myself.