How is Device Determined for Constant Memory

I declared constant memory using the following statement:

 __constant__ Int16 d_FILTER[FILTER_MAXIMUM];

How does NVCC know, at compile time, on which device this constant memory resides, given that cudaSetDevice is only called at runtime?

A constant memory declaration like that is normally made at global (file) scope, so the compiler does not need to pick a device. That means that:

  1. It is instantiated on every device visible to the CUDA runtime.
  2. It must be populated individually on each device. Static initialization works as you would expect: every device copy receives the same initial values. For initialization at runtime, you need a cudaSetDevice()...cudaMemcpyToSymbol() sequence for each device from which you intend to access that constant memory.
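
The per-device sequence described above could look roughly like the sketch below. It is only an illustration under some assumptions: the Int16 typedef and the value of FILTER_MAXIMUM are placeholders (the question does not show them), and readFilter, h_filter and the CHECK macro are hypothetical names introduced here.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

typedef int16_t Int16;        // assumption: Int16 is a 16-bit signed integer
#define FILTER_MAXIMUM 64     // assumption: placeholder size, not from the question

// One declaration in the source; each device gets its own copy.
__constant__ Int16 d_FILTER[FILTER_MAXIMUM];

// Hypothetical helper: abort with a message if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical kernel: reads the current device's copy of d_FILTER.
__global__ void readFilter(int *out)
{
    *out = d_FILTER[0] + d_FILTER[FILTER_MAXIMUM - 1];
}

int main()
{
    Int16 h_filter[FILTER_MAXIMUM];
    for (int i = 0; i < FILTER_MAXIMUM; ++i)
        h_filter[i] = (Int16)i;

    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));

    for (int dev = 0; dev < deviceCount; ++dev) {
        // Select the device, then populate that device's copy of the symbol.
        CHECK(cudaSetDevice(dev));
        CHECK(cudaMemcpyToSymbol(d_FILTER, h_filter, sizeof(h_filter)));

        // Read the value back through a kernel to confirm the copy landed.
        int *d_out = NULL;
        int h_out = -1;
        CHECK(cudaMalloc(&d_out, sizeof(int)));
        readFilter<<<1, 1>>>(d_out);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost));
        CHECK(cudaFree(d_out));

        printf("device %d: first + last filter element = %d\n", dev, h_out);
    }
    return 0;
}
```

If a device is skipped in the loop, its copy of d_FILTER keeps whatever static initialization it had (zero here), so a kernel launched on that device would not see the filter values.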