cudaMemcpyToSymbol use details (translation/compilation unit?)

I am trying to copy data structures from the host to constant memory on a Tesla C1060 (compute capability 1.3), using the following function in mem.cu:

//mem.cu
#include "kernel.cuh"

int InitDCMem(SimuationStruct *sim)
{
  SimParamGPU h_simparam;

  h_simparam.na = sim->det.na;
  h_simparam.nz = sim->det.nz;
  h_simparam.nr = sim->det.nr;

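  // Return the outcome of the copy: 0 on success, 1 on failure.
  cudaError_t err = cudaMemcpyToSymbol(d_simparam, &h_simparam, sizeof(SimParamGPU));
  return (err == cudaSuccess) ? 0 : 1;
}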

The data structure (in a header file):

//kernel.cuh
typedef struct __align__(16)
{
  int na;
  int nz;
  int nr;
} SimParamGPU;

__constant__ SimParamGPU d_simparam;

The problem is that the values do not seem to be copied to constant memory on the GPU.

Do I need to re-declare the constant in mem.cu? That only seems to work if I enable the “Generate Relocatable Device Code” option (-rdc=true).

Should I use ‘extern’ or ‘extern “C”’ in the kernel.cu source? (I tried, but got errors.)

No errors are reported, but the values are always 0 on the device, even though they are set properly on the host side.
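
One way to narrow this down is to keep the `__constant__` definition, the cudaMemcpyToSymbol call, and the kernel in a single .cu file and read the values back. As far as I understand, when relocatable device code is off, a `__constant__` variable defined in a header gets its own copy in every .cu file that includes the header, so a copy made from mem.cu would not be seen by a kernel compiled from a different file. The sketch below is only a minimal check under that assumption: the file name, the ReadParams kernel, and the CheckConstantCopy helper are hypothetical and not part of the original code.

```cuda
// single_tu.cu (hypothetical file name); build with: nvcc single_tu.cu -o single_tu
#include <cstdio>
#include <cuda_runtime.h>

// Same layout as SimParamGPU in kernel.cuh.
typedef struct __align__(16)
{
  int na;
  int nz;
  int nr;
} SimParamGPU;

// Defined in the same translation unit as the copy and the kernel that reads it.
__constant__ SimParamGPU d_simparam;

// Hypothetical kernel: copies the constant-memory fields into global memory.
__global__ void ReadParams(int *out)
{
  out[0] = d_simparam.na;
  out[1] = d_simparam.nz;
  out[2] = d_simparam.nr;
}

// Hypothetical helper: returns 0 if the device sees the host values, 1 otherwise.
int CheckConstantCopy(int na, int nz, int nr)
{
  SimParamGPU h_simparam;
  h_simparam.na = na;
  h_simparam.nz = nz;
  h_simparam.nr = nr;

  if (cudaMemcpyToSymbol(d_simparam, &h_simparam, sizeof(SimParamGPU)) != cudaSuccess)
    return 1;

  int *d_out = NULL;
  int h_out[3] = {0, 0, 0};
  if (cudaMalloc((void **)&d_out, sizeof(h_out)) != cudaSuccess)
    return 1;

  ReadParams<<<1, 1>>>(d_out);
  cudaError_t err = cudaGetLastError();  // catches kernel launch failures
  if (err == cudaSuccess)
    err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);  // blocks until the kernel is done
  cudaFree(d_out);
  if (err != cudaSuccess)
    return 1;

  return (h_out[0] == na && h_out[1] == nz && h_out[2] == nr) ? 0 : 1;
}

int main()
{
  int status = CheckConstantCopy(30, 20, 10);
  printf("%s\n", status == 0 ? "constant memory matches host values"
                             : "mismatch or CUDA error");
  return status;
}
```

If this single-file version reports a match while the multi-file project still reads zeros, that would point to the copy and the kernel using different instances of d_simparam rather than to cudaMemcpyToSymbol itself.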