Device constant memory from shared object


I've run into a slightly exotic problem when trying to use a device constant variable, declared in my main program, from a kernel compiled and loaded separately as a shared object. The kernel loaded and launched via the .so file generally works well, but it doesn't see the changes made to constant memory by cudaMemcpyToSymbol in the main program.

More details:

--------------------

  1. __device__ __constant__ float global_constants[4];

  2. float constants[1] = {8.0f};

     cudaMemcpyToSymbol(global_constants, constants, 1 * sizeof(float), 0);

  3. load the .so containing the CUDA kernel + extern "C" launch function using dlopen()

--------------------

The code that goes into the shared object:


__device__ __constant__ float global_constants[4];

__global__ void test_kernel(float* g_idata, float* g_odata)
{
    const unsigned int tid = threadIdx.x + blockDim.x * blockIdx.x;

    g_odata[tid] = g_idata[tid] + global_constants[0];
}

extern "C" int launch_kernel(float *d_idata, float *d_odata, unsigned num_elements)
{
    unsigned num_threads = 128;

    dim3 grid((num_elements + num_threads - 1) / num_threads, 1, 1);

    dim3 threads(num_threads, 1, 1);

    test_kernel<<< grid, threads >>>(d_idata, d_odata);

    return 1;



Like I said, the kernel generally works fine, except for the problem with accessing constant memory. Even though the kernel should read 8.0 from global_constants[0], it reads 0.0. I've tried putting an "extern" keyword in front of the global_constants declaration in the shared object, but that doesn't help.

Doing an "nm | grep global" on the shared object shows:

0000000000000010 b __shadow_global_constants

The same symbol shows up in main.o.

Usually, for normal variables in shared objects, the global_constants variable in the .so (at least when declared extern) would be left undefined, but would bind to the global_constants variable defined in the main program during dynamic linking by dlopen()… At least that's how my pure C test program behaves.

So, somehow nvcc seems to forget to declare __shadow_global_constants as extern when generating the host code for the shared object.

If anyone has an idea how to solve this, I'd be grateful.

PS, if I do a

cudaMemcpyToSymbol(global_constants, constants, 1 * sizeof(float), 0);

directly in launch_kernel() in the shared object, everything works as expected, but I'd like to avoid that.
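For reference, that workaround can be sketched as a small exported setter compiled into the shared object itself (set_constants is a hypothetical name): because the cudaMemcpyToSymbol call lives in the same module as the __constant__ declaration, it resolves against that module's copy of the symbol rather than the main program's.

```cuda
// Inside the shared object, next to test_kernel:
__device__ __constant__ float global_constants[4];

// Hypothetical exported setter: compiled into the same module as
// global_constants, so cudaMemcpyToSymbol resolves the symbol against
// this module's copy rather than the main program's.
extern "C" int set_constants(const float *host_vals, unsigned count)
{
    cudaError_t err = cudaMemcpyToSymbol(global_constants, host_vals,
                                         count * sizeof(float), 0,
                                         cudaMemcpyHostToDevice);
    return err == cudaSuccess;
}
```

The main program would then dlsym() this setter alongside launch_kernel() and call it instead of calling cudaMemcpyToSymbol itself.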

I’m using CUDA 2.1 on Ubuntu 8.10.


I just tried CUDA 2.2 and I’m experiencing the same problem.

This seems to me like a bug/limitation in nvcc.

Would anyone at NVIDIA be able to suggest a workaround, or provide a short explanation of how constant memory is stored, initialized, accessed, and downloaded to the GPU?

If the problem is not clear, I’m happy to provide more details and a better explanation.