Constant memory seems to be the right place to put global parameters for the kernel threads.
So I placed everything in a struct, of which I keep two copies, one in CPU memory and one in GPU memory:
struct Context {
    int n;
    float *ptr;
    ... many other parameters ...
};

static Context h_context;                    // host copy, filled on the CPU
__device__ __constant__ Context d_context;   // device copy, lives in constant memory
Then, my typical kernel code is:
__global__ void myKernel()
{
    if( threadIdx.x < d_context.n )
        d_context.ptr[ threadIdx.x ] = 0.0f;
    ...
}
When I call that (stupidly) simple and very short kernel with:
h_context.n = 200;
cudaMalloc(&h_context.ptr, h_context.n * sizeof(float));
cudaMemcpyToSymbol(d_context, &h_context, sizeof h_context);
myKernel<<<256,256>>>();
the program blocks for 5 seconds, then the Windows watchdog times out and I get a launch error…
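(For the record, the failure shows up with the usual error-check pattern around the launch; this is just a minimal sketch, assuming the standard runtime calls and the usual includes:)

myKernel<<< 256, 256 >>>();
cudaError_t err = cudaGetLastError();     // catches launch-time errors
if( err == cudaSuccess )
    err = cudaThreadSynchronize();        // this is where the watchdog timeout surfaces
if( err != cudaSuccess )
    printf( "kernel failed : %s\n", cudaGetErrorString( err ) );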
Further investigation shows that:

1. My program is correct: if I just remove __constant__ in the code above, everything works; slowly, but fine.
2. Slowly, because nvcc uses many, many registers, most probably to copy the whole struct into registers (call it a poor man's caching policy), so I end up with a low number of threads per multiprocessor.
3. If I copy the struct myself into shared memory (from the device copy) or into local memory, it is still slow: nvcc still mirrors the whole shared/local copy in registers (see the sketch at the end of this post).
4. If I use -maxrregcount to limit the registers used, it is worse: the cubin output shows things end up in local memory.
5. :) But I found a kludge: putting a __syncthreads() as the first instruction of my kernel fixes the problem (see the sketch right after this list)…
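Concretely, the kludge in point 5 is nothing more than this (same kernel as above, only the barrier added):

__global__ void myKernel()
{
    __syncthreads();   // the kludge: with this barrier as the first instruction, the launch no longer times out
    if( threadIdx.x < d_context.n )
        d_context.ptr[ threadIdx.x ] = 0.0f;
    // ... rest of the kernel unchanged ...
}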
I do suspect a driver/compiler bug…
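For completeness, the shared-memory staging I mention in point 3 looks roughly like this (just a sketch, with a made-up kernel name; one thread copies the struct out of constant memory, then the whole block reads the local copy):

__global__ void myKernelSharedCopy()
{
    __shared__ Context s_context;       // per-block copy of the parameters
    if( threadIdx.x == 0 )
        s_context = d_context;          // stage the struct from constant memory once per block
    __syncthreads();                    // make the copy visible to every thread of the block
    if( threadIdx.x < s_context.n )
        s_context.ptr[ threadIdx.x ] = 0.0f;
}

Even written like that, nvcc still mirrors the copy in registers, so it stays slow.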