device to constant memory: Trying to copy device mem to constant memory

Hey all -

I’m having trouble copying data from device memory to constant memory.

I’m trying something like this:

    __constant__ uint2 hlKdtree[1024];

    void *kdAddr;

    if (cudaGetSymbolAddress(&kdAddr, hlKdtree)) {
        printf("error: failed to get the address of hlKdtree\n");
        abort();
    }

    if (cudaMemcpy(hlKdtree, highLevelKdt, g_hlKdtreeLength, cudaMemcpyDeviceToDevice)) {
        printf("error: failed to copy data from global to constant, kdtree\n");
        abort();
    }

As another poster mentioned, initializing constant memory seems to be something of a black art. Usually I do it through cudaMemcpyToSymbol, but that function only works from host to device. I already have the data on the card, however, and it Just Makes Sense ™ to copy it into constant memory directly. I would actually prefer to set hlKdtree only once at the beginning of my program and run with that. I don’t think that’s possible, though, since I share that memory between different libraries, and I believe there is no behind-the-scenes linking of the CUDA portions of the libraries.

So… how about it? Any way of copying from device to constant? Or am I stuck with keeping things on the host and copying to the device at launch? I guess with 4K bytes, maybe it’s not a big deal, but I’m still semi surprised that this won’t work.

Brian

AFAIK, there is no function that copies data from device memory to constant memory…

Section 4.2.2.4 of the programming guide: constant memory can only be assigned from the host.
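For what it's worth, the host-side route that this restriction leaves open can be sketched roughly as follows: pull the data back from global memory into a temporary host buffer, then push it into the constant symbol with cudaMemcpyToSymbol. This is only a sketch under assumptions. The helper name stageToConstant and its element-count parameter are hypothetical and not from the original post, and the default host-to-device direction of cudaMemcpyToSymbol is used.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

__constant__ uint2 hlKdtree[1024];

// Hypothetical helper (not from the original post): copies `count` uint2
// entries from a device (global memory) buffer into hlKdtree by staging them
// through a temporary host buffer. Returns the first CUDA error, if any.
cudaError_t stageToConstant(const uint2 *d_src, size_t count)
{
    if (count > 1024) return cudaErrorInvalidValue;  // would overflow hlKdtree
    if (count == 0) return cudaSuccess;              // nothing to copy

    size_t bytes = count * sizeof(uint2);
    uint2 *h_tmp = (uint2 *)malloc(bytes);
    if (h_tmp == NULL) return cudaErrorMemoryAllocation;

    // Step 1: device (global) memory -> host staging buffer.
    cudaError_t err = cudaMemcpy(h_tmp, d_src, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) {
        // Step 2: host staging buffer -> constant memory symbol.
        err = cudaMemcpyToSymbol(hlKdtree, h_tmp, bytes);
    }

    free(h_tmp);
    return err;
}
```

With a buffer this small the extra round trip should be cheap, so calling something like stageToConstant(highLevelKdt, nodeCount) before the launch is probably acceptable; nodeCount here is again an assumed name for the number of uint2 entries.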

To give some idea of why this restriction is in place, consider how constant memory is used: it can be read by all the threads in the grid (not just a block), and it is cached. So if device threads could write to constant memory, the cache would have to be updated for the benefit of all the threads in the grid. That gets you back to all the cache coherence complications that multiprocessor systems have.

Paulius
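To illustrate the usage pattern described above, here is a minimal sketch in which the host fills constant memory before the launch and every thread in the grid only reads from it. The kernel name sumNode, the wrapper runSumNode, and their parameters are hypothetical and chosen purely for illustration; nothing here is taken from the original poster's code beyond the hlKdtree declaration.

```cuda
#include <cuda_runtime.h>

__constant__ uint2 hlKdtree[1024];

// Hypothetical kernel: every thread reads the same entry of hlKdtree, so the
// value can be served from the constant cache. No thread writes to hlKdtree.
__global__ void sumNode(unsigned int nodeIndex, unsigned int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        uint2 node = hlKdtree[nodeIndex];
        out[i] = node.x + node.y;
    }
}

// Hypothetical host wrapper: uploads `count` entries from the host into
// constant memory, launches the kernel, and copies the n results back.
cudaError_t runSumNode(const uint2 *h_nodes, size_t count,
                       unsigned int nodeIndex, unsigned int *h_out, int n)
{
    if (count == 0 || count > 1024 || nodeIndex >= count || n <= 0)
        return cudaErrorInvalidValue;

    // Constant memory is assigned from the host, before the launch.
    cudaError_t err = cudaMemcpyToSymbol(hlKdtree, h_nodes, count * sizeof(uint2));
    if (err != cudaSuccess) return err;

    unsigned int *d_out = NULL;
    err = cudaMalloc((void **)&d_out, n * sizeof(unsigned int));
    if (err != cudaSuccess) return err;

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    sumNode<<<blocks, threads>>>(nodeIndex, d_out, n);

    err = cudaGetLastError();  // reports launch failures
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, n * sizeof(unsigned int), cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    return err;
}
```

Because the contents of hlKdtree stay fixed for the whole launch, no thread can observe a stale cached value, which is the situation the restriction is meant to guarantee.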