Copy device memory to constant memory


I am trying to copy some kernel-computed values to constant memory, because I want to reuse them in other kernels.

Unfortunately, I get segmentation faults.

Here is a small code snippet:

#include <stdio.h>

__constant__ float testVar[1];

__global__ void test_kernel( float *testVar_d )
{
	testVar_d[ 0 ] = 7.0f;
}

__global__ void output_kernel( float *out_d )
{
	out_d[ 0 ] = testVar[0];
}

int main()
{
	float *testVar_d;
	cudaMalloc( (void**)&testVar_d, sizeof( float ) );

	float *out_d;
	cudaMalloc( (void**)&out_d, sizeof( float ) );

	float h = 1;
	cudaMemcpyToSymbol( testVar, &h, sizeof( float ) );

	dim3 dimGrid( 1 );
	dim3 dimBlock( 1 );

	test_kernel<<< dimGrid, dimBlock >>>( testVar_d );

	cudaMemcpyToSymbol( testVar, testVar_d, sizeof( float ), cudaMemcpyDeviceToDevice );

	output_kernel<<< dimGrid, dimBlock >>>( out_d );

	float result = 2;
	cudaMemcpy( &result, out_d, sizeof( float ), cudaMemcpyDeviceToHost );

	printf( "%f\n", result );

	return 0;
}

This program should print 7 if everything works correctly…

Any ideas how to copy from global memory to constant memory without going through host memory?

Thx ;o)

You cannot copy from device memory to constant memory directly.
You first need to copy the value from device memory to system memory, and then from system memory to the symbol (constant memory) :D
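
The host round trip described in this reply could look roughly like the following. This is only a minimal sketch, not code from the thread: `stagedValue`, `produce_kernel`, `read_kernel`, `copy_via_host` and `run_host_staged_copy` are hypothetical names made up for illustration, and most error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical constant-memory variable used only for this sketch.
__constant__ float stagedValue[1];

// Writes a value into global memory.
__global__ void produce_kernel(float *dst)
{
    dst[0] = 7.0f;
}

// Reads the constant-memory value and stores it in global memory.
__global__ void read_kernel(float *out)
{
    out[0] = stagedValue[0];
}

// Copies one float from global memory to constant memory via a host buffer.
cudaError_t copy_via_host(const float *src_d)
{
    float staging = 0.0f;
    cudaError_t err = cudaMemcpy(&staging, src_d, sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) return err;
    // Default arguments apply here: offset = 0, kind = cudaMemcpyHostToDevice.
    return cudaMemcpyToSymbol(stagedValue, &staging, sizeof(float));
}

// Runs the whole round trip and returns the value read back from the device.
float run_host_staged_copy()
{
    float *src_d = nullptr;
    float *out_d = nullptr;
    cudaMalloc(&src_d, sizeof(float));
    cudaMalloc(&out_d, sizeof(float));

    produce_kernel<<<1, 1>>>(src_d);
    float result = -1.0f;  // stays at -1 if the staged copy fails
    if (copy_via_host(src_d) == cudaSuccess) {
        read_kernel<<<1, 1>>>(out_d);
        cudaMemcpy(&result, out_d, sizeof(float), cudaMemcpyDeviceToHost);
    }

    cudaFree(src_d);
    cudaFree(out_d);
    return result;
}

int main()
{
    printf("%f\n", run_host_staged_copy());
    return 0;
}
```

The extra device-to-host transfer costs time on every update, so this approach mainly makes sense when the value changes rarely.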

I think you are right.

But the reference manual says…

I hope whoever writes the reference manual is reading this.

If a device-to-device copy is available but copying to constant device memory is not allowed, that should be noted.

Yes, I have the same problem.

I was trying to copy data from device (global) memory to constant memory.

But it never worked.

So I think that we can only copy data from host memory to constant memory.

I think the “reference manual” has a mistake here.

I got a hint!

Definition after includes…

__constant__ float constant_device_variable;

Host code…

float *device_pointer;

cudaMalloc( (void**)&device_pointer, device_pointer_size );

cudaMemcpyToSymbol( "constant_device_variable", device_pointer, sizeof( float ), 0, cudaMemcpyDeviceToDevice );

This version works.

The important part is to pass an explicit offset of 0. Perhaps cudaMemcpyDeviceToDevice is interpreted as the offset when the offset argument is left out of the function call.

edit: perhaps cudaMemcpyHostToDevice is equal to zero. If this assumption is true, the reference manual is ok but the sample code in the programming guide is misleading.
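
For reference, the runtime API declares `cudaMemcpyToSymbol` with the offset before the copy kind, and both have defaults (`offset = 0`, `kind = cudaMemcpyHostToDevice`). A four-argument call therefore passes the enum value as the offset and leaves the kind at its host-to-device default, which would explain the segmentation fault when the source is a device pointer. In the `cudaMemcpyKind` enum, `cudaMemcpyHostToDevice` has the value 1, not 0. The sketch below is the original test program with the offset spelled out. It passes the symbol itself rather than its name as a string, since string symbol names are no longer accepted from CUDA 5.0 onward. The `check` and `run_device_to_symbol` helpers are hypothetical additions for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__constant__ float testVar[1];

__global__ void test_kernel(float *testVar_d)
{
    testVar_d[0] = 7.0f;
}

__global__ void output_kernel(float *out_d)
{
    out_d[0] = testVar[0];
}

// Hypothetical helper: abort with a message if a runtime call failed.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Copies a kernel-computed value from global to constant memory
// without staging it on the host, then reads it back for inspection.
float run_device_to_symbol()
{
    float *testVar_d = nullptr;
    float *out_d = nullptr;
    check(cudaMalloc(&testVar_d, sizeof(float)), "cudaMalloc(testVar_d)");
    check(cudaMalloc(&out_d, sizeof(float)), "cudaMalloc(out_d)");

    test_kernel<<<1, 1>>>(testVar_d);
    check(cudaGetLastError(), "test_kernel launch");

    // Explicit offset 0, so cudaMemcpyDeviceToDevice lands in the kind slot.
    check(cudaMemcpyToSymbol(testVar, testVar_d, sizeof(float), 0,
                             cudaMemcpyDeviceToDevice),
          "cudaMemcpyToSymbol");

    output_kernel<<<1, 1>>>(out_d);
    check(cudaGetLastError(), "output_kernel launch");

    float result = 2.0f;
    check(cudaMemcpy(&result, out_d, sizeof(float), cudaMemcpyDeviceToHost),
          "cudaMemcpy");

    cudaFree(testVar_d);
    cudaFree(out_d);
    return result;
}

int main()
{
    printf("%f\n", run_device_to_symbol());
    return 0;
}
```

Checking the return codes is what would have exposed the problem in the first place: without it, a failed copy goes unnoticed until the wrong value (or a crash) shows up later.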