I tried to copy some kernel-computed values to constant memory, because I want to reuse them in other kernels.
Unfortunately, I get segmentation faults.
Here is a small code snippet:
__constant__ float testVar;

__global__ void test_kernel( float *testVar_d )
{
    testVar_d[ 0 ] = 7.0f;
}

__global__ void output_kernel( float *out_d )
{
    out_d[ 0 ] = testVar;
}

float *testVar_d, *out_d;
cudaMalloc( (void**)&testVar_d, sizeof( float ) );
cudaMalloc( (void**)&out_d, sizeof( float ) );
float h = 1;
cudaMemcpyToSymbol( testVar, &h, sizeof( float ) );
dim3 dimGrid( 1 );
dim3 dimBlock( 1 );
test_kernel<<< dimGrid, dimBlock >>>( testVar_d );
// the call in question: meant as a device-to-device copy into the constant symbol
cudaMemcpyToSymbol( testVar, testVar_d, sizeof( float ), cudaMemcpyDeviceToDevice );
output_kernel<<< dimGrid, dimBlock >>>( out_d );
float result = 2;
cudaMemcpy( &result, out_d, sizeof( float ), cudaMemcpyDeviceToHost );
printf( "%f\n", result );
This program should print 7 if everything works…
Any ideas how to copy from global memory to constant memory without making a copy in host memory?
You cannot copy from device memory to constant memory directly.
You first need to copy the value from device memory to system memory, and then from system memory to the symbol (constant memory) :D
I think you are right.
But the reference manual says…
I hope somebody who writes this reference manual is reading this.
If the API offers a device-to-device copy but copying to constant device memory is not allowed, the manual should say so.
Yes, I got the same problem.
I was trying to copy data from device (global) memory to constant memory.
But it never worked.
So I think that we can only copy data from host memory to constant memory.
I think the “reference manual” has a mistake here.
I got a hint!
Definition after includes…
__constant__ float constant_device_variable;
cudaMalloc( (void**)&device_pointer, device_pointer_size );
cudaMemcpyToSymbol( "constant_device_variable", device_pointer, sizeof( float ), 0, cudaMemcpyDeviceToDevice );
This version works.
The important part is to pass an explicit offset of 0. Perhaps cudaMemcpyDeviceToDevice is taken as the offset when the call has only four arguments.
edit: perhaps cudaMemcpyHostToDevice is equal to zero. If this assumption is true, the reference manual is ok but the sample code in the programming guide is misleading.
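To make the argument-binding idea concrete: as far as I know, the runtime declares the last two parameters of cudaMemcpyToSymbol with defaults, offset = 0 followed by kind = cudaMemcpyHostToDevice, so a fourth argument always lands in offset. The plain C++ sketch below only mocks that signature and never touches the GPU. MemcpyKind, BoundArgs, mockMemcpyToSymbol and the two helper functions are hypothetical names made up for this demonstration, and the enum values are meant to mirror cudaMemcpyKind from the CUDA headers.

```cpp
#include <cstddef>

// Hypothetical stand-in for cudaMemcpyKind (values assumed to mirror the CUDA headers).
enum MemcpyKind { HostToHost = 0, HostToDevice = 1, DeviceToHost = 2, DeviceToDevice = 3 };

// Records which values the two trailing parameters ended up with.
struct BoundArgs {
    std::size_t offset;
    MemcpyKind  kind;
};

// Mock with the same trailing defaults as cudaMemcpyToSymbol: offset first, then kind.
// It copies nothing; it only reports how the arguments were bound.
BoundArgs mockMemcpyToSymbol( const void *symbol, const void *src, std::size_t count,
                              std::size_t offset = 0, MemcpyKind kind = HostToDevice )
{
    (void)symbol; (void)src; (void)count;
    return BoundArgs{ offset, kind };
}

// Four arguments: the enum converts silently to size_t and becomes the offset.
BoundArgs fourArgumentCall()
{
    float sym = 0.0f, src = 0.0f;
    return mockMemcpyToSymbol( &sym, &src, sizeof( float ), DeviceToDevice );
}

// Five arguments: explicit offset 0, so the enum reaches the kind parameter.
BoundArgs fiveArgumentCall()
{
    float sym = 0.0f, src = 0.0f;
    return mockMemcpyToSymbol( &sym, &src, sizeof( float ), 0, DeviceToDevice );
}
```

With this mock, fourArgumentCall() reports offset 3 with kind HostToDevice, while fiveArgumentCall() reports offset 0 with kind DeviceToDevice. If the real function binds its arguments the same way, the four-argument call would treat the device pointer as a host pointer, which would fit the segmentation fault.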