My CUDA program has large arrays containing static data. This data should be available to all functions in the kernel. I tried to use global memory for it, but so far I have failed to move it to the GPU. I always get the CUDA error “invalid argument” after calling cudaMemcpyToSymbol(…). Is it correct to use cudaMemcpyToSymbol here? What am I doing wrong?
OK, I didn’t bother to read the rest of your code. The cudaMalloc() call you are doing is wrong, and that is the source of the problem: cudaMalloc() onto a host pointer, copy your data to the device address it holds, then copy that address onto your device symbol.
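A minimal sketch of that three-step pattern (all names here are illustrative, not taken from your code): allocate with cudaMalloc() through a host pointer, copy the data to the device address it holds, then copy the pointer value itself onto a __device__ symbol so every kernel can reach the array.

```cuda
#include <cstdio>

__device__ int *d_data;            // device-side symbol that will hold the array's address

__global__ void doubleAll(int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d_data[i] *= 2;            // any kernel code can reach the array via d_data
}

int main()
{
    const int n = 256;
    int h_src[n];
    for (int i = 0; i < n; i++) h_src[i] = i;

    int *h_ptr = NULL;                                    // host copy of the device address
    cudaMalloc((void **)&h_ptr, n * sizeof(int));         // 1. allocate on the device
    cudaMemcpy(h_ptr, h_src, n * sizeof(int),
               cudaMemcpyHostToDevice);                   // 2. copy data to that address
    cudaMemcpyToSymbol(d_data, &h_ptr, sizeof(h_ptr));    // 3. copy the address onto the symbol

    doubleAll<<<(n + 255) / 256, 256>>>(n);
    cudaDeviceSynchronize();
    cudaFree(h_ptr);
    return 0;
}
```

Note that step 3 copies the pointer value (sizeof(h_ptr) bytes), not the array contents; calling cudaMemcpyToSymbol with the array size here is one common way to get “invalid argument”.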
The device symbol must point to the address where the memory is allocated by cudaMalloc(…), but in C it’s not possible to modify the address of a pointer once declared… confused
cutilCheckMsg("Kernel execution failed");
cudaMemcpy( b, b_d, sizeof(B), cudaMemcpyDeviceToHost );
int success = 1;
for (int i = 0; i < ELEMS(a); i++) {
    printf("%d %d %d\n", i, a[i], b[i]);
    if (a[i] != b[i]) success = 0;
}
puts(success ? "Passed" : "Failed");
}
I’ve used constant memory here. If your arrays do not fit in constant memory, you can put them in global memory using the normal cudaMalloc() and cudaMemcpy() calls, as in the SDK examples. I think constant memory is faster, at least when all threads read the same elements.
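For comparison, a short sketch of the constant-memory variant (the array name and size are assumptions, not from the original post). The key difference from the global-memory pattern is that no cudaMalloc() is needed: the __constant__ symbol itself is the destination, so cudaMemcpyToSymbol copies the data directly.

```cuda
#define N 256
__constant__ int c_table[N];      // lives in constant memory, visible to all kernels

__global__ void useTable(int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        out[i] = c_table[i];      // cached, broadcast read when threads access the same index
}

// host side: copy the data onto the symbol before launching
// int h_table[N] = { /* ... static data ... */ };
// cudaMemcpyToSymbol(c_table, h_table, sizeof(h_table));
```

Constant memory on current hardware is limited to 64 KB total, which is why the fallback to global memory matters for large arrays.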
The array of error strings is a bit unnecessary; I should have used cudaGetErrorString(), as you did.