I am currently working on a volume raycaster in CUDA. At the moment it uses a single pass per ray, but now I want to extend this to a multi-pass raycaster. To do that, I want to write several values per ray to global device memory, so that in the next pass every ray (thread) can read those values back.
My volume data (158 MB) is copied into a 3D texture array. However, when I write to the allocated global linear device memory, values in the texture array get overwritten, resulting in a 3D visualization of the allocated memory :) Since that is not really what I want to visualize, I was wondering whether anyone else has run into a similar problem, or has tips or suggestions on writing to (temporary) device memory from a CUDA kernel and then, if possible, reusing that data later in the same kernel. Or is there a better way to share data not only between threads but also between blocks?
I'm using an 8800 GT with 1 GB of memory and CUDA 2.0 on Windows XP Professional 32-bit.
The only way to share data between blocks without race conditions is to let the kernel run to completion and then launch another kernel. If you need to update values stored in a 3D texture cudaArray, you must double buffer:
1. allocate d_data in linear device memory
2. allocate d_array as a cudaArray
3. initialize both
4. launch kernel pass 1, which reads d_array (through the texture) and writes to d_data
5. copy d_data back into d_array (for a 3D cudaArray this takes cudaMemcpy3D; cudaMemcpyToArray only handles 1D/2D arrays, and in any case needs offset, size, and direction arguments)
6. launch kernel pass 2, which again reads d_array and writes d_data
7. ...
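A rough sketch of those steps using the CUDA 2.x texture-reference API (hedged: the kernel body, volume dimensions, and the float channel format are placeholders, not taken from the original post):

```cuda
#include <cuda_runtime.h>

// Texture reference bound to the cudaArray; read-only inside kernels.
texture<float, 3, cudaReadModeElementType> volTex;

// Placeholder pass kernel: reads the volume through the texture and
// writes per-ray results into the linear buffer d_data.
__global__ void pass1(float *d_data, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    // ... march the ray, sampling with tex3D(volTex, fx, fy, fz) ...
    d_data[y * w + x] = 0.0f; // store this ray's intermediate value
}

int main(void)
{
    const int W = 256, H = 256, D = 256;   // placeholder volume size
    cudaExtent extent = make_cudaExtent(W, H, D);
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();

    cudaArray *d_array;                    // read side (texture)
    cudaMalloc3DArray(&d_array, &desc, extent);

    float *d_data;                         // write side (linear memory)
    cudaMalloc((void **)&d_data, W * H * D * sizeof(float));

    // ... upload the volume into d_array with cudaMemcpy3D ...
    cudaBindTextureToArray(volTex, d_array, desc);

    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    pass1<<<grid, block>>>(d_data, W, H);

    // Double buffer: copy the linear results back into the 3D array
    // so the next pass can read them through the texture.
    cudaMemcpy3DParms p = {0};
    p.srcPtr   = make_cudaPitchedPtr(d_data, W * sizeof(float), W, H);
    p.dstArray = d_array;
    p.extent   = extent;
    p.kind     = cudaMemcpyDeviceToDevice;
    cudaMemcpy3D(&p);

    pass1<<<grid, block>>>(d_data, W, H);  // pass 2 would go here

    cudaUnbindTexture(volTex);
    cudaFreeArray(d_array);
    cudaFree(d_data);
    return 0;
}
```

Keeping the read side (the texture-backed cudaArray) and the write side (plain linear memory) as two separate allocations is what prevents a kernel from clobbering the data it is sampling; the explicit cudaMemcpy3D between passes is the hand-off point.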