Writing kernel output to a texture: copying global memory to a texture-bound array

Fellow CUDA enthusiasts,

I read the following in the CUDA 2.1 FAQ:

Q: Is it possible to write the results from a kernel directly to a texture (for multi-pass algorithms)?
A: Not currently, but you can copy from global memory back to the array (texture). Device-to-device memory copies are fast.

I am currently running CUDA 2.0 due to some device-level limitations. I need to process large images through several kernels. The input image is an array bound to a texture. After the first kernel runs, I was hoping to write the modified image (stored in global memory) into another texture, which would give faster access since it is the input to the next kernel. I have not found a way to do this: when I bind the modified image array in global memory to a texture, I get a runtime error. I am pretty sure I am doing something clumsy. Could anyone recommend a way to write the kernel output back to a texture? Any references would be a great help too. I hope I have made my problem clear enough.

Thanks for your help!


How did you allocate your global memory?
With cudaMalloc() or with cudaMallocArray()?

For the former you have to use cudaBindTexture(); for the latter you should use cudaBindTextureToArray().
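To illustrate the first case, here is a minimal sketch of binding cudaMalloc() memory to a 1D texture and reading it with tex1Dfetch(). It assumes the legacy texture reference API of the CUDA 2.x era (removed in CUDA 12); the kernel copyThroughTexture and the names texLinear, d_src and d_dst are made up for the example. Checking the return value of the bind call may help pin down where a runtime error comes from.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Legacy 1D texture reference for linear memory (CUDA 2.x style API).
texture<float, 1, cudaReadModeElementType> texLinear;

// Hypothetical kernel: reads through the texture, writes to global memory.
__global__ void copyThroughTexture(float *dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = tex1Dfetch(texLinear, i);
}

int main()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    static float h_data[n];
    for (int i = 0; i < n; ++i)
        h_data[i] = (float)i;

    float *d_src = 0, *d_dst = 0;
    cudaMalloc((void **)&d_src, bytes);
    cudaMalloc((void **)&d_dst, bytes);
    cudaMemcpy(d_src, h_data, bytes, cudaMemcpyHostToDevice);

    // Memory from cudaMalloc() is bound with cudaBindTexture(),
    // not with cudaBindTextureToArray().
    cudaError_t err = cudaBindTexture(0, texLinear, d_src, bytes);
    if (err != cudaSuccess) {
        printf("bind failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    copyThroughTexture<<<(n + 255) / 256, 256>>>(d_dst, n);
    cudaMemcpy(h_data, d_dst, bytes, cudaMemcpyDeviceToHost);
    printf("h_data[10] = %f\n", h_data[10]);

    cudaUnbindTexture(texLinear);
    cudaFree(d_src);
    cudaFree(d_dst);
    return 0;
}
```

Note that a texture bound this way is addressed with integer indices only, so there is no 2D addressing or filtering; for those, the array route is the one to use.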

You need to copy from global memory back to the array using cudaMemcpyToArray, e.g.:

cudaMemcpyToArray( d_array, 0, 0, d_srcImage, width * height * sizeof(float), cudaMemcpyDeviceToDevice);
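Put together, a two-pass loop around that call might look like the sketch below. This is only an illustration under some assumptions: it uses the legacy texture reference API of CUDA 2.x (removed in CUDA 12), a single-channel float image, and a made-up kernel called scaleKernel; texImage, d_array and d_out are placeholder names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Legacy 2D texture reference (CUDA 2.x style API).
texture<float, 2, cudaReadModeElementType> texImage;

// Hypothetical pass: reads the image through the texture and writes
// the scaled result to global memory.
__global__ void scaleKernel(float *dst, int width, int height, float factor)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        dst[y * width + x] = factor * tex2D(texImage, x + 0.5f, y + 0.5f);
}

int main()
{
    const int width = 64, height = 64;
    const size_t bytes = width * height * sizeof(float);
    static float h_img[width * height];
    for (int i = 0; i < width * height; ++i)
        h_img[i] = 1.0f;

    // The texture is backed by a CUDA array, allocated with cudaMallocArray().
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *d_array = 0;
    cudaMallocArray(&d_array, &desc, width, height);
    cudaMemcpyToArray(d_array, 0, 0, h_img, bytes, cudaMemcpyHostToDevice);
    cudaBindTextureToArray(texImage, d_array, desc);

    // Kernels cannot write to the array, so they write to global memory.
    float *d_out = 0;
    cudaMalloc((void **)&d_out, bytes);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);

    // Pass 1: texture -> global memory.
    scaleKernel<<<grid, block>>>(d_out, width, height, 2.0f);

    // Copy the result back into the array so the next pass sees it
    // through the same texture.
    cudaMemcpyToArray(d_array, 0, 0, d_out, bytes, cudaMemcpyDeviceToDevice);

    // Pass 2: reads the output of pass 1 via the texture.
    scaleKernel<<<grid, block>>>(d_out, width, height, 2.0f);

    cudaMemcpy(h_img, d_out, bytes, cudaMemcpyDeviceToHost);
    printf("h_img[0] = %f\n", h_img[0]);

    cudaUnbindTexture(texImage);
    cudaFreeArray(d_array);
    cudaFree(d_out);
    return 0;
}
```

The texture stays bound to the same array across both passes; only the array contents change between kernel launches.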

This applies only if the original poster wants to use cudaBindTextureToArray().

A better idea would be to allocate pitched 2D memory with cudaMallocPitch() and use it for the intermediate output.

Then copy it into the array with cudaMemcpy2DToArray(…, cudaMemcpyDeviceToDevice) and bind the array with cudaBindTextureToArray().
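A rough sketch of that pitched variant follows, again assuming the legacy texture reference API of CUDA 2.x and a single-channel float image. The kernel invertKernel and the names texImage, d_array and d_out are invented for the example; the point is only how the pitch is used in the kernel and passed to cudaMemcpy2DToArray().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Legacy 2D texture reference (CUDA 2.x style API).
texture<float, 2, cudaReadModeElementType> texImage;

// Hypothetical pass: reads via the texture, writes to pitched global memory.
__global__ void invertKernel(float *dst, size_t pitch, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // Rows are pitch bytes apart, so step through them with a char pointer.
        float *row = (float *)((char *)dst + y * pitch);
        row[x] = 1.0f - tex2D(texImage, x + 0.5f, y + 0.5f);
    }
}

int main()
{
    const int width = 100, height = 60;
    const size_t rowBytes = width * sizeof(float);
    static float h_img[width * height];
    for (int i = 0; i < width * height; ++i)
        h_img[i] = 0.25f;

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *d_array = 0;
    cudaMallocArray(&d_array, &desc, width, height);
    cudaMemcpy2DToArray(d_array, 0, 0, h_img, rowBytes, rowBytes, height,
                        cudaMemcpyHostToDevice);
    cudaBindTextureToArray(texImage, d_array, desc);

    // Pitched intermediate buffer for the kernel output.
    float *d_out = 0;
    size_t pitch = 0;
    cudaMallocPitch((void **)&d_out, &pitch, rowBytes, height);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    invertKernel<<<grid, block>>>(d_out, pitch, width, height);

    // Device-to-device copy from pitched memory into the texture's array;
    // the source pitch tells the copy how far apart the rows are.
    cudaMemcpy2DToArray(d_array, 0, 0, d_out, pitch, rowBytes, height,
                        cudaMemcpyDeviceToDevice);

    // A second pass would now see the updated image through texImage.
    invertKernel<<<grid, block>>>(d_out, pitch, width, height);

    cudaMemcpy2D(h_img, rowBytes, d_out, pitch, rowBytes, height,
                 cudaMemcpyDeviceToHost);
    printf("h_img[0] = %f\n", h_img[0]);

    cudaUnbindTexture(texImage);
    cudaFreeArray(d_array);
    cudaFree(d_out);
    return 0;
}
```

Whether the pitched buffer is actually faster than a plain cudaMalloc() buffer depends on the image width and the access pattern, so it is worth timing both.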

Is that not right? Correct me if I am wrong.