You’re correct; unfortunately, it’s currently not possible to write directly to 2D textures in CUDA. Device-to-device memcpys are very fast, however (70 GB/s).
If you don’t need 2D access, it is possible to bind 1D textures to global memory directly and access them using tex1Dfetch(). The particles sample in the SDK does this.
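For anyone who wants to see the shape of that, here is a minimal sketch of reading global memory through a 1D texture with tex1Dfetch(). It is not the code from the particles sample. It assumes a newer toolkit and uses the texture object API (cudaCreateTextureObject) instead of the texture reference binding that was current when this was posted. The names scaleFromTexture, makeLinearTexture and runScale are made up for illustration, and error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <cstring>
#include <vector>

// Reads each input element through the texture path and writes twice its
// value to ordinary global memory.
__global__ void scaleFromTexture(cudaTextureObject_t tex, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = 2.0f * tex1Dfetch<float>(tex, i);
    }
}

// Hypothetical helper: wraps an existing global-memory buffer in a 1D
// texture object. No copy is made; the texture reads the buffer itself.
cudaTextureObject_t makeLinearTexture(float *devBuf, int n)
{
    cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypeLinear;
    resDesc.res.linear.devPtr = devBuf;
    resDesc.res.linear.desc = cudaCreateChannelDesc<float>();
    resDesc.res.linear.sizeInBytes = n * sizeof(float);

    cudaTextureDesc texDesc;
    memset(&texDesc, 0, sizeof(texDesc));
    texDesc.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);
    return tex;
}

// Hypothetical driver: uploads the input, runs the kernel, returns the result.
std::vector<float> runScale(const std::vector<float> &in)
{
    int n = static_cast<int>(in.size());
    size_t bytes = n * sizeof(float);

    float *dIn = nullptr;
    float *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);

    cudaTextureObject_t tex = makeLinearTexture(dIn, n);
    scaleFromTexture<<<(n + 255) / 256, 256>>>(tex, dOut, n);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

Calling runScale on a small host vector should return each element doubled. A kernel can write to the underlying buffer in one launch and read it through the texture in a later launch, which is what makes this useful when 2D addressing is not required.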
Thank you, Simon and DenisR.
Let me summarize this article.
Question: How do I write the result to a 2D texture?
Answer: We cannot write the result directly to a 2D texture.
Note: The cudaMemcpyToArray() function is slow for 2D access.
I would say that to get 2D texture access you need to use cudaMemcpyToArray(), which can be quite slow compared to a normal device-to-device copy (3 vs. 70 GB/s). I believe the thread I pointed to now has some explanation for this difference in speed.
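A rough sketch of that workaround, in case it helps: the kernel writes its result to ordinary linear memory, a device-to-device copy moves it into a cudaArray, and a second kernel samples the array as a 2D texture. This assumes a newer toolkit, so it uses cudaMemcpy2DToArray() in place of cudaMemcpyToArray() and a texture object in place of a texture reference. The names fillPattern, readBack and roundTrip are made up for illustration, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cstring>
#include <vector>

// Stage 1: the kernel writes its result to ordinary (linear) global memory.
__global__ void fillPattern(float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        out[y * width + x] = static_cast<float>(y * width + x);
    }
}

// Stage 3: a later kernel samples the cudaArray through a 2D texture.
__global__ void readBack(cudaTextureObject_t tex, float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // Unnormalized coordinates; +0.5f addresses the texel center.
        out[y * width + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
    }
}

// Hypothetical driver: write, copy into the array, read back through the texture.
std::vector<float> roundTrip(int width, int height)
{
    size_t count = static_cast<size_t>(width) * height;
    size_t rowBytes = width * sizeof(float);

    float *dLinear = nullptr;
    float *dOut = nullptr;
    cudaMalloc(&dLinear, count * sizeof(float));
    cudaMalloc(&dOut, count * sizeof(float));

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr = nullptr;
    cudaMallocArray(&arr, &desc, width, height);

    dim3 block(16, 16);
    dim3 grid((width + 15) / 16, (height + 15) / 16);
    fillPattern<<<grid, block>>>(dLinear, width, height);

    // Stage 2: device-to-device copy from linear memory into the array.
    cudaMemcpy2DToArray(arr, 0, 0, dLinear, rowBytes, rowBytes, height,
                        cudaMemcpyDeviceToDevice);

    cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = arr;

    cudaTextureDesc texDesc;
    memset(&texDesc, 0, sizeof(texDesc));
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModePoint;   // no interpolation
    texDesc.readMode = cudaReadModeElementType;
    texDesc.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);
    readBack<<<grid, block>>>(tex, dOut, width, height);

    std::vector<float> out(count);
    cudaMemcpy(out.data(), dOut, count * sizeof(float), cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(dLinear);
    cudaFree(dOut);
    return out;
}
```

The copy in stage 2 is the step whose cost is discussed above, so if you want to measure it on your own hardware, that is the call to time.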