Writing to an array through kernels


What are my options if I want to write to an array allocated by cudaMallocArray from my kernels?

For example, doing this:
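cudaArray *array;  // cudaMallocArray expects the address of a cudaArray pointer
cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();  // float is an assumed element type
cudaMallocArray(&array, &desc, 100, 100);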

As far as I can tell, the only ways to use it are memcpys such as cudaMemcpyToArray() (read/write) or binding it to a texture (read-only), right?

It seems the best approach is to operate on an extra buffer created by cudaMalloc2D() and copy the data into the array afterwards. Are there other possibilities?

I’m currently doing some image processing and I want to take advantage of texfetch features such as caching, clamping and interpolation. So my filter functions should output arrays that can be used as input to subsequent filter functions.

Is it worth doing an extra cudaMemcpyToArray rather than giving up texfetch, bearing in mind that without it I would have to do interpolation and clamping manually?

Thanks in advance!

Correct, there is currently no way for kernels to write directly to arrays. The only way is to have the kernel write to a regular buffer and then memcpy that buffer to the array.

Device-to-device memcpys are not optimal in the beta, but will be optimized in the next release.
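
For illustration, here is a minimal sketch of the "write to a regular buffer, then memcpy to the array" pattern. The kernel scaleKernel, the helper filterIntoArray and the float element type are hypothetical stand-ins, not anything from the original post. The sketch also uses cudaMemcpy2DToArray, the 2D sibling of the cudaMemcpyToArray call mentioned above, with a tightly packed linear buffer, so treat it as one possible arrangement rather than the recommended one.

```cuda
#include <cuda_runtime.h>

// Hypothetical filter: scales every pixel by a gain. Stands in for a real image filter.
__global__ void scaleKernel(const float *in, float *out, int width, int height, float gain)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = gain * in[y * width + x];
}

// Runs the filter into a temporary linear buffer, then copies that buffer
// device-to-device into dstArray (assumed to hold width x height floats).
cudaError_t filterIntoArray(cudaArray *dstArray, const float *d_in,
                            int width, int height, float gain)
{
    float *d_out = NULL;
    size_t rowBytes = width * sizeof(float);

    cudaError_t err = cudaMalloc((void **)&d_out, rowBytes * height);
    if (err != cudaSuccess)
        return err;

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    scaleKernel<<<grid, block>>>(d_in, d_out, width, height, gain);
    err = cudaGetLastError();  // catches launch configuration errors

    if (err == cudaSuccess) {
        // The kernel cannot write to the array itself, so copy its output there.
        err = cudaMemcpy2DToArray(dstArray, 0, 0, d_out, rowBytes,
                                  rowBytes, height, cudaMemcpyDeviceToDevice);
    }

    cudaFree(d_out);
    return err;
}
```

After a successful call, the array holds the filter output and could be bound to a texture as the input of the next filter pass. Allocating the temporary buffer once and reusing it across passes would avoid the per-call cudaMalloc.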