Getting address of cudaArray

For an application I’m writing I need to generate an image on the GPU, then use it as a texture in the next kernel. For maximum speed it is recommended to use CUDA arrays for this.

It seems that, given the current API, I need to allocate a cudaArray and then use cudaMemcpyToArray to copy into it, even though the data is already in GPU memory.
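For concreteness, here is a minimal host-side sketch of that workflow (image size, kernel, and launch shape are hypothetical placeholders): generate the image into linear device memory, then stage it into a cudaArray with a device-to-device cudaMemcpyToArray before binding it as a texture.

```cuda
#include <cuda_runtime.h>

#define W 512
#define H 512

// Placeholder kernel: writes some image into linear device memory.
__global__ void generate(float *out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h)
        out[y * w + x] = (float)(x + y);
}

int main(void)
{
    float *dLinear;
    cudaMalloc(&dLinear, W * H * sizeof(float));

    dim3 block(16, 16), grid(W / 16, H / 16);
    generate<<<grid, block>>>(dLinear, W, H);

    // The cudaArray lives in the opaque, texture-optimized layout.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *arr;
    cudaMallocArray(&arr, &desc, W, H);

    // Device-to-device copy into the array -- the extra hop discussed above.
    cudaMemcpyToArray(arr, 0, 0, dLinear,
                      W * H * sizeof(float), cudaMemcpyDeviceToDevice);

    // ...bind arr to a texture reference and launch the next kernel...

    cudaFreeArray(arr);
    cudaFree(dLinear);
    return 0;
}
```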

Why isn’t it possible to get the address of an array directly so that I can write to it in a kernel? This shouldn’t be a problem unless someone writes and reads the same texture within a single kernel, and it would save a redundant copy in my case.

Because cudaArrays store data in a special order that makes them fast for the 2D texture cache. The documentation doesn’t state what this order is, so you can’t address it yourself. And even if the layout were known, you wouldn’t be able to coalesce writes to it anyway.

cudaMemcpyToArray is fast (~70 GB/s), so is it really that much of a bottleneck for you?

Indeed, the copy is very fast, so it isn’t a bottleneck. The extra memory it uses might be a problem, but I don’t think so.

I was just wondering why, and this seems to be a good reason. Although I hope NVIDIA will make hardware details like the texture layout public in the future.

The 2D cache layout is probably some kind of trade secret. If I had to guess, I’d say they are using a Hilbert curve or some other space-filling-curve approach to store the data, since my research into multi-dimensional data locality turned these techniques up as being the best.
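For anyone curious what such a mapping looks like, here is the classic rotate-and-fold algorithm for converting an (x, y) coordinate to its distance along a Hilbert curve over an n-by-n grid (n a power of two). This is only an illustration of the general technique, not anything NVIDIA is confirmed to use; note the per-level loop, which is relevant to the hardware-cost discussion below.

```c
/* Rotate/flip a quadrant so the sub-curve has the right orientation. */
static void rot(int n, int *x, int *y, int rx, int ry)
{
    if (ry == 0) {
        if (rx == 1) {
            *x = n - 1 - *x;
            *y = n - 1 - *y;
        }
        int t = *x; *x = *y; *y = t;  /* swap x and y */
    }
}

/* Convert (x, y) to its index d along the Hilbert curve of an n-by-n grid. */
int xy2d(int n, int x, int y)
{
    int d = 0;
    for (int s = n / 2; s > 0; s /= 2) {  /* one step per level of recursion */
        int rx = (x & s) > 0;
        int ry = (y & s) > 0;
        d += s * s * ((3 * rx) ^ ry);
        rot(n, &x, &y, rx, ry);
    }
    return d;
}
```

Nearby (x, y) pairs end up with nearby indices d, which is exactly the locality a 2D cache wants.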

Well, some other graphics hardware (the R300, for example) uses tile-based texturing: each tile is 16x16 or 32x32, whatever is easy to load, and the tiles are laid out in normal left-to-right, top-to-bottom order. Hilbert and other space-filling curves are very nice and elegant, but less practical for a hardware implementation (as far as I know), since querying where a certain (x, y) is located is quite an expensive operation.
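To see why the tiled scheme is cheap, here is a sketch of addressing in such a layout (the 16x16 tile size and row-major ordering are assumptions matching the description above, not any vendor’s documented format). With power-of-two tiles, every divide and modulo below reduces to a shift or mask:

```c
#define TILE 16  /* assumed tile edge, a power of two */

/* Linear offset of texel (x, y) in a tiled layout: 16x16 tiles stored
 * left-to-right, top-to-bottom, texels inside each tile row-major. */
unsigned tiled_offset(unsigned x, unsigned y, unsigned width_in_texels)
{
    unsigned tiles_per_row = width_in_texels / TILE;
    unsigned tile_x = x / TILE, in_x = x % TILE;  /* shift and mask */
    unsigned tile_y = y / TILE, in_y = y % TILE;
    unsigned tile_index = tile_y * tiles_per_row + tile_x;
    return tile_index * TILE * TILE + in_y * TILE + in_x;
}
```

A constant handful of shifts, masks, and adds per lookup, versus one rotation step per mip-level-sized iteration for a Hilbert index.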

I don’t think secrecy is really the reason, though, as patents provide sufficient protection.