cudaArray, used size and layout

Is there any documentation on the amount of memory occupied by allocating a cudaArray, as a function of type/width/height/depth? If not, does there exist a function that returns the result (even without explaining how it got there)? Does the result depend on the CUDA driver version or on the GPU?

I often need to allocate large cudaArrays and would like to know in advance whether they will fit in the free memory. I have been experimenting for a while: for a float cudaArray the allocated size seems to be rounded up to at least the next multiple of 2^19 elements, but sometimes it's more. I can also see that width/height/depth are not interchangeable.
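For reference, the way I've been measuring this is simply to diff the free device memory before and after the allocation. `cudaMemGetInfo`, `cudaMalloc3DArray`, and the channel-descriptor helpers are standard runtime API; the dimensions here are just an example, and the reported footprint presumably still depends on driver and GPU:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_before = 0, free_after = 0, total = 0;
    cudaMemGetInfo(&free_before, &total);

    // Allocate a 3D float cudaArray; width is given in elements.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent extent = make_cudaExtent(500, 500, 500);
    cudaArray_t arr = nullptr;
    if (cudaMalloc3DArray(&arr, &desc, extent, 0) != cudaSuccess) {
        std::fprintf(stderr, "allocation failed\n");
        return 1;
    }

    cudaMemGetInfo(&free_after, &total);
    std::printf("cudaArray footprint: %zu bytes (raw payload: %zu bytes)\n",
                free_before - free_after,
                500ull * 500ull * 500ull * sizeof(float));

    cudaFreeArray(arr);
    return 0;
}
```

One caveat: `cudaMemGetInfo` reflects all allocator activity, so this only gives a clean number on an otherwise idle context.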

Out of curiosity, is there any official documentation of the internal layout of a cudaArray (obviously I'm not going to index into it directly)? There are people speculating about the use of a space-filling curve, but I don't see any official statements about that.

cheers, Lukas Wirz

I’m not aware of documentation on these subjects. cudaArray is generally billed as “opaque”.

Not that I know of. I think you could easily enough establish a lower bound on the size; I don't know if that is useful.

Not that I know of.

You can request changes to the CUDA documentation by filing a bug.

The lower bound is quite easy: the allocation of a 128x128x128 float cudaArray has zero overhead over the raw payload. An upper bound is much harder, and for very small arrays the relative overhead is huge (the smallest allocated size for a float cudaArray is 2 MB), but that is not a problem for us.
For random large-ish arrays with sizes of 500-1000 MB we typically see overheads of 3-10%. We could of course estimate a strict upper bound, but that would often be far too large and make us underutilise the available memory.