The secret of cudaArray

Hi everybody. :D
We usually use cudaArray in 2D and 3D programming, especially when we want to use it with textures.
I have read the CUDA programming manual many times and searched the header files and the internet for the structure of cudaArray,
but nothing has made the cudaArray structure clear to me. :argh:
Is the cudaArray structure a secret of NVIDIA? :(
When we use cudaMallocArray to allocate a matrix, I think the matrix is allocated in global memory. Is that true?

I tried to work out the structure of cudaArray by pointer casting, but I did not get anything clear. :(
If anybody knows the structure, please show me. :D
Thank you very much. :D
Sorry about my English.

I am wondering whether cudaArray is an array of pointers.
We cannot access a cudaArray directly; the only ways to access it are to bind it to a texture or to copy it to other memory.
I think that data copied from other memory into a cudaArray is placed in global memory, and the cudaArray holds those addresses.
Following this idea, I tried casting the cudaArray to a pointer-to-pointer, but the data that came back did not tell me anything. :no:
So I think I should try another method. :)
Can anybody help me clear up this secret?

This is my code:

h_data = new float[dataSize];
for (int i = 0; i < dataSize; i++)
    h_data[i] = 9.0f;

cudaArray* cu_array;
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
CUDA_SAFE_CALL( cudaMallocArray( &cu_array, &channelDesc, width, height ));
CUDA_SAFE_CALL( cudaMemcpyToArray( cu_array, 0, 0, h_data, size, cudaMemcpyHostToDevice));

// Note: as posted, ptrCuda points at a fresh host allocation and is never
// assigned from cu_array, so everything printed below is host-side data.
unsigned int** ptrCuda;
ptrCuda = new unsigned int*[height];

printf("\n Get cu_array address by pointer ptrCuda \n");
printf("%p ", (void*)ptrCuda);

// Only height entries were allocated, so the loops stop at height.
printf("\n Get cu_array's member address  by pointer ptrCuda\n");
for (int i = 0; i < height; i++)
    printf("%p , ", (void*)(ptrCuda + i));

printf("\n Get cu_array's member data  by pointer ptrCuda \n");
for (int i = 0; i < height; i++)
    printf("%p , ", (void*)*(ptrCuda + i));


A 2D cudaArray is stored in global memory along a Z-order curve, to make spatially local access efficient.

Please see this link.

Basically, if your 2D array looks like this:

1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27 … and so on…

They might be stored in a different pattern in global memory to facilitate spatial access…

probably like:
1 2 10 11 19 20 3 4 12 13 21 22 and so on…
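
To make the Z-order idea concrete, here is a minimal sketch of the textbook Z-order (Morton) index, which interleaves the bits of the x and y coordinates. This is only an illustration under assumptions: the real cudaArray layout is not documented and may well differ, and the names spread_bits and morton_index are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Spread the low 16 bits of v so that bit i moves to bit 2*i,
// leaving a zero bit between every pair of original bits.
static uint32_t spread_bits(uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Textbook Morton index (hypothetical helper, not a CUDA API):
// bits of x go to the even positions, bits of y to the odd positions.
static uint32_t morton_index(uint32_t x, uint32_t y) {
    assert(x <= 0xFFFFu && y <= 0xFFFFu);  // 16-bit coordinates only
    return spread_bits(x) | (spread_bits(y) << 1);
}
```

For a 4x4 array, morton_index(x, y) gives 0 1 4 5 along row y = 0 and 2 3 6 7 along row y = 1, so each 2x2 block of neighbours lands in four consecutive slots.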

Maybe someone more authoritative will be able to answer you. That said, if you follow the URL above, the same question has been answered by Simon Green from NVIDIA, so at least the Z-order pattern should be authentic…

The best thing to do is probably to allocate and populate a cudaArray, somehow get the pointer to its global memory, and read the data back from global memory directly (instead of through a texture) to see how the elements are arranged… I am not sure whether this is possible… If someone has the bandwidth to do this, kindly post your results here. Thanks.

Yes, textures (cudaArrays) are arranged in a special layout in memory to optimize hardware performance.

The reason that we don’t expose this internal format is that it may change in future hardware (it has in the past).

P.S. Go easy on the smileys!

Thank you, Sarnath and Simon Green. :D
I think your suggestions are very useful for me. :)
By the way, I have read a lot of your posts and replies in this forum, and your ideas have helped me understand more about CUDA. :yes:

I was too stupid. :thumbsdown:
I was trying to read GPU memory data from host code.
But what I really want is to observe data in GPU memory directly, sometimes for debugging.
Is there any method to observe data in GPU memory directly, without using cudaMemcpy or functions like that?
I would be very happy if anyone knows such a method.
If we cannot observe data in GPU memory directly, I think I should give up this stupid idea. :">

I don't quite understand you…

Very few GPUs support running a cudaMemcpy while a kernel is executing… So you can most likely look at the results only after the kernel has finished.

There’s no way for the host to read GPU-allocated data without copying it to the host. You could try debugging in emulation mode: then the data really is allocated on the host side and you can read it directly, but you won’t see hardware-specific behaviours.