2D Array

Hi All,

I’m a little confused about how 2D arrays work in CUDA.

Based on the CUDA manual, we can allocate 2D arrays using cudaMallocPitch() and copy 2D arrays to CUDA arrays using cudaMemcpy2DToArray().
Are these so-called 2D arrays really 2D?? I don’t see pointers to pointers anywhere in the manual …

Are we representing 2D array as 1D? If so, why do we need special copy functions for 2D???


In CUDA, 2D arrays are actually mapped to 1D arrays. If you have a 2D array on the host and want to copy it to the GPU, using cudaMallocPitch and cudaMemcpy2DToArray makes the copy faster. If you use the 1D memcpy function to copy a 2D array to the GPU (when its rows are not contiguous in host memory), each row has to be copied one by one in a loop, which wastes time. I find cudaMallocPitch a little confusing too, so I always map a 2D array into 1D on the CPU first before copying it to the GPU.
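A minimal sketch of the flatten-first approach described above, assuming the host data starts out as an array of row pointers (`float**`); `flattenRows` and `uploadFlat` are hypothetical helper names, not part of any CUDA API. Once the data is contiguous and row-major, a single `cudaMemcpy` moves the whole matrix.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Flatten a host "array of row pointers" (nRows x nCols) into one contiguous
// row-major buffer: element (r, c) ends up at index r * nCols + c.
std::vector<float> flattenRows(float* const* rows, int nRows, int nCols)
{
    assert(nRows >= 0 && nCols >= 0);
    std::vector<float> flat(static_cast<size_t>(nRows) * nCols);
    for (int r = 0; r < nRows; ++r)
        for (int c = 0; c < nCols; ++c)
            flat[static_cast<size_t>(r) * nCols + c] = rows[r][c];
    return flat;
}

// One cudaMemcpy moves the whole flattened matrix to the device.
// Returns the device pointer (free it with cudaFree) or nullptr on error.
float* uploadFlat(const std::vector<float>& flat)
{
    float* d = nullptr;
    const size_t bytes = flat.size() * sizeof(float);
    if (cudaMalloc(&d, bytes) != cudaSuccess)
        return nullptr;
    if (cudaMemcpy(d, flat.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d);
        return nullptr;
    }
    return d;
}
```

On the device, element (r, c) is then read as `d[r * nCols + c]`.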

In the case of texturing, is the 2D locality of the texture cache still exploited correctly if 2D arrays are mapped to 1D arrays?

It’s generally thought that 2D textures are stored in GPU memory using a space-filling curve (see “Hilbert Curve” for an example of such a beast). This means that when the texture cache fetches a block of contiguous data from global memory, the texture elements are likely to be close together in 2D.

If you map a 2D array to 1D, then you will have good performance if you read the array in the same row-major or column-major order you used when you flattened it to 1D. If you are careful, you can also read a rectangular window of special sizes with reasonable performance (which is what the matrix transpose kernel has to do). More complex access patterns will perform poorly though. You have to decide if your problem warrants the extra complexity of texture references.

(The special 2D array mapping might also explain why you can’t read or write such arrays from the device without using a texture reference.)

It’s more like a Z-curve, actually (or used to be).

Ah, that makes much more sense. Looks a lot quicker to compute the addresses.
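To show why Z-order addresses are cheap to compute, here is a sketch of a Morton (Z-order) index that simply interleaves the bits of x and y. The actual texture layout used by the hardware is undocumented, so this only illustrates the idea, not the real GPU layout; `part1By1` and `mortonIndex` are hypothetical names, and coordinates are assumed to fit in 16 bits.

```cuda
#include <cassert>
#include <cstdint>

// Spread the low 16 bits of v so that bit k moves to bit 2k
// (the classic "part 1 by 1" bit trick, a handful of shifts and masks).
__host__ __device__ inline uint32_t part1By1(uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Z-order index: x occupies the even bits, y the odd bits.
__host__ __device__ inline uint32_t mortonIndex(uint32_t x, uint32_t y)
{
    assert(x <= 0xFFFFu && y <= 0xFFFFu);
    return part1By1(x) | (part1By1(y) << 1);
}
```

For the 2x2 block at the origin, (0,0), (1,0), (0,1) and (1,1) map to 0, 1, 2 and 3, which traces the "Z"; each larger power-of-two block repeats the pattern, so 2D neighbours tend to stay close in 1D.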

Is it possible not to map a 2D array to 1D AND still use 2D textures??? In the simpleTexture example code, the data is first mapped to 1D and copied to a CUDA array using cudaMemcpyToArray(), not cudaMemcpy2DToArray() …

I’m not sure what you mean here. Many libraries natively store 2D arrays in 1D in row-major order (or column-major if it is FORTRAN) already. How are you filling the 2D array on the host side?
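For reference, the two storage conventions differ only in which index gets multiplied by the leading dimension. A sketch with hypothetical helper names:

```cuda
#include <cassert>
#include <cstddef>

// Row-major (C): rows are contiguous, so step nCols to reach the next row.
__host__ __device__ inline size_t rowMajorIndex(int row, int col, int nCols)
{
    assert(col < nCols);
    return static_cast<size_t>(row) * nCols + col;
}

// Column-major (Fortran): columns are contiguous, so step nRows to reach the next column.
__host__ __device__ inline size_t colMajorIndex(int row, int col, int nRows)
{
    assert(row < nRows);
    return static_cast<size_t>(col) * nRows + row;
}
```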

Thank you very much for helping me out, seibert.

I saw in GPU Gems that when a 1D array is bound to a 2D texture, it is chopped and converted to 2D. That really confuses me: I first need to map a 2D array to a 1D array, and then it is converted back to 2D to be mapped to the 2D texture??? Is it possible not to convert my data to 1D when copying it to a texture?

I once tried to store my data in a dynamically allocated 2D array in the host code by means of pointers to pointers and copy that to a CUDA array using cudaMemcpy2DToArray(), but it did not seem to work. Is that supposed to work?

My other question is:

Now, I’m filling my array in row-major order, so I want to know: if, for instance, an element in the array has indexes (i,j) (and the array is flattened and bound to a texture), is it accessible in the texture by tex2D(i,j) … or does the indexing change? Or, if I fill my array in some other order, do I need to know the order in which the data is laid out in the texture and how tex2D deals with indexes? Is that right?

Thanks Much.

Pointer-to-pointer methods are challenging to do in CUDA; I wouldn’t recommend them.

Anywhere the CUDA programming guide refers to 2D device memory, it is really referring to just linear 1D memory addressed like so: memory[j*pitch_in_elements + i] (row major, with j the row and i the column). pitch_in_elements is pitch/sizeof(element), where pitch is the value returned by cudaMallocPitch().
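A minimal sketch of that pitched layout: cudaMallocPitch chooses the row pitch, cudaMemcpy2D copies a tightly packed row-major host matrix into it, and the kernel finds row r at `base + r * pitch` bytes. This is the same addressing as memory[j*pitch_in_elements + i], but stepping in bytes avoids assuming the pitch is a multiple of sizeof(element). `scalePitched` and `doubleOnDevice` are hypothetical names.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Each thread doubles one element of a width x height pitched float array.
// Row r starts at base + r * pitch BYTES, so step through a char* first.
__global__ void scalePitched(float* base, size_t pitch, int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) {
        float* rowPtr = reinterpret_cast<float*>(reinterpret_cast<char*>(base) + row * pitch);
        rowPtr[col] *= 2.0f;
    }
}

// Upload a packed row-major host matrix into pitched device memory, run the
// kernel and copy the result back. Returns false on any CUDA error.
bool doubleOnDevice(std::vector<float>& host, int width, int height)
{
    assert(host.size() == static_cast<size_t>(width) * height);
    float* d = nullptr;
    size_t pitch = 0;  // bytes per device row, chosen by the runtime (>= rowBytes)
    const size_t rowBytes = width * sizeof(float);
    if (cudaMallocPitch(reinterpret_cast<void**>(&d), &pitch, rowBytes, height) != cudaSuccess)
        return false;
    // Host rows are packed (host pitch == rowBytes); device rows use 'pitch'.
    bool ok = cudaMemcpy2D(d, pitch, host.data(), rowBytes, rowBytes, height,
                           cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
        scalePitched<<<grid, block>>>(d, pitch, width, height);
        ok = cudaGetLastError() == cudaSuccess;
    }
    ok = ok && cudaMemcpy2D(host.data(), rowBytes, d, pitch, rowBytes, height,
                            cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d);
    return ok;
}
```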

Memory physically in a CUDA array (allocated by cudaMallocArray) is stored differently, as previously mentioned. You can only write to such arrays through the copy functions such as cudaMemcpy2DToArray, which handle that layout for you. The input memory you copy from is in row-major order as described above.
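A sketch of the row-major host buffer to cudaArray to tex2D path, which also answers the tex2D(i,j) question: tex2D takes x (the column) first and y (the row) second, so element (row, col) of a row-major buffer is fetched with tex2D(tex, col, row). The thread uses texture references, which were deprecated and then removed in CUDA 12, so this sketch assumes texture objects instead; `fetchTexel` and `fetchMatchesHost` are hypothetical names, and the check assumes point filtering with unnormalized coordinates.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Fetch texel (x = column, y = row). With point filtering and unnormalized
// coordinates, integer (x, y) selects exactly that texel.
__global__ void fetchTexel(cudaTextureObject_t tex, int x, int y, unsigned char* out)
{
    *out = tex2D<unsigned char>(tex, static_cast<float>(x), static_cast<float>(y));
}

// Copy a row-major host grid (height rows of width bytes) into a cudaArray,
// wrap it in a texture object and check that texel (col, row) equals
// grid[row * width + col].
bool fetchMatchesHost(const std::vector<unsigned char>& grid, int width, int height,
                      int row, int col)
{
    assert(grid.size() == static_cast<size_t>(width) * height);
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<unsigned char>();
    cudaArray_t arr = nullptr;
    if (cudaMallocArray(&arr, &desc, width, height) != cudaSuccess)  // width (columns) first
        return false;
    // Host rows are packed, so the source pitch is one row in bytes.
    const size_t rowBytes = width * sizeof(unsigned char);
    bool ok = cudaMemcpy2DToArray(arr, 0, 0, grid.data(), rowBytes, rowBytes, height,
                                  cudaMemcpyHostToDevice) == cudaSuccess;

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;
    cudaTextureObject_t tex = 0;
    ok = ok && cudaCreateTextureObject(&tex, &res, &td, nullptr) == cudaSuccess;

    unsigned char* dOut = nullptr;
    unsigned char hOut = 0;
    ok = ok && cudaMalloc(&dOut, 1) == cudaSuccess;
    if (ok) {
        fetchTexel<<<1, 1>>>(tex, col, row, dOut);  // x = col, y = row
        ok = cudaMemcpy(&hOut, dOut, 1, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dOut);
    if (tex) cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    return ok && hOut == grid[static_cast<size_t>(row) * width + col];
}
```

Swapping the two coordinates (row first) reads the transposed element instead, which would explain mismatches like the one with tex2D(mapTexture, int(checkY), int(checkX)) in the post further down.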

Sorry to dig up such an old thread… but I’m having a bit of an issue with 2D arrays. Currently I have a 2D occupancy grid that I’ve stored as unsigned char *pMap on the host side. It is a 1D array (row major). I am trying to copy this to a texture… can I do this? When I run tex2D(mapTexture, int(checkY), int(checkX)) on the device side (where checkY and checkX are two coordinates in my map), I get something different from what I get with pMap[checkY*mapNcols + checkX]. I’ve been pulling my hair out on this one for a couple of hours… can anyone spot the problem? Thanks in advance.

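[codebox]// Assumes mapTexture is a file-scope texture reference declared elsewhere, e.g.
// texture<unsigned char, 2, cudaReadModeElementType> mapTexture;
void sendMapToGPU(unsigned char* pMap, int mapMrows, int mapNcols)
{
    cudaArray *gpMap;
    // the template argument describes one texel: a single unsigned char
    cudaChannelFormatDesc mapChannelDesc = cudaCreateChannelDesc<unsigned char>();
    // cudaMallocArray takes width (columns) first, then height (rows)
    CUDA_SAFE_CALL(cudaMallocArray(&gpMap, &mapChannelDesc, mapNcols, mapMrows));
    // copy the row-major host map into the array; source pitch = one host row in bytes
    CUDA_SAFE_CALL(cudaMemcpy2DToArray(gpMap, 0, 0, pMap, mapNcols * sizeof(unsigned char),
                                       mapNcols * sizeof(unsigned char), mapMrows,
                                       cudaMemcpyHostToDevice));
    // map 2D array to mapTexture …
    mapTexture.addressMode[0] = cudaAddressModeClamp;
    mapTexture.addressMode[1] = cudaAddressModeClamp;
    mapTexture.filterMode = cudaFilterModePoint;
    mapTexture.normalized = false;
    CUDA_SAFE_CALL(cudaBindTextureToArray(mapTexture, gpMap, mapChannelDesc));
}
[/codebox]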


Arrays of arrays are not 2D arrays. They just look like it because of the C syntax.

I am convinced I should go on a crusade to ban them :D
See these threads:



I don’t quite understand what you mean. My pMap vector is a 1D array of length mapNcols*mapMrows… isn’t that what I’m supposed to do? My question was: can cudaMemcpy2DToArray turn pMap into a 2D cudaArray? Sorry for sounding like a n00b.

I would imagine you could. The memcpy2D functions are there so that you can efficiently copy rectangular areas of data between ‘2D’ arrays which may or may not have a different pitch.
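To illustrate copying a rectangular area between arrays with different pitches, here is a sketch (hypothetical names, int data chosen arbitrarily) that copies a subW x subH window out of a larger row-major host matrix into pitched device memory and back into a packed host buffer. The key point is that the source pitch is the full source row in bytes, while the width argument is only the window's row in bytes.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Copy the subW x subH window whose top-left corner is (x0, y0) out of a
// row-major host matrix srcW columns wide, via pitched device memory.
bool copyWindow(const std::vector<int>& src, int srcW, int srcH,
                int x0, int y0, int subW, int subH, std::vector<int>& out)
{
    assert(x0 + subW <= srcW && y0 + subH <= srcH);
    int* d = nullptr;
    size_t dPitch = 0;
    if (cudaMallocPitch(reinterpret_cast<void**>(&d), &dPitch,
                        subW * sizeof(int), subH) != cudaSuccess)
        return false;
    // Address of the window's first element in the host matrix.
    const int* first = src.data() + static_cast<size_t>(y0) * srcW + x0;
    // spitch = full source row in bytes; width = window row in bytes.
    bool ok = cudaMemcpy2D(d, dPitch, first, srcW * sizeof(int),
                           subW * sizeof(int), subH, cudaMemcpyHostToDevice) == cudaSuccess;
    out.assign(static_cast<size_t>(subW) * subH, 0);
    // Back to a packed host buffer: its pitch is just the window row in bytes.
    ok = ok && cudaMemcpy2D(out.data(), subW * sizeof(int), d, dPitch,
                            subW * sizeof(int), subH, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d);
    return ok;
}
```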


Oh, my fault - I was misled by the original post (“I don’t see pointer to pointers anywhere in the manual …”) - I have now read your post more carefully… better to create a new thread next time! And better for me to read your posts more carefully :) Sorry.

(Wrong again - edited - I will look more carefully later - I have to run!

I think that in your case you should not look at pMap[checkY*mapNcols + checkX] but at pMap[checkY*pitch + checkX], since cudaMallocPitch() (you used it, didn’t you?) can insert padding at the end of matrix rows to improve performance.)

1 - Can I use cudaMemcpy2DToArray to copy a 2D array from the host into a 1D array on the device?

2 - Is it possible to store an array of arrays of different sizes, where each d_Data[n] has a different size? In this case I don’t want to copy from the host, just to store the results of a kernel in an array of arrays.

Edit: for the second problem, I did

float** d_Data_selection=0;

cutilSafeCall(cudaMalloc((void**)&d_Data_selection, nROI*sizeof(float*)));

and later:

for(int nrs = 0; nrs < nROI; nrs++)
	cutilSafeCall( cudaMalloc((void **)&d_Data_selection[nrs], data0H * data0W * sizeof(float)) );


But it doesn’t work.

I also tried to follow this topic : http://forums.nvidia.com/index.php?showtopic=100647

float** d_Data_selection=0;

cutilSafeCall(cudaMalloc((void**)&d_Data_selection, nROI*sizeof(float*)));

float* d_Data_selection2[100];

for(int nrs = 0; nrs < nROI; nrs++)
	cutilSafeCall( cudaMalloc((void **)&d_Data_selection2[nrs], data0H * data0W * sizeof(float)) );



Same here.

The issue is that you are trying to call cudaMalloc on memory which already belongs to the device. I posted a way to allocate a jagged 2D array on my blog here: http://www.stevenmarkford.com/allocating-2d-arrays-in-cuda/
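A minimal sketch of one common way to build a jagged device array (not necessarily the approach in the linked blog post): the device addresses returned by cudaMalloc are stored in a host-side table, and only the finished table is copied into the device `float**` array. Calling `cudaMalloc((void **)&d_Data_selection[nrs], ...)` instead asks the host to write through a device pointer, which is why the first attempt fails. `fillRows` and `jaggedRoundTrip` are hypothetical names, and row lengths are assumed to be positive.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Each block fills row blockIdx.x of the jagged array with its row number.
__global__ void fillRows(float** rows, const int* lengths)
{
    const int r = blockIdx.x;
    for (int i = threadIdx.x; i < lengths[r]; i += blockDim.x)
        rows[r][i] = static_cast<float>(r);
}

// Build a jagged device array, fill it on the GPU and verify every row.
bool jaggedRoundTrip(const std::vector<int>& lengths)
{
    assert(!lengths.empty());
    const int nRows = static_cast<int>(lengths.size());
    std::vector<float*> hRows(nRows, nullptr);  // device addresses, kept in HOST memory
    float** dRows = nullptr;                    // device copy of that pointer table
    int* dLengths = nullptr;
    bool ok = true;

    // cudaMalloc writes each new device address into a host variable: fine.
    for (int r = 0; r < nRows && ok; ++r)
        ok = cudaMalloc(&hRows[r], lengths[r] * sizeof(float)) == cudaSuccess;

    // Copy the finished pointer table (and the lengths) to the device.
    ok = ok && cudaMalloc(&dRows, nRows * sizeof(float*)) == cudaSuccess;
    ok = ok && cudaMemcpy(dRows, hRows.data(), nRows * sizeof(float*),
                          cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMalloc(&dLengths, nRows * sizeof(int)) == cudaSuccess;
    ok = ok && cudaMemcpy(dLengths, lengths.data(), nRows * sizeof(int),
                          cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        fillRows<<<nRows, 64>>>(dRows, dLengths);
        ok = cudaGetLastError() == cudaSuccess;
    }

    // Read each row back through the host-side table (dRows lives on the device).
    for (int r = 0; r < nRows && ok; ++r) {
        std::vector<float> row(lengths[r]);
        ok = cudaMemcpy(row.data(), hRows[r], lengths[r] * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
        for (float v : row)
            ok = ok && v == static_cast<float>(r);
    }

    for (float* p : hRows) cudaFree(p);
    cudaFree(dRows);
    cudaFree(dLengths);
    return ok;
}
```

The host-side table is also what you need later for copying rows back and for freeing them, since the host cannot dereference `dRows` directly.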