2D Array

gpugpu · May 18, 2008, 10:15pm

Hi All,

I’m a little confused how 2D arrays work in CUDA.

Based on the CUDA manual, we can allocate 2D arrays using cudaMallocPitch() and copy 2D arrays to CUDA arrays using cudaMemcpy2DToArray().
Are these so-called 2D arrays really 2D? I don’t see pointer to pointers anywhere in the manual …

Are we representing a 2D array as 1D? If so, why do we need special copy functions for 2D?

Thanks.

casybaby · May 18, 2008, 11:23pm

CUDA actually maps a 2D array onto a 1D array. If you have a 2D array on the host and you want to copy it to the GPU, using cudaMallocPitch and cudaMemcpy2DToArray makes the copy faster. If you use the 1D memcpy function to copy a 2D array to the GPU, each row of the array has to be copied one by one in a loop, which wastes time. I find cudaMallocPitch a little confusing too, so I always flatten a 2D array into 1D on the CPU before I copy it to the GPU.

gpugpu · May 19, 2008, 12:08am

In the case of texturing, is the texture cache’s 2D locality still exploited correctly if 2D arrays are mapped to 1D arrays?

seibert · May 19, 2008, 1:22am

It’s generally thought that 2D textures are stored in GPU memory using a space-filling curve (see “Hilbert Curve” for an example of such a beast). This means that when the texture cache fetches a block of contiguous data from global memory, the texture elements are likely to be close together in 2D.

If you map a 2D array to 1D, then you will have good performance if you read the array in the same row-major or column-major order you used when you flattened it to 1D. If you are careful, you can also read a rectangular window of special sizes with reasonable performance (which is what the matrix transpose kernel has to do). More complex access patterns will perform poorly though. You have to decide if your problem warrants the extra complexity of texture references.

(The special 2D array mapping might also explain why you can’t read or write such arrays from the device without using a texture reference.)

Simon_Green · May 22, 2008, 4:29pm

It’s more like a Z-curve, actually (or used to be):
Z-order curve - Wikipedia: http://en.wikipedia.org/wiki/Z-order_(curve)

seibert · May 22, 2008, 5:00pm

Ah, that makes much more sense. Looks a lot quicker to compute the addresses.
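
To illustrate why Z-order addresses are cheap to compute, here is a minimal host-side C++ sketch that interleaves the bits of x and y. The names spreadBits and mortonIndex are made up for illustration, and the real texture layout on NVIDIA hardware is not documented, so treat this as a picture of the general idea rather than what the GPU actually does.

```cpp
#include <cassert>
#include <cstdint>

// Spread the lower 16 bits of v so that bit k ends up at bit 2k.
static std::uint32_t spreadBits(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Z-order (Morton) index: bits of x go to even positions, bits of y to odd ones.
// Assumes both coordinates fit in 16 bits.
std::uint32_t mortonIndex(std::uint32_t x, std::uint32_t y) {
    assert(x <= 0xFFFFu && y <= 0xFFFFu);
    return spreadBits(x) | (spreadBits(y) << 1);
}
```

With this ordering the four texels of any aligned 2x2 block get consecutive indices, for example mortonIndex(0,0), (1,0), (0,1), (1,1) give 0, 1, 2, 3, which is the kind of 2D locality discussed above.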

gpugpu · May 23, 2008, 11:56pm

Is it possible to use 2D textures without mapping the 2D array to 1D? As I saw in the simpleTexture example code, the data is first mapped to 1D and then copied to a CUDA array using cudaMemcpyToArray(), not cudaMemcpy2DToArray() …

seibert · May 24, 2008, 1:19am

I’m not sure what you mean here. Many libraries natively store 2D arrays in 1D in row-major order (or column-major if it is FORTRAN) already. How are you filling the 2D array on the host side?

gpugpu · May 24, 2008, 7:53pm

Thank you very much for helping me out seibert.

I saw in GPU Gems that when a 1D array is bound to a 2D texture, it is chopped up and converted to 2D. That really confuses me: do I first need to map a 2D array to a 1D array, which is then converted to 2D again to be mapped to the 2D texture? Is it possible not to convert my data to 1D when copying it to a texture?

I once tried to store my data in a dynamically allocated 2D array in the host code by means of pointer-to-pointers and copy that to a CUDA array using cudaMemcpy2DToArray(), but it did not seem to work. Is that supposed to work?

My other question is:

Right now I’m filling my array in row-major order, so I want to know: if an element in the array has indexes (i,j) (and the array is flattened and bound to a texture), is it accessible in the texture by tex2D(i,j), or does the indexing change? Or, if I fill my array in some other order, do I need to know the order in which the data is laid out in the texture and how tex2D deals with indexes? Is that right?

Thanks Much.

MisterAnderson42 · May 25, 2008, 12:09am

Pointer-to-pointer methods are challenging to do in CUDA; I wouldn’t recommend them.
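
One reason a pointer-to-pointer array does not work with cudaMemcpy2DToArray() is that the copy functions expect a single block of memory with a fixed pitch between rows, while separately allocated rows can sit anywhere in memory. A minimal host-side C++ sketch of flattening such an array into one row-major buffer first (flattenRows is a hypothetical helper, not part of CUDA):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copy `height` separately allocated rows of `width` floats into one
// contiguous row-major buffer, so element (r, c) lands at flat[r * width + c].
std::vector<float> flattenRows(const float* const* rows,
                               std::size_t height, std::size_t width) {
    assert(rows != nullptr || height == 0);
    std::vector<float> flat(height * width);
    for (std::size_t r = 0; r < height; ++r) {
        std::copy(rows[r], rows[r] + width, flat.begin() + r * width);
    }
    return flat;
}
```

The resulting flat.data() pointer, with a source pitch of width * sizeof(float) bytes, is the kind of input the 2D copy functions expect.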

Anywhere the CUDA programming guide refers to 2D device memory, it is really referring to just linear 1D memory addressed like so: memory[j*pitch_in_elements + i] (row major). pitch_in_elements is pitch/sizeof(element) where pitch is the one returned by cudaMallocPitch().
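
As a rough host-side illustration of that addressing, the C++ sketch below pads rows to a chosen alignment and computes the address of element (row j, column i) from a pitch in bytes. The helper names and the 16-byte alignment used in the note below are assumptions for the example; on the device the pitch is whatever cudaMallocPitch() returns.

```cpp
#include <cassert>
#include <cstddef>

// Round a row width in bytes up to a multiple of `alignment`. This only
// imitates padding; the real pitch is chosen by cudaMallocPitch().
std::size_t paddedPitchBytes(std::size_t widthBytes, std::size_t alignment) {
    assert(alignment > 0);
    return (widthBytes + alignment - 1) / alignment * alignment;
}

// Pointer to element (row j, column i) of a pitched float buffer.
// Rows start every pitchBytes bytes, which can be more than the row's data.
float* elementAt(float* base, std::size_t pitchBytes,
                 std::size_t j, std::size_t i) {
    assert(i * sizeof(float) < pitchBytes);
    char* rowStart = reinterpret_cast<char*>(base) + j * pitchBytes;
    return reinterpret_cast<float*>(rowStart) + i;
}
```

For a row of 5 floats (20 bytes) and an assumed 16-byte alignment, the pitch becomes 32 bytes, so pitch_in_elements is 8 and element (j, i) sits at index j*8 + i, not j*5 + i.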

Memory physically in a CUDA array (allocated by cudaMallocArray) is stored differently, as previously mentioned. You can only write to such arrays from the host, through copy functions such as cudaMemcpyToArray or cudaMemcpy2DToArray, which handle the layout conversion. The input memory you copy from is in row-major order as listed above.

live2inspire · July 8, 2009, 2:56pm

Sorry to dig up such an old thread… but I’m having a bit of an issue with 2D arrays. Currently I have a 2D occupancy grid that I’ve stored as unsigned char* pMap on the host side. It is a 1D array (row major). I am trying to copy this to a texture… can I do this? When I run tex2D(mapTexture, int(checkY), int(checkX)) on the device side (where checkY and checkX are two coordinates in my map), I get something different from pMap[checkY*mapNcols + checkX]… I’ve been pulling my hair out on this one for a couple of hours… can anyone spot the problem? Thanks in advance.

[codebox]void sendMapToGPU(unsigned char* pMap, int mapMrows, int mapNcols)
{
    cudaArray *gpMap;

    // Template argument restored on the assumption that the forum software
    // stripped it; unsigned char matches the element type of pMap.
    cudaChannelFormatDesc mapChannelDesc = cudaCreateChannelDesc<unsigned char>();

    // Width is the number of columns, height the number of rows.
    CUDA_SAFE_CALL(cudaMallocArray(&gpMap, &mapChannelDesc, mapNcols, mapMrows));

    // Source pitch and copy width are in bytes (one byte per element here).
    CUDA_SAFE_CALL(cudaMemcpy2DToArray(gpMap, 0, 0, pMap, mapNcols, mapNcols, mapMrows,
                                       cudaMemcpyHostToDevice));

    // map 2D array to mapTexture …
    mapTexture.addressMode[0] = cudaAddressModeClamp;
    mapTexture.addressMode[1] = cudaAddressModeClamp;
    mapTexture.filterMode = cudaFilterModePoint;
    mapTexture.normalized = false;

    CUDA_SAFE_CALL(cudaBindTextureToArray(mapTexture, gpMap, mapChannelDesc));
}[/codebox]
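
One likely cause of the mismatch, offered as a guess since the kernel is not shown: tex2D() takes the x coordinate (along the width given to cudaMallocArray, here mapNcols) before the y coordinate, so the lookup matching pMap[checkY*mapNcols + checkX] would be tex2D(mapTexture, checkX, checkY). The host-side C++ sketch below imitates a point-sampled, clamped, unnormalized fetch to show the convention; fetchClamped is a hypothetical stand-in, not a CUDA function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Imitate a 2D fetch with clamp addressing and integer coordinates:
// x selects the column (fast-moving index), y selects the row.
unsigned char fetchClamped(const unsigned char* map, int ncols, int nrows,
                           int x, int y) {
    assert(ncols > 0 && nrows > 0);
    int col = std::min(std::max(x, 0), ncols - 1);
    int row = std::min(std::max(y, 0), nrows - 1);
    return map[static_cast<std::size_t>(row) * ncols + col];
}
```

Swapping the arguments only gives the same answer on the diagonal, and with clamp addressing an out-of-range swapped coordinate silently returns an edge value instead of failing, which can make the bug hard to spot.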

sigismondo · July 8, 2009, 3:23pm

Arrays of arrays are not 2D arrays. They just look like them because of the C syntax.

I am convinced I am going to go on a crusade to ban them :D
See these threads:

The Official NVIDIA Forums | NVIDIA
The Official NVIDIA Forums | NVIDIA: http://forums.nvidia.com/index.php?showtopic=100647

live2inspire · July 8, 2009, 3:49pm

I don’t quite understand what you mean. My pMap vector is a 1D array of length mapNcols*mapNrows… isn’t that what I’m supposed to do? My question was: can cudaMemcpy2DToArray turn pMap into a 2D cudaArray? Sorry for sounding like a n00b.

Nico · July 8, 2009, 4:41pm

I would imagine you could. The memcpy2D functions are there so that you can efficiently copy rectangular areas of data between ‘2D’ arrays which may or may not have a different pitch.

N.
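
To make the "different pitch" point concrete, here is a host-side C++ sketch of what a 2D copy does conceptually: it copies only the useful bytes of each row and steps through source and destination with their own pitches. The argument order follows cudaMemcpy2D() without the copy-kind argument; copy2D itself is a hypothetical illustration, not the CUDA implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy a rectangle of widthBytes x height between two buffers whose rows
// may be padded differently. Both pitches are in bytes.
void copy2D(void* dst, std::size_t dpitch,
            const void* src, std::size_t spitch,
            std::size_t widthBytes, std::size_t height) {
    assert(widthBytes <= dpitch && widthBytes <= spitch);
    char* d = static_cast<char*>(dst);
    const char* s = static_cast<const char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        // Only the first widthBytes of each row are data; the rest is padding.
        std::memcpy(d + row * dpitch, s + row * spitch, widthBytes);
    }
}
```

A single 1D copy of the whole buffer would only be equivalent when both pitches are equal, which is why a separate 2D variant exists.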

sigismondo · July 8, 2009, 5:26pm

Oh, my fault. I was misled by the original post (“I don’t see pointer to pointers anywhere in the manual …”). I have now read your post more carefully… better to create a new thread next time! And for me to read your posts more carefully :) Sorry.

(Wrong again, so I edited this. I will take a closer look later; I have to run!

I think that in your case you should not look at pMap[checkY*mapNcols + checkX] but at pMap[checkY*pitch + checkX], since cudaMallocPitch() (you have used it, haven’t you?) can insert padding into matrix rows to optimize for performance.)

RyuKa · July 6, 2011, 7:48am

1 - Can I use cudaMemcpy2DToArray to copy a 2D array of the host into a 1D array on the device ?

2 - Is it possible to store an array of arrays of different sizes? Each d_Data[n] would have a different size. In this case I don’t want to copy from the host, just to store the results of a kernel in an array of arrays.

Edit: for the 2nd problem, I did

float** d_Data_selection=0;
cutilSafeCall(cudaMalloc((void**)&d_Data_selection, nROI*sizeof(float*)));

and later:

for(int nrs =0; nrs<nROI; nrs++)
{
	cutilSafeCall( cudaMalloc((void **)&d_Data_selection[nrs], data0H * data0W * sizeof(float)) );
}

But it doesn’t work.

I also tried to follow this topic: The Official NVIDIA Forums | NVIDIA

float** d_Data_selection=0;
cutilSafeCall(cudaMalloc((void**)&d_Data_selection, nROI*sizeof(float*)));

float* d_Data_selection2[100];
for(int nrs =0; nrs<nROI; nrs++)
{
	cutilSafeCall( cudaMalloc((void **)&d_Data_selection2[nrs], data0H * data0W * sizeof(float)) );
	PtrSet(d_Data_selection,d_Data_selection2,nrs);
}

Same result.

StevenMarkFord · January 20, 2012, 11:13pm

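The issue is that you are trying to make cudaMalloc write its result into memory that already belongs to the device: &d_Data_selection[nrs] is a device address, which host code cannot write to directly. I posted a way to allocate a jagged 2D array on my blog here: http://www.stevenmarkford.com/allocating-2d-arrays-in-cuda/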
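
A different technique that avoids device-side pointer tables altogether (not necessarily the approach in the blog post) is to store all rows back to back in one buffer and keep a small table of offsets. The host-side C++ sketch below builds that table; rowOffsets and elementIndex are hypothetical names. On the GPU the flat buffer and the offsets would each be a single cudaMalloc allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Prefix sums of the row sizes: row n occupies the index range
// [offsets[n], offsets[n + 1]) of the flat buffer.
std::vector<std::size_t> rowOffsets(const std::vector<std::size_t>& rowSizes) {
    std::vector<std::size_t> offsets(rowSizes.size() + 1, 0);
    for (std::size_t n = 0; n < rowSizes.size(); ++n) {
        offsets[n + 1] = offsets[n] + rowSizes[n];
    }
    return offsets;
}

// Flat index of element `col` in row `row` of the jagged array.
std::size_t elementIndex(const std::vector<std::size_t>& offsets,
                         std::size_t row, std::size_t col) {
    assert(row + 1 < offsets.size());
    assert(col < offsets[row + 1] - offsets[row]);
    return offsets[row] + col;
}
```

The last entry of the offsets table is the total element count, so it also gives the size to allocate for the flat buffer.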
