Pointers array

artemon · July 3, 2007, 11:21am

Hi,

How can I create an array (int** mas) on the device and copy data into it?

I'm trying to allocate the memory like this:

size_t pitch = 0;
int** devPtr = NULL;
cudaMallocPitch((void**)&devPtr, &pitch, 10, 2);

But afterwards pitch contains the value 10. Why?

Please give me a simple example. :'(

tmlam · July 5, 2007, 8:58pm

I second the request, please give us a simple example

downey · July 5, 2007, 11:01pm

Pitch is set by the cudaMallocPitch call; it isn't something you set yourself. CUDA wants rows aligned on certain boundaries, in this case probably 4 bytes, so cudaMallocPitch may pad each row and reports the resulting row width in bytes as the pitch. This can be different from the width you asked for. I also believe you want an int* here rather than an int**.

Here is an example:

size_t pitch;
int* devptr = NULL;
cudaMallocPitch((void**)&devptr, &pitch, 10, 2);

You can then use cudaMemcpy2D or cudaMemcpy to copy data into devptr. Use 11 instead of 10 and see what you get back for pitch.
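To make the idea of pitch a bit more concrete, here is a small host-only C sketch of how a row width could be rounded up to an alignment boundary. The helper name round_up_pitch and the alignment values are assumptions for illustration only; the pitch that cudaMallocPitch actually returns depends on the device and driver, so always use the value it gives you.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a CUDA API): returns the smallest multiple of
   `alignment` that is greater than or equal to `width_bytes`. */
size_t round_up_pitch(size_t width_bytes, size_t alignment)
{
    assert(alignment > 0);
    return (width_bytes + alignment - 1) / alignment * alignment;
}
```

With an assumed alignment of 4 bytes, requested widths of 10 and 11 bytes would both round up to a pitch of 12.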

Hope that helps.

tmlam · July 6, 2007, 2:41am

How would it be a 2D array if you use int* instead of int** ?

tmlam · July 6, 2007, 4:32am

And how would you copy the array onto the GPU?

Do you just use (continuing from the code above)

cudaMemcpy2D(devptr, 0, hostptr, 0, 10, 2, cudaMemcpyHostToDevice);

and, in the kernel, do you just access

devptr[0]

if the kernel is launched as

kernel<<<1,1>>>(devptr)

Thanks

downey · July 6, 2007, 3:04pm

Well, all the malloc calls just allocate linear memory. There is probably a reasonable way to build a real C-style 2D array on top of them, but it requires some extra setup on your part. In practice you can treat a 1D array as a 2D array by using the appropriate addressing. Here is an example:

int A[20];
int B[10][2];

B[2][1] = A[2*2 + 1];

Here A[2*2 + 1] sits at the same offset within A as B[2][1] does within B. If you want a pointer-based 2D array, you would first allocate an array of pointers, then allocate one large array and store its address in the first pointer, and then set each of the other pointers to the appropriate row address inside that large array. But generally it's better to handle the 2D array as a 1D array and do the address calculation yourself. I also don't think CUDA gives you access to the pointers, so building a true dynamically allocated 2D array in device memory may not be possible.
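The address calculation for treating a flat array as a 2D array can be sketched in plain C as follows. The names flat_index and get2d are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major offset of element (row, col) in a flat array
   that stores `cols` elements per row. */
size_t flat_index(size_t row, size_t col, size_t cols)
{
    assert(col < cols);
    return row * cols + col;
}

/* Read element (row, col) from a flat int array with `cols` columns. */
int get2d(const int *flat, size_t cols, size_t row, size_t col)
{
    return flat[flat_index(row, col, cols)];
}
```

For a 10 by 2 layout, flat_index(2, 1, 2) gives 5, the same offset as 2*2 + 1 in the example above.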

As for how would you copy data into the array, you will want to use cudaMemcpy or cudaMemcpy2D. As you can see they both take void*.
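As a rough picture of what a pitched 2D copy does, here is a host-only C sketch that copies row by row between two buffers with different row strides. The function copy_2d is a hypothetical stand-in written for this post, not the CUDA routine itself; it only mirrors the parameter order of cudaMemcpy2D (dst, dpitch, src, spitch, width in bytes, height) without the final copy-direction argument. It assumes both buffers are single contiguous blocks and that each pitch is a byte count at least as large as the row width.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only stand-in for a pitched 2D copy.
   Copies `height` rows of `width_bytes` bytes each. Consecutive rows start
   `dpitch` bytes apart in dst and `spitch` bytes apart in src. */
void copy_2d(void *dst, size_t dpitch, const void *src, size_t spitch,
             size_t width_bytes, size_t height)
{
    assert(dpitch >= width_bytes && spitch >= width_bytes);
    for (size_t row = 0; row < height; row++) {
        memcpy((char *)dst + row * dpitch,
               (const char *)src + row * spitch,
               width_bytes);
    }
}
```

For a tightly packed host array the source pitch is simply the row width in bytes (for example 4 * sizeof(float) for rows of 4 floats), while the destination pitch would be whatever cudaMallocPitch reported.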

Hope that clears things up. I am no expert here so maybe there is a way to do what you want but I don’t know how you would do it.

artemon · July 8, 2007, 11:12am

Thanks

cirus · July 28, 2009, 5:59am

I am having a little trouble accessing the elements of my 2D array on the GPU.

Situation:

I am working from the following code from the official guide, with my own additions.

//Host Code
float *d_src , *d2_src;
size_t pitch;
dim3 dimGrid(2,2);  //2D array of blocks.
dim3 dimBlock(4,1);  //Every block has 4 threads, so I have a 2 by 2 grid where each element is a vector of 4 floats.
N = 16;
width = blockSize = 4;
height = N/blockSize;   //= 4 blocks
float **h2_src = NULL, **h_src = NULL;

//Host pointer mem alloc
h_src = (float**)malloc(sizeof(float*) * height);
h2_src = (float**)malloc(sizeof(float*) * height);
for(int i = 0; i < height; i++)
{
   h_src[i] = (float*)malloc(sizeof(float) * width);
   memset( h_src[i] , 0 , width * sizeof(float) );
   h2_src[i] = (float*)malloc(sizeof(float) * width);
   memset( h2_src[i] , 0 , width * sizeof(float) );
}

//Generate data. input is a 2D array of form h_src[4][4];
for(int i = 0; i < height; i++)
{
	for(int j = 0; j < width; j++)
	{
		h_src[i][j] = i * width + j;
	}
}
//input[][] = { 0 , 1, 2, .... 15 }

//Memory Alloc at device
cudaMallocPitch( (void**) &d_src , &pitch , width * sizeof(float) , height);   //At debugging , the value of pitch is 64.
cudaMallocPitch( (void**) &d2_src , &pitch , width * sizeof(float) , height);   //At debugging , the value of pitch is 64.
cudaMemcpy2D( d_src, pitch , h_src , pitch , width * sizeof(float) , height ,  cudaMemcpyHostToDevice );

//GPU call
myKernel<<<dimGrid , dimBlock>>>(d_src , d2_src , pitch , N , blockSize);
cudaMemcpy2D( h2_src, pitch , d2_src , pitch , width * sizeof(float) , height ,  cudaMemcpyHostToDevice );

//display all elements copied back from d2_src to h2_src

My kernel function

__global__ void myKernel(float* d_src ,  float* d2_src , size_t pitch , int N , int blockSize)
{
	float *row;
	for(int j = 0; j < 2; j++)	// Here 2 is my grid height that is 2 by 2.
	{
		row = (float*)( (char*)d_src + j * pitch );
		for(int i = 0; i < 2; i++)
		{
			d2_src[j*pitch + i] = row[i];
		}
	}
}

The result I am getting in h2_src is 0,1,2,3, 0…0 [trailing zeros].

I expected to get the original sequence, viz. <0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15>

My guess, after seeing a pitch value of 64, is that it is treating each row as 64/4 = 16 elements.

Q1. Does this mean that CUDA is treating my 2 by 2 grid as a 1 by 4 grid, i.e. the elements 0…15 of d_src are being treated as a linear array?

Q2. Where else could the problem be? Is it in copying my host-allocated source array to the device-allocated source array?

I am learning CUDA, so I need a little help. Thanks.
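One thing that may be worth checking in the kernel above: pitch is a byte count, while indexing a float* counts in elements, so an expression like d2_src[j*pitch + i] mixes the two units. Below is a host-only C sketch of addressing an element in a pitched buffer. The helper names pitched_row and pitched_at are assumptions made up for this illustration, and the sketch only shows the pointer arithmetic, not a full fix for the code above.

```c
#include <assert.h>
#include <stddef.h>

/* Start of row `row` in a pitched buffer. `pitch` is the row stride in
   BYTES, so the offset is applied through a char pointer. */
float *pitched_row(float *base, size_t pitch, size_t row)
{
    return (float *)((char *)base + row * pitch);
}

/* Pointer to element (row, col). `col` counts floats within the row. */
float *pitched_at(float *base, size_t pitch, size_t row, size_t col)
{
    assert(col * sizeof(float) < pitch);
    return pitched_row(base, pitch, row) + col;
}
```

With a pitch of 64 bytes and 4-byte floats, each row starts 16 floats after the previous one, even though only the first 4 floats of each row hold data.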
