Using 2D array in CUDA

Hi all! I am new to CUDA. Can any of you give me an example of how to access the member elements of a 2D array which I declared using cudaMallocPitch? I have tried using:


for (int i = 0; i < rowsize; i++) {
    int *row = (int *)((char *)dev_arr + pitch_arr * i);
    for (int j = 0; j < colsize; j++) {
        int sample = row[j];
    }
}

But this does not run; it returns error code 77. Can you please point out the mistake in the above code snippet?

Thanks in advance!

CUDA arrays are opaque objects on the GPU, with data being reordered using a proprietary space filling curve.

The only way to access the values on the device would be by binding the CUDA array to a surface or texture. Surfaces also allow write access (without cache coherence), as far as I know.

NOTE: only a CUDA array created with the cudaArraySurfaceLoadStore flag can be read and written via a surface object or surface reference.

EDIT: hmm, after looking at your code again I notice you indeed use 2D pitched memory, not cudaArrays. Hence please disregard what I wrote above.
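For pitched memory, a minimal sketch of the usual access pattern is shown here. The names pitchedRow and rowSums are made up for illustration and are not from the code in this thread. The key point is that the pitch returned by cudaMallocPitch is in bytes, so the row offset is applied to a char pointer before casting back to the element type.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: returns a pointer to the first element of row r in a
// buffer allocated with cudaMallocPitch. pitchBytes is the pitch in BYTES,
// so the offset is added to a char pointer before casting back to int*.
__host__ __device__ inline int *pitchedRow(int *base, size_t pitchBytes, int r)
{
    return (int *)((char *)base + (size_t)r * pitchBytes);
}

// Hypothetical kernel: one thread per row, each thread sums its own row.
__global__ void rowSums(int *devArr, size_t pitchBytes, int rows, int cols, int *sums)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= rows) return;  // guard: the launch may have more threads than rows

    const int *row = pitchedRow(devArr, pitchBytes, r);
    int total = 0;
    for (int c = 0; c < cols; ++c)
        total += row[c];
    sums[r] = total;
}
```

The bounds guard matters whenever the launch configuration has more threads than rows; without it the surplus threads would read past the end of the allocation.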

You should provide a short, complete code example if you want help.

Thank you all for your views.

Here is some sample code, similar to what I am working on.

void main()
{
    int connrow = 8, conncol = 7;
    int **CONNEC; // connrow*conncol 2D matrix -- sample values given below
}

cudaError_t SOLVE(int **CONNEC, int connrow, int conncol)
{
    cudaError_t cudaStatus;
    int *dev_CONNEC;
    size_t pitch_CONNEC;
    // Usually I find pitch_CONNEC=512
    cudaStatus = cudaGetLastError();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "1---kernel launch failed: %s\n", cudaGetErrorString(cudaStatus));
        return cudaStatus;
    }
    getSOL<<<1,16>>>(dev_CONNEC, pitch_CONNEC, connrow, conncol); // parallelising according to connrow
}

// kernel function
__global__ void getSOL(int *dev_CONNEC, size_t pitch_CONNEC, int connrow, int conncol)
{
    int id = blockIdx.x*blockDim.x + threadIdx.x;
}

// device function
__device__ void XY(int *dev_CONNEC, size_t pitch_CONNEC, int connrow, int conncol, int id)
{
    int sg = 3;

    int cpt = 0;
    cpt += id;

    for (int i = 0; i < sg; i++) {
        int *row_CONNEC = (int *)((char*)dev_CONNEC + cpt * pitch_CONNEC) + i;
        int nd = *row_CONNEC; printf("\nnd = %d", nd);
    }
}

Sample CONNEC matrix:
19 1 11 3 2 5 4
8 27 11 9 12 7 10
28 16 19 15 17 20 18
31 28 27 32 29 30 33
19 11 23 5 13 21 6
19 23 28 21 25 20 22
28 23 27 25 24 29 26
27 23 11 24 13 12 14

I usually get some random values when “nd” is printed. I would like to know if my way of accessing the array “dev_CONNEC” in the global memory is correct or not.

Nobody would have been able to discover your problem based on the original code you posted.

This is still not a complete code, since you haven’t bothered to show how you allocate and initialize the matrix associated with CONNEC. But we can make some headway.

CONNEC is a pointer to a pointer (**)

According to the documentation:

Does the 3rd parameter of cudaMemcpy2D expect a pointer-to-pointer argument?


  1. Store your host data using a single-pointer argument. It’s not the only way to do it, but it’s the simplest.
  2. Do proper CUDA error checking on all CUDA API calls and all kernel launches. If you’re not sure what proper CUDA error checking is, google “proper cuda error checking”.
  3. Run your code with cuda-memcheck.
  4. Start with a kernel that just prints out the data you transferred to the device. Once you have that working, computations will be easier to tackle.
  5. Review a CUDA sample code that demonstrates proper use of cudaMallocPitch/cudaMemcpy2D, such as the bilateralFilter sample.
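The suggestions above can be sketched roughly as follows. This is only an illustration under stated assumptions: CUDA_CHECK, printRows and uploadAndPrint are hypothetical names, the host data is assumed to be one contiguous row-major block, and the early return on error does not free the device buffer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical error-checking macro: evaluates a CUDA runtime call and, on
// failure, reports it and returns the error code from the enclosing function.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                         \
            return err_;                                               \
        }                                                              \
    } while (0)

// One thread per row; each thread prints the elements of its own row.
__global__ void printRows(const int *dev, size_t pitch, int rows, int cols)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= rows) return;  // skip surplus threads
    const int *row = (const int *)((const char *)dev + (size_t)r * pitch);
    for (int c = 0; c < cols; ++c)
        printf("row %d col %d = %d\n", r, c, row[c]);
}

// hostFlat must hold rows*cols ints, row-major, in ONE contiguous block.
cudaError_t uploadAndPrint(const int *hostFlat, int rows, int cols)
{
    int *dev = nullptr;
    size_t pitch = 0;
    size_t widthBytes = cols * sizeof(int);

    CUDA_CHECK(cudaMallocPitch((void **)&dev, &pitch, widthBytes, rows));
    // Source pitch is the packed host row width in bytes; the destination
    // pitch is the value returned by cudaMallocPitch.
    CUDA_CHECK(cudaMemcpy2D(dev, pitch, hostFlat, widthBytes,
                            widthBytes, rows, cudaMemcpyHostToDevice));

    int threads = 32;
    int blocks = (rows + threads - 1) / threads;
    printRows<<<blocks, threads>>>(dev, pitch, rows, cols);
    CUDA_CHECK(cudaGetLastError());       // catches launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // catches execution errors, flushes printf

    CUDA_CHECK(cudaFree(dev));
    return cudaSuccess;
}
```

Calling uploadAndPrint with a flat array of the 8x7 sample matrix should print each element with its row and column index; the order in which different rows appear is not guaranteed, since each row is printed by a different thread.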

Thank you very much for the suggestions. I tried to copy back the copied data from the device to the host (using another host pointer-to-pointer) and I found the data was actually copied successfully (the values matched). But I am facing problems while accessing the 2D array in the device. Is

int *row_CONNEC = (int *)((char*)dev_CONNEC + cpt * pitch_CONNEC)+i;

the correct way of accessing the 2D array (stored in the global memory)?

Please note that previously CONNEC pointer-to-pointer in the host was initialised like this:

int **CONNEC = (int **)malloc(connrow*sizeof(int *));
for(int i=0;i<connrow;i++)
   CONNEC[i] = (int *)malloc(conncol*sizeof(int));

You seem to have ignored my warnings about using a double pointer. You’re welcome to continue on your path if you wish. I’ve already pointed out that it won’t work (yes, I realize you claim it does. We can agree to disagree. I believe the documentation is on my side, as I’ve already linked to.) The double pointer allocation for CONNEC will not work with any cudaMemcpy function. There are not any cudaMemcpy functions that know how to chase a double pointer. Since you don’t seem to want to provide a complete code that someone else could test, I’ll leave it at that.
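To illustrate the point about double pointers: each row allocated with its own malloc lives at an unrelated address, so there is no single source buffer for a cudaMemcpy call to read. A minimal host-side sketch of flattening such a matrix into one contiguous block is given here. The function name flatten is hypothetical, and std::vector is used only for convenience.

```cuda
#include <cstddef>
#include <vector>

// Hypothetical helper: copies a matrix stored as an array of separately
// allocated rows (int **) into one contiguous row-major buffer.
std::vector<int> flatten(int **matrix, int rows, int cols)
{
    std::vector<int> flat((size_t)rows * cols);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            flat[(size_t)r * cols + c] = matrix[r][c];
    return flat;
}
```

The result of flat.data() can then be passed as the source pointer to cudaMemcpy2D, with a source pitch of cols * sizeof(int) bytes.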

Hi! Thanks a lot for your advice. I converted all host arrays to 1-dimensional pointers and was successfully able to copy them to the device arrays. I am also getting the required final output.