Help with cuda 2d array

calvar · September 28, 2014, 4:55pm

Good day,

I am learning CUDA at the moment, and I am trying to write a kernel that modifies the elements of a 2D matrix. This is how I allocate memory for the array on the device and copy the matrix:

int *d_A;
size_t pitch;

cudaMallocPitch((void**)&d_A, &pitch, sizeof(int)*cols, rows);

cudaMemcpy2D(d_A, pitch, A, sizeof(int)*cols, sizeof(int)*cols, rows, cudaMemcpyHostToDevice);

where cols and rows are the number of columns and rows in the matrix A.

However, when I try to modify the elements of d_A I get a segmentation fault, and if I print the elements with this kernel:

__global__ void kernel(int *d_A, size_t pitch, int rows, int cols){
  //compute the row
  int r = blockIdx.y*blockDim.y+threadIdx.y;
  //compute the column
  int c = blockIdx.x*blockDim.x+threadIdx.x;

  if((r < rows) && (c < cols)){
    // update the pointer to point to the beginning of the row
    int *Row = (int*)((char*)d_A + r*pitch);
    int elem = Row[c];
    printf("%d ", elem);
  }
}

I do not get the values stored in the original matrix.

I thank you in advance for any help you can give me.

Robert_Crovella · September 28, 2014, 7:36pm

You would probably need to provide complete code, along with the compile command you are using and the device you are running on. Also, you don’t appear to be doing any proper CUDA error checking; you should add that any time you are having trouble with CUDA code. There is nothing wrong with the code you’ve shown. I built a sample app out of it; here is a fully worked example:

http://pastebin.com/9YYEUkfu

If you provide a fully worked example like that, showing what you are doing, I’m sure someone can help.
Don’t forget to add proper cuda error checking (google: “proper cuda error checking”)

calvar · September 28, 2014, 11:15pm

Hi,

Thank you for your answer txbob.

Here’s the complete code:

#include <cstdio>
#include <iostream>

using namespace std;

#define BLOCK_WIDTH 16


__global__ void kernel(int *d_A, size_t pitch, int rows, int cols){
  //compute the row
  int r = blockIdx.y*blockDim.y+threadIdx.y;
  //compute the column
  int c = blockIdx.x*blockDim.x+threadIdx.x;

  if((r < rows) && (c < cols)){
    // update the pointer to point to the beginning of the row
    //int *Row = (int*)((char*)d_A + r*pitch);
    int *Row = (int*)((char*)d_A);
    int elem = Row[c];
    printf("%d ", elem);
  }
}


void test(int **A, int rows, int cols){
  int *d_A;
  size_t pitch;

  cudaMallocPitch((void**)&d_A, &pitch, sizeof(int)*cols, rows);
  
  cudaMemcpy2D(d_A, pitch, A, sizeof(int)*cols, sizeof(int)*cols, rows, cudaMemcpyHostToDevice);

  //Define grid and block size
  int Yblocks = rows / BLOCK_WIDTH;
  if(rows % BLOCK_WIDTH) Yblocks++;
  int Xblocks = cols / BLOCK_WIDTH;
  if(cols % BLOCK_WIDTH) Xblocks++;
  //  cout << Yblocks << "," << Xblocks << endl;
  // grid x covers columns and grid y covers rows, matching the kernel's indexing
  dim3 dimGrid(Xblocks, Yblocks, 1);
  dim3 dimBlock(BLOCK_WIDTH, BLOCK_WIDTH, 1);
  //Run kernel
  kernel<<<dimGrid, dimBlock>>>(d_A, pitch, rows, cols);

  cudaMemcpy2D(A, sizeof(int)*cols, d_A, pitch, sizeof(int)*cols, rows, cudaMemcpyDeviceToHost);

  cudaFree(&d_A);
}


int main(){
  int rows = 2;
  int cols = 2;

  int **A;
  A = new int*[rows];
  for(int i = 0; i < rows; ++i){ 
    A[i] = new int[cols];
    for(int j = 0; j < cols; ++j)
      A[i][j] = i+2;
  }

  test(A, rows, cols);

  for(int i = 0; i < rows; ++i){
    for(int j = 0; j < cols; ++j)
      cout << A[i][j] << " ";
    cout << "\n";
  }

  for(int i = 0; i < rows; ++i) delete[] A[i];
  delete[] A;

  return 0;
}

I compile simply with “nvcc name.cu -o name” and the device is a GTX 550 Ti (compute capability 2.1). I wanted to take a 2D C array as input instead of a 1D array. I don’t know if that is the problem, but when I omit the kernel, cudaMemcpy2D seems to copy A to d_A and back correctly.

Yup, that’s right. I have now run cuda-memcheck as you show in your example and obtained:

========= CUDA-MEMCHECK
33661504 33661504 0 0 2 2 
3 3 
========= Program hit cudaErrorInvalidDevicePointer (error 17) due to "invalid device pointer" on CUDA API call to cudaFree. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2ef613]
=========     Host Frame:./malloctest [0x3a2c6]
=========     Host Frame:./malloctest [0x2868]
=========     Host Frame:./malloctest [0x2960]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf5) [0x21ec5]
=========     Host Frame:./malloctest [0x2609]
=========
========= ERROR SUMMARY: 1 error

I will look at the error handling. Thanks.

calvar · September 28, 2014, 11:56pm

Well, it seems the memcheck error was due to the cudaFree(&d_A), which should be cudaFree(d_A).
But the program still prints values which are not the original A matrix values.

calvar · September 29, 2014, 12:13am

I guess you cannot pass an array of arrays to any cudaMalloc, or you can, but then finding where the original values are is a real problem. A plain 1D array is the only way to go for the moment.

Tnx anyway!

Robert_Crovella · September 29, 2014, 12:23am

That’s correct. cudaMemcpy expects a single pointer (*) to a contiguous allocation. Your allocation method does not guarantee that the data are all contiguous, and you are passing a pointer-to-pointer (**) to cudaMemcpy which won’t work.
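As a rough illustration of the point above, here is a minimal host-side sketch in plain C++ (so it compiles with or without nvcc). The helper names flatten_rows and unflatten_rows are hypothetical and not part of the CUDA API. The first copies each separately allocated row of an int** into one contiguous row-major buffer; the second writes such a buffer back into the rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a rows x cols array-of-arrays into one
// contiguous row-major buffer. Each A[r] may live anywhere on the heap;
// the returned vector stores the rows back to back.
std::vector<int> flatten_rows(int* const* A, int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    std::vector<int> flat(static_cast<std::size_t>(rows) * cols);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            flat[static_cast<std::size_t>(r) * cols + c] = A[r][c];
    return flat;
}

// Hypothetical inverse: write a contiguous row-major buffer back into the
// separately allocated rows of A, e.g. after results return from the device.
void unflatten_rows(const std::vector<int>& flat, int** A, int rows, int cols) {
    assert(flat.size() == static_cast<std::size_t>(rows) * cols);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            A[r][c] = flat[static_cast<std::size_t>(r) * cols + c];
}
```

Under this assumption, the two cudaMemcpy2D calls in test() would take flat.data() in place of A, keeping sizeof(int)*cols as the host pitch; the device allocation and the kernel could stay as posted.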

MutantJohn · September 29, 2014, 12:47am

I’m not sure if this is helpful or not, but I think, at least for me, it’s easier to just allocate a 1D array and do the bookkeeping on it yourself. All memory in computers is 1D anyway, and I’ve found dealing with CUDA in this regard to be more trouble than it’s worth, but that might honestly just be because I’m bad at CUDA lol.
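For what it is worth, the bookkeeping can be sketched on the host alone. The following plain C++ snippet imitates a pitched layout like the one from the first post: rows are padded to a pitch, copied in row by row, and read back with the same byte arithmetic as the kernel's (char*)d_A + r*pitch. All three function names are hypothetical, and the alignment value is only an assumption; on a real device the pitch is whatever cudaMallocPitch returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Round the row width in bytes up to a multiple of `align`.
// Assumption for illustration only: the real pitch comes from cudaMallocPitch.
std::size_t padded_pitch(std::size_t width_bytes, std::size_t align) {
    assert(align > 0);
    return (width_bytes + align - 1) / align * align;
}

// Host-side stand-in for a 2D copy: copies `rows` rows of `width_bytes`
// bytes each from a tightly packed source into a pitched destination.
void copy_rows_to_pitched(unsigned char* dst, std::size_t dpitch,
                          const int* src, std::size_t width_bytes, int rows) {
    const unsigned char* s = reinterpret_cast<const unsigned char*>(src);
    for (int r = 0; r < rows; ++r)
        std::memcpy(dst + r * dpitch, s + r * width_bytes, width_bytes);
}

// Read element (r, c): skip r rows of `pitch` bytes, then c ints within
// the row. This is the same offset the kernel computes for Row[c].
int load_elem(const unsigned char* base, std::size_t pitch, int r, int c) {
    int value;
    std::memcpy(&value, base + r * pitch + c * sizeof(int), sizeof(int));
    return value;
}
```

When the pitch equals the row width in bytes, the offset reduces to the familiar r*cols + c index into a plain 1D array.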
