Problems with accessing and copying 2D Arrays

Hi,

I am having trouble copying 2D arrays between host and device and accessing them in a kernel. I have read the postings on this forum as well as the programming guide and reference manual. To test the functions, I wrote the following program using the snippet from the programming guide (v2.1, pages 34/35). I am getting a segmentation violation during cudaMemcpy2D().

Now if I replace pitch_a (which is in bytes) with pitch_a*sizeof(float), the program runs but gives incorrect results. (pitch_a*sizeof(float) is obviously the wrong pitch.)

float **ah, *ad, **bh, *bd, **ch, *cd;

int width = 45000, height = 400;

 ah = (float **) malloc ( height * sizeof ( float* ) );
 bh = (float **) malloc ( height * sizeof ( float* ) );
 ch = (float **) malloc ( height * sizeof ( float* ) );
 for ( i=0; i< height; i++ ){
   ah[i] = (float *) malloc ( width * sizeof ( float ) );
   bh[i] = (float *) malloc ( width * sizeof ( float ) );
   ch[i] = (float *) malloc ( width * sizeof ( float ) );
 }


for ( i=0; i<height; i++ ){
  for ( j=0; j<width; j++ ){
    ah[i][j] = 1.0;
    bh[i][j] = 2.0;
    ch[i][j] = 0.0;
  }
}

size_t pitch_a, pitch_b, pitch_c;

cudaError_t err = cudaMallocPitch((void **)&ad, (size_t *)&pitch_a, width*sizeof(float), height);
assert(err == cudaSuccess);

err = cudaMallocPitch((void **)&bd, (size_t *)&pitch_b, width*sizeof(float), height);
assert(err == cudaSuccess);

err = cudaMallocPitch((void **)&cd, (size_t *)&pitch_c, width*sizeof(float), height);
assert(err == cudaSuccess);

cudaMemcpy2D(ad, pitch_a, ah, width*sizeof(float), width*sizeof(float), height, cudaMemcpyHostToDevice);

cudaMemcpy2D(bd, pitch_b, bh, width*sizeof(float), width*sizeof(float), height, cudaMemcpyHostToDevice);

cudaMemcpy2D(cd, pitch_c, ch, width*sizeof(float), width*sizeof(float), height, cudaMemcpyHostToDevice);

kernel:

const unsigned int iam = blockIdx.x*blockDim.x+threadIdx.x;  // column block distribution
for(i=0; i < height; i++){

  float *row_a = (float *) ((char *)ad+ i* pitch_a);
  float *row_b = (float *) ((char *)bd+ i* pitch_b);
  float *row_c = (float *) ((char *)cd+ i* pitch_c);

  float value_a = row_a[iam];
  if (value_a != 0.0){
    atomicAdd(_count, 1);
  }

  float value_b = row_b[iam];
  if (value_b != 0.0){
    atomicAdd(_count1, 1);
  }


  row_c[iam] = row_a[iam]+row_b[iam];
  float value_c = row_c[iam];

  if (value_c != 0.0){
    atomicAdd(_count2, 1);
  }

}

Can someone tell me what I am doing wrong?

Thanks in advance.

-regards,
Rajesh

I managed to address the problem by allocating the two dimensional host array as a single-dimensional linear structure.
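
For anyone who lands here with the same crash, below is a minimal sketch of that change. It assumes the segmentation violation came from passing a float** (an array of row pointers) to cudaMemcpy2D, which treats its source as one buffer whose rows start spitch bytes apart. The kernel name add_pitched, the helper run_pitched_add, and the CHECK macro are made up for illustration and are not from my original program; the atomic counters are left out, and buffers are not freed on the error paths.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro: bail out of the helper with -1 on any CUDA error.
#define CHECK(call) do { if ((call) != cudaSuccess) return -1; } while (0)

// Each thread owns one column and walks down the rows.
__global__ void add_pitched(const float *a, size_t pitch_a,
                            const float *b, size_t pitch_b,
                            float *c, size_t pitch_c,
                            int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= width) return;  // guard the partially filled last block

    for (int row = 0; row < height; ++row) {
        // Pitches are in bytes, so step through the rows with char pointers.
        const float *row_a = (const float *)((const char *)a + row * pitch_a);
        const float *row_b = (const float *)((const char *)b + row * pitch_b);
        float *row_c = (float *)((char *)c + row * pitch_c);
        row_c[col] = row_a[col] + row_b[col];
    }
}

// Hypothetical helper: returns the number of wrong elements in c,
// or -1 if an allocation or a CUDA call fails.
int run_pitched_add(int width, int height)
{
    size_t row_bytes = (size_t)width * sizeof(float);

    // One contiguous block per matrix: element (i, j) is h[i * width + j].
    float *ah = (float *)malloc(row_bytes * height);
    float *bh = (float *)malloc(row_bytes * height);
    float *ch = (float *)malloc(row_bytes * height);
    if (!ah || !bh || !ch) return -1;

    for (size_t k = 0; k < (size_t)width * height; ++k) {
        ah[k] = 1.0f;
        bh[k] = 2.0f;
        ch[k] = 0.0f;
    }

    float *ad, *bd, *cd;
    size_t pitch_a, pitch_b, pitch_c;
    CHECK(cudaMallocPitch((void **)&ad, &pitch_a, row_bytes, height));
    CHECK(cudaMallocPitch((void **)&bd, &pitch_b, row_bytes, height));
    CHECK(cudaMallocPitch((void **)&cd, &pitch_c, row_bytes, height));

    // Host rows are tightly packed, so the source pitch equals row_bytes.
    CHECK(cudaMemcpy2D(ad, pitch_a, ah, row_bytes, row_bytes, height,
                       cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy2D(bd, pitch_b, bh, row_bytes, row_bytes, height,
                       cudaMemcpyHostToDevice));

    int threads = 256;
    int blocks = (width + threads - 1) / threads;
    add_pitched<<<blocks, threads>>>(ad, pitch_a, bd, pitch_b,
                                     cd, pitch_c, width, height);
    CHECK(cudaGetLastError());

    // Copy back: now the device side is pitched and the host side is packed.
    CHECK(cudaMemcpy2D(ch, row_bytes, cd, pitch_c, row_bytes, height,
                       cudaMemcpyDeviceToHost));

    int wrong = 0;
    for (size_t k = 0; k < (size_t)width * height; ++k)
        if (ch[k] != 3.0f) ++wrong;

    cudaFree(ad); cudaFree(bd); cudaFree(cd);
    free(ah); free(bh); free(ch);
    return wrong;
}
```

Calling run_pitched_add(45000, 400) from main is meant to return 0 when every element of c comes back as 3.0. Keeping the float** layout would instead need one cudaMemcpy per row into the matching pitched device row.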

-rajesh