Hi,
I am trying to write a program that transfers a 2D matrix of size 20x100 (all elements initialized to zero) to the device. On the device I launch two blocks, one for the first 50 columns and one for the next 50 columns, with 50 threads per block so that each thread handles one column. Inside the kernel, each thread runs a for loop over rows 0 to 19 and assigns a new value, say 10.5, to every element in its column. Below is the code I wrote for this using Visual Studio 2005. The program compiles and runs without any reported error. However, when I copy the modified matrix back to the host and print it, it prints the original values (all zeros). I have been stuck on this problem for quite some time, and it would be a great help if someone could suggest a solution. I am also attaching my .cu file to this post for convenience.
The code is as follows,
#include <stdio.h>
#include <conio.h>
#include <math.h>
#include <stdlib.h>
#include <windows.h>
#include <cutil.h>

__global__ void Matrix2D(float **dA, int maxrows, int maxcols)
{
    int idx = blockDim.x*blockIdx.x + threadIdx.x;
    for (int i = 0; i < maxrows; i++)
    {
        dA[i][idx] = 10.5;
        printf("dA[%d][%d] = %f \n", i, idx, dA[i][idx]);
    }
    __syncthreads();
}

int main(void)
{
    // THIS IS A PROGRAM TO CREATE A 2D MATRIX IN THE DEVICE OF SIZE 20x100 AND INITIALIZE IT TO ALL ZEROS.
    // CREATE A GRID OF 2 BLOCKS.
    // ASSIGN FIRST 50 COLS OF THE MATRIX TO THE FIRST BLOCK
    // ASSIGN THE NEXT 50 COLS OF THE MATRIX TO THE SECOND BLOCK
    // CREATE 50 THREADS IN EACH BLOCK.
    // USE EACH THREAD TO ASSIGN THE NUMBER 10.5 IN EACH ELEMENT OF THE MATRIX.
    float A[20][100], *dA[20];
    int maxrows = 20;
    int maxcols = 100;
    int size = maxcols*sizeof(float);

    for (int i = 0; i < maxrows; i++)
    {
        cudaMalloc((void**)&dA[i], size); // for every i from 0 to 19, one 1D array is allocated in the device
        for (int j = 0; j < maxcols; j++)
        {
            A[i][j] = 0;
            printf("%f ", A[i][j]);
        }
        printf("\n");
        cudaMemcpy(dA[i], A[i], size, cudaMemcpyHostToDevice);
    }

    dim3 dimGrid(2,1);
    dim3 dimBlock(50,1);
    Matrix2D<<<dimGrid,dimBlock>>>(dA, maxrows, maxcols);

    for (int i = 0; i < maxrows; i++)
    {
        cudaMemcpy(A[i], dA[i], size, cudaMemcpyDeviceToHost); // copying back the modified matrix from the device to host
    }

    for (int i = 0; i < maxrows; i++)
    {
        for (int j = 0; j < maxcols; j++)
        {
            A[i][j] = 0;
            printf("%f ", A[i][j]); // printing the modified array
        }
        printf("\n");
    }

    _getch();
    return 0;
}
ARRAY2D_CUDA.cu (1.61 KB)
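Two things in the posted code look like probable causes of the all-zero output. First, the array dA declared in main lives in host memory (only the 20 pointers stored in it point to device memory), so passing dA to the kernel hands the device a host address that it cannot dereference, and since no CUDA return code is checked, the failure goes unnoticed. Second, the final print loop assigns A[i][j]=0 just before printing, which would print zeros even if the copy back had worked. Below is a minimal sketch of one way around the first problem, assuming the matrix can be stored as a single flat row-major buffer on the device. The names fillMatrix and fillOnDevice are hypothetical and not part of the original program, and the device-side printf is left out because it needs a GPU of compute capability 2.0 or higher (or emulation mode on older toolkits).

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// One thread per column; each thread walks down the rows of its column.
// The matrix is a flat row-major buffer: element (row, col) is dA[row*maxcols + col].
__global__ void fillMatrix(float *dA, int maxrows, int maxcols, float value)
{
    int col = blockDim.x * blockIdx.x + threadIdx.x;
    if (col >= maxcols) return;              // guard in case the grid has spare threads
    for (int row = 0; row < maxrows; row++)
        dA[row * maxcols + col] = value;
}

// Hypothetical helper: copies hostA to the device, fills it, copies it back.
// Returns 0 on success and 1 if any CUDA call reports an error.
int fillOnDevice(float *hostA, int maxrows, int maxcols, float value)
{
    float *dA = NULL;
    size_t bytes = (size_t)maxrows * maxcols * sizeof(float);

    cudaError_t err = cudaMalloc((void**)&dA, bytes);   // one allocation for the whole matrix
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    err = cudaMemcpy(dA, hostA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 50;                                // 50 threads per block, as in the post
        int blocks = (maxcols + threads - 1) / threads;  // 2 blocks for 100 columns
        fillMatrix<<<blocks, threads>>>(dA, maxrows, maxcols, value);
        err = cudaGetLastError();                        // catches launch errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hostA, dA, bytes, cudaMemcpyDeviceToHost); // also waits for the kernel

    cudaFree(dA);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Because float A[20][100] is contiguous in host memory, it could be passed as fillOnDevice(&A[0][0], 20, 100, 10.5f) and then printed without reassigning its elements. If the float** interface has to stay, the table of 20 device pointers would itself need to be copied into device memory with its own cudaMalloc and cudaMemcpy before the launch; the flat layout avoids that extra step.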