I’m just starting to play with CUDA. I wrote a semi-complicated program, but it wasn’t working, so I decided to jump way back to baby steps. I’m just testing copying memory to and from the device, editing some values, etc.
Here’s my code (the relevant parts, anyway):
#define ROWS 16
#define COLS 16
#define At(a, r, c) *((a) + ((r) * ROWS + (c)) * sizeof(int))
int main(){
    size_t size = ROWS * COLS * sizeof(int);
    int* a = (int *)malloc(size);
    //int* b = (int *)malloc(size);
    int i,j;
    for(i = 0 ; i < ROWS; i++){
        for(j = 0; j < COLS; j++){
            At(a, i,j) = 1;
        }
    }
    print2D(a, ROWS, COLS);
    printf("\n");
    int *d_a;
    cudaMalloc(&d_a, size);
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    for(i = 0 ; i < ROWS; i++){
        for(j = 0; j < COLS; j++){
            At(a, i,j) = 2;
        }
    }
    dim3 dimBlock(1,1);
    dim3 dimGrid(1,1);
    t<<<dimGrid, dimBlock>>>(d_a, ROWS, COLS);
    cudaMemcpy(a, d_a, size, cudaMemcpyDeviceToHost);
    print2D(a, ROWS, COLS);
    //free(a);
    //free(b);
    //cudaFree(d_a);
}
__global__ void t(int* m, int rows, int cols){
    int i, j;
    for(i = 0; i < rows; i++){
        for(j = 0; j < rows; j++){
            At(m,i,j) = 4;
        }
    }
}
__host__ void print2D(int* mat, int rows, int cols){
    int i;
    int j;
    for(i = 0; i < rows; i++){
        for(j = 0; j < cols; j++){
            printf("%5d", At(mat,i,j));
        }
        printf("\n");
    }
}
What I intended to happen:
- Allocate the array and initialize every entry to 1
- Move the array to the device
- Change every entry in the array on the host to 2
- Change every entry in the array on the device to 4
- Copy from device to host
- The second print statement should be all 4s
But what actually happens is that when I print a the second time, the first 3 rows are 4s and the rest of the entries are 2s. If I change size to a bigger number, all of the entries are 2s.
Also, I noticed that after my malloc of a, I can’t do anything with regard to memory management. Allocating a new array, freeing a, or calling cudaFree on d_a all cause a segfault. This might be a bit of C that has fallen out of my brain, but any ideas why that’s happening?
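Since segfaults in malloc/free often point at writes landing outside a buffer, one host-only check that might narrow this down is to compare the largest offset the At macro reaches with the number of ints that malloc(size) actually holds. This is only a sketch and not part of the program above: AT_OFFSET, max_at_offset and report_bounds are made-up names, and AT_OFFSET just copies the index expression from At without the dereference. It relies on the fact that adding a number to an int* counts in ints, not bytes.

```c
#include <stddef.h>
#include <stdio.h>

#define ROWS 16
#define COLS 16

/* Same index expression as the At macro in the post, minus the dereference:
   this is the value that gets added to the int* base pointer for entry (r, c). */
#define AT_OFFSET(r, c) (((r) * ROWS + (c)) * sizeof(int))

/* Hypothetical helper: largest offset (in ints) reached by the nested loops. */
static size_t max_at_offset(int rows, int cols) {
    size_t max = 0;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            size_t off = AT_OFFSET(r, c);
            if (off > max) {
                max = off;
            }
        }
    }
    return max;
}

/* Hypothetical helper: prints the comparison and returns 1 only if every
   access stays inside a buffer of rows * cols ints, 0 otherwise. */
static int report_bounds(int rows, int cols) {
    size_t ints_allocated = (size_t)rows * (size_t)cols; /* what malloc(size) holds */
    size_t max = max_at_offset(rows, cols);
    printf("ints allocated: %zu, largest int offset used: %zu\n",
           ints_allocated, max);
    return max < ints_allocated;
}
```

Calling report_bounds(ROWS, COLS) from main before touching the array shows whether the loops stay inside the allocation. If the largest offset is not smaller than the number of ints allocated, the writes are running past the end of a (and of d_a on the device), which would be one possible explanation for both the partial 4s and the crashes in free.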
I’ve looked at sample code and I don’t see what I’m doing differently. I know it has to be something simple, so what am I missing?
Thanks.