How does a 1-D array map onto a grid(2,2) of blocks(2,2)? How does the data in the following program map?

[codebox]// example0.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <cuda.h>
#include <cutil_inline.h>

// Intended to copy a into b, one element per thread (w = row width of the 4 x 4 array)
__global__ void incrementArrayOnDevice(float *a, float *b)
{
int w = 4;
int iy = blockDim.y * blockIdx.x + threadIdx.y;
int ix = blockDim.x * blockIdx.x + threadIdx.x;
int idx = ix*w + iy;

b[idx] = a[idx];

}

int main(void)
{
float *a,*b;
float *a_d,*b_d;
int i;
int N = 4;
int size = sizeof(float)*N*N;

a = (float *)(malloc(size));
b = (float *)(malloc(size));

cudaMalloc((void **) &a_d, (size));
cudaMalloc((void **) &b_d, (size));

for(i=0;i<(N*N);i++)
{
a[i] = i;
b[i] = 0;
}

cudaMemcpy(a_d, a, size, cudaMemcpyHostToDevice);
cudaMemcpy(b_d, b, size, cudaMemcpyHostToDevice);

dim3 dimGrid(2,2);
dim3 dimBlock(2,2);

/*** Start execution and timing ***/

unsigned int timer;
CUT_SAFE_CALL(cutCreateTimer(&timer));
CUT_SAFE_CALL(cutStartTimer(timer));

incrementArrayOnDevice<<<dimGrid,dimBlock>>>(a_d,b_d);

// The launch is asynchronous: wait for the kernel to finish before stopping the timer
cudaThreadSynchronize();

//Timer End
CUT_SAFE_CALL(cutStopTimer(timer));

cudaMemcpy(a, a_d, size, cudaMemcpyDeviceToHost);
cudaMemcpy(b, b_d, size, cudaMemcpyDeviceToHost);

for(i=0;i<(N*N);i++)
{
printf("%f\t%f",a[i],b[i]);
printf("\n");
}
printf("\nTime:%f ms\n", cutGetTimerValue(timer));
CUT_SAFE_CALL(cutDeleteTimer(timer));
free(a);free(b);
cudaFree(a_d);cudaFree(b_d);
return 0;
}[/codebox]

I tried interchanging iy and ix, yet I ended up with the same result. I also incremented ix by 1 and iy by 1, but I get the same answer every time. How do the values I initialize in the program, {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}, map onto 4 blocks, each block containing 4 threads?

Question: are you sure that your kernel is correct?
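To see why swapping ix and iy changes nothing, the index arithmetic of the original kernel can be replayed on the host. The sketch below is plain C++, not CUDA: the loop variables bx, by, tx, ty stand in for blockIdx and threadIdx, and the function names originalIdx and writtenIndices are hypothetical. It assumes the same launch as the program, grid(2,2) and block(2,2), with w = 4.

```cpp
#include <cassert>
#include <set>

// Index computed by the original kernel for one thread, with w = 4.
// bx/by stand for blockIdx.x/.y, tx/ty for threadIdx.x/.y and
// bdx/bdy for blockDim.x/.y. swapRoles == true evaluates iy*w + ix instead.
int originalIdx(int bx, int by, int tx, int ty, int bdx, int bdy, bool swapRoles) {
    (void)by;  // blockIdx.y never appears in the original kernel
    const int w = 4;
    int iy = bdy * bx + ty;  // blockDim.y * blockIdx.x + threadIdx.y
    int ix = bdx * bx + tx;  // blockDim.x * blockIdx.x + threadIdx.x
    return swapRoles ? iy * w + ix : ix * w + iy;
}

// Replays a grid(2,2) x block(2,2) launch and collects every idx that is written.
std::set<int> writtenIndices(bool swapRoles) {
    const int gridX = 2, gridY = 2, bdx = 2, bdy = 2;
    std::set<int> written;
    for (int by = 0; by < gridY; ++by)
        for (int bx = 0; bx < gridX; ++bx)
            for (int ty = 0; ty < bdy; ++ty)
                for (int tx = 0; tx < bdx; ++tx)
                    written.insert(originalIdx(bx, by, tx, ty, bdx, bdy, swapRoles));
    return written;
}
```

With either ordering the set is {0, 1, 4, 5, 10, 11, 14, 15}: only 8 of the 16 elements are copied, so b[2], b[3], b[6], b[7], b[8], b[9], b[12] and b[13] keep their initial 0. Blocks with blockIdx.y = 1 redo the work of blocks with blockIdx.y = 0, and because blockDim.x == blockDim.y, ix and iy range over the same values inside each block, so ix*w + iy and iy*w + ix cover the same set.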

I would suggest the following kernel:

[codebox]//            <- width ->
//        ^   +---------+
//        |   |         |
//     height |         |
//        |   |         |
//        v   +---------+
//
__global__ void incrementArrayOnDevice(float *a, float *b, int width)
{
    int iy = blockDim.y * blockIdx.y + threadIdx.y; // row index
    int ix = blockDim.x * blockIdx.x + threadIdx.x; // column index
    int idx = iy*width + ix;                        // row-major offset

    b[idx] = a[idx];
}[/codebox]

and in the host code:

[codebox]dim3 dimGrid(2,2);
dim3 dimBlock(2,2);

incrementArrayOnDevice<<<dimGrid,dimBlock>>>(a_d, b_d, N);[/codebox]

Hi LsChien,

I want to understand what these two statements mean:

int iy = blockDim.y * blockIdx.y + threadIdx.y;
int ix = blockDim.x * blockIdx.x + threadIdx.x;

I can't understand the ix and iy statements.

Based on the picture I've attached, I have some idea, but I don't know how to transfer the concept to iy and ix.
The picture I attached explains how X-type mapping occurs. Can you help me understand how Y-type mapping works?
[attachment: idx.jpg]

You can read the CUDA Programming Guide 2.3:

(1) the matrix addition example on page 9, and

(2) the 2-D grid of 2-D thread blocks in Figure 2.1 (page 10).
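The two statements compute the same thing along different axes. ix skips blockIdx.x whole blocks, each blockDim.x threads wide, and then adds the thread's offset threadIdx.x within its block; iy does exactly the same with the .y components, counting rows downwards instead of columns to the right. A host-side C++ sketch of that arithmetic (GlobalCoord, globalCoord and flatIndex are hypothetical names, not CUDA API):

```cpp
#include <cassert>

// Global (column, row) position of one thread; a hypothetical helper type.
struct GlobalCoord { int ix, iy; };

// Same arithmetic as
//   int ix = blockDim.x * blockIdx.x + threadIdx.x;  // X: skip whole blocks, then offset
//   int iy = blockDim.y * blockIdx.y + threadIdx.y;  // Y: identical, applied to rows
GlobalCoord globalCoord(int blockIdxX, int blockIdxY,
                        int threadIdxX, int threadIdxY,
                        int blockDimX, int blockDimY) {
    assert(threadIdxX < blockDimX && threadIdxY < blockDimY);  // valid thread index
    return {blockDimX * blockIdxX + threadIdxX,
            blockDimY * blockIdxY + threadIdxY};
}

// Row-major flattening used by the suggested kernel: idx = iy * width + ix.
int flatIndex(GlobalCoord c, int width) { return c.iy * width + c.ix; }
```

For example, thread (1, 0) of block (1, 1) with blockDim = (2, 2) gets ix = 2*1 + 1 = 3 and iy = 2*1 + 0 = 2, i.e. column 3 of row 2, which is element 2*4 + 3 = 11 of the 4 x 4 array.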

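However, when you access a 2-D matrix, you must decide on its orientation, that is, which thread index (x or y) runs along the rows and which along the columns.

For example, for a matrix A of dimension 40 x 50, following the Programming Guide you would write: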

[codebox]int i = blockIdx.x * blockDim.x + threadIdx.x;
int j = blockIdx.y * blockDim.y + threadIdx.y;

if (i < 40 && j < 50){
    C[i][j] = A[i][j] + B[i][j];
}[/codebox]

However, I would prefer the following, which maps threadIdx.x to the column index j and threadIdx.y to the row index i:

[codebox]int j = blockIdx.x * blockDim.x + threadIdx.x;
int i = blockIdx.y * blockDim.y + threadIdx.y;

if (i < 40 && j < 50){
    C[i][j] = A[i][j] + B[i][j];
}[/codebox]
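The practical difference between the two orientations shows up in the memory offsets that neighbouring threads touch. C stores float A[40][50] row by row, so A[i][j] sits at offset i * 50 + j. The C++ sketch below (offsetOf and neighbourStride are hypothetical helpers) computes the distance between the elements accessed by the threads with global x index t and t + 1. Consecutive threads touching adjacent addresses is what lets the hardware combine a warp's accesses into few memory transactions (coalescing), which is one common reason to prefer the second version.

```cpp
#include <cassert>

// A C array declared as float A[ROWS][COLS] is stored row by row, so
// A[i][j] sits at offset i * COLS + j (counted in floats).
constexpr int ROWS = 40, COLS = 50;

int offsetOf(int i, int j) {
    assert(0 <= i && i < ROWS && 0 <= j && j < COLS);
    return i * COLS + j;
}

// Offset distance between the elements touched by two threads whose global
// x index is xIndex and xIndex + 1 (same global y index yIndex).
//   guideMapping == true : i from x, j from y (Programming Guide version)
//   guideMapping == false: j from x, i from y (second version)
int neighbourStride(bool guideMapping, int xIndex, int yIndex) {
    if (guideMapping)
        return offsetOf(xIndex + 1, yIndex) - offsetOf(xIndex, yIndex);
    return offsetOf(yIndex, xIndex + 1) - offsetOf(yIndex, xIndex);
}
```

With the first mapping, neighbouring threads land 50 floats apart; with the second, they touch adjacent floats.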