# Help with matrix multiplication using pitch

I am trying to do matrix multiplication on 2D arrays using pitched memory.
I am able to load the 2D arrays onto the GPU using `cudaMallocPitch()` and `cudaMemcpy2D()`, but I cannot get the multiplication code right.
The output I am getting is wrong.
Can anyone help me with the code?

Here's the code I have written:

```
// --- code for matrix multiplication using pitch ---
float Pvalue = 0;
xid = blockIdx.x * blockDim.x + threadIdx.x;
yid = blockIdx.y * blockDim.y + threadIdx.y;

for (int k = 0; k < N; ++k) { // D = T*M
    float Melement = T[yid*pitch + k];
    float Nelement = M[k + xid*pitch];
    Pvalue += Melement * Nelement;
}
D[yid*pitch + xid] = Pvalue;
// ---------
```

I guess that you want to compute

```
D(yid, xid) = T(yid, k) * M(k, xid)   // sum over k
```

If you use `T[yid*pitch+k]` to represent T(yid, k), then why do you use `M[k+xid*pitch]` to represent M(k, xid) and not `M[k*pitch + xid]`?
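A minimal sketch of the single product `D = T*M` with the corrected indexing, assuming N x N row-major `float` matrices, each allocated with `cudaMallocPitch()`. One further thing worth checking (an assumption, since the host code isn't shown): `cudaMallocPitch()` returns the pitch in bytes, so `T[yid*pitch + k]` on a `float*` is only correct if `pitch` was already divided by `sizeof(float)`. The sketch instead offsets a `char*` by `row * pitch`, the pattern used in the CUDA Programming Guide. The names `rowOf`, `matMulTM` and `multiplyOnDevice` are hypothetical, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Start of row r in a pitched allocation; pitch is in BYTES, as cudaMallocPitch() returns it.
__host__ __device__ inline const float* rowOf(const float* base, size_t pitch, int r)
{
    return reinterpret_cast<const float*>(reinterpret_cast<const char*>(base) + r * pitch);
}
__host__ __device__ inline float* rowOf(float* base, size_t pitch, int r)
{
    return reinterpret_cast<float*>(reinterpret_cast<char*>(base) + r * pitch);
}

// Hypothetical kernel: D(yid, xid) = sum_k T(yid, k) * M(k, xid), all N x N row-major.
__global__ void matMulTM(const float* T, size_t pitchT, const float* M, size_t pitchM,
                         float* D, size_t pitchD, int N)
{
    int xid = blockIdx.x * blockDim.x + threadIdx.x;  // column of D
    int yid = blockIdx.y * blockDim.y + threadIdx.y;  // row of D
    if (xid >= N || yid >= N) return;                 // threads past the edge do nothing

    const float* Trow = rowOf(T, pitchT, yid);        // row yid of T
    float Pvalue = 0.0f;
    for (int k = 0; k < N; ++k)
        Pvalue += Trow[k] * rowOf(M, pitchM, k)[xid]; // T(yid,k) * M(k,xid)
    rowOf(D, pitchD, yid)[xid] = Pvalue;
}

// Hypothetical host driver: hT, hM, hD are dense N x N row-major host arrays.
void multiplyOnDevice(const float* hT, const float* hM, float* hD, int N)
{
    size_t width = N * sizeof(float);                 // bytes per host row
    size_t pT, pM, pD;                                // pitch in bytes, one per allocation
    float *dT, *dM, *dD;
    cudaMallocPitch((void**)&dT, &pT, width, N);
    cudaMallocPitch((void**)&dM, &pM, width, N);
    cudaMallocPitch((void**)&dD, &pD, width, N);
    cudaMemcpy2D(dT, pT, hT, width, width, N, cudaMemcpyHostToDevice);
    cudaMemcpy2D(dM, pM, hM, width, width, N, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    matMulTM<<<grid, block>>>(dT, pT, dM, pM, dD, pD, N);

    // cudaMemcpy2D on the default stream runs after the kernel has finished
    cudaMemcpy2D(hD, width, dD, pD, width, N, cudaMemcpyDeviceToHost);
    cudaFree(dT); cudaFree(dM); cudaFree(dD);
}
```

Each allocation keeps its own pitch here rather than assuming all three come back with the same value; passing one shared `pitch`, as the original code does, works only if they happen to match.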

I coded it as you said and I am still getting the wrong answer.

Actually, I want to multiply three matrices, `D = T*M*T'`, where `T'` is the transpose of `T`.
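Written out element by element (row index i = `yid`, column index j = `xid`, matching the indexing above), the triple product splits into two ordinary products. The derivation shows that T' never has to be stored: once P = TM is known, entry (i, j) of D only needs row i of P and row j of T.

$$
D_{ij} = \left(T M T^{\mathsf{T}}\right)_{ij}
       = \sum_{l=0}^{N-1} P_{il}\,\left(T^{\mathsf{T}}\right)_{lj}
       = \sum_{l=0}^{N-1} P_{il}\,T_{jl},
\qquad
P_{il} = (TM)_{il} = \sum_{k=0}^{N-1} T_{ik}\,M_{kl}
$$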

The code after I changed it looks like this:

```
int xid = blockIdx.x * blockDim.x + threadIdx.x;
int yid = blockIdx.y * blockDim.y + threadIdx.y;

if (xid < N && yid < N)
    T_[xid*pitch + yid] = T[yid*pitch + xid];

float Pvalue = 0;
xid = blockIdx.x * blockDim.x + threadIdx.x;
yid = blockIdx.y * blockDim.y + threadIdx.y;

for (int k = 0; k < N; ++k) { // D = T*M
    float Melement = T[yid*pitch + k];
    float Nelement = M[k*pitch + xid];
    Pvalue += Melement * Nelement;
}
D[yid*pitch + xid] = Pvalue;

xid = blockIdx.x * blockDim.x + threadIdx.x;
yid = blockIdx.y * blockDim.y + threadIdx.y;

for (int k = 0; k < N; ++k) { // D = T*M
    float Melement = T_[yid*pitch + k];
    float Nelement = D[k*pitch + xid];
    Pvalue += Melement * Nelement;
}
M[yid*pitch + xid] = Pvalue;
```
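Independent of the indexing, the fused kernel above has problems that no change of index expression can fix. Inside an ordinary kernel there is no barrier across the whole grid (`__syncthreads()` only synchronizes threads within one block), so a thread can read `D` or `T_` before the thread that writes that element, possibly in another block, has run. `Pvalue` is not reset to 0 before the second loop. The second loop computes `T_*D`, i.e. T'(TM) rather than (TM)T'. And the result is written into `M` while other threads may still be reading `M`. A minimal sketch that avoids all four, assuming N x N row-major `float` matrices in pitched memory: one launch computes `P = T*M`, and a second launch computes `D = P*T'` by reading `T(xid, k)` in place of `T'(k, xid)`, so no `T_` buffer is needed. Launches on the same stream execute in order, so the second kernel sees all of `P`. The names `rowOf`, `matMulPitched`, `matMulTransBPitched` and `tmtOnDevice` are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Start of row r in a pitched allocation; pitch is in BYTES, as cudaMallocPitch() returns it.
__host__ __device__ inline const float* rowOf(const float* base, size_t pitch, int r)
{
    return reinterpret_cast<const float*>(reinterpret_cast<const char*>(base) + r * pitch);
}
__host__ __device__ inline float* rowOf(float* base, size_t pitch, int r)
{
    return reinterpret_cast<float*>(reinterpret_cast<char*>(base) + r * pitch);
}

// C = A * B:   C(y, x) = sum_k A(y, k) * B(k, x)
__global__ void matMulPitched(const float* A, size_t pa, const float* B, size_t pb,
                              float* C, size_t pc, int N)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= N || y >= N) return;
    const float* Arow = rowOf(A, pa, y);
    float sum = 0.0f;
    for (int k = 0; k < N; ++k)
        sum += Arow[k] * rowOf(B, pb, k)[x];
    rowOf(C, pc, y)[x] = sum;
}

// C = A * B':  C(y, x) = sum_k A(y, k) * B(x, k), so B' is never stored
__global__ void matMulTransBPitched(const float* A, size_t pa, const float* B, size_t pb,
                                    float* C, size_t pc, int N)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= N || y >= N) return;
    const float* Arow = rowOf(A, pa, y);
    const float* Brow = rowOf(B, pb, x);  // row x of B == column x of B'
    float sum = 0.0f;
    for (int k = 0; k < N; ++k)
        sum += Arow[k] * Brow[k];
    rowOf(C, pc, y)[x] = sum;
}

// Hypothetical host driver: hD = hT * hM * hT' for dense N x N row-major host arrays.
void tmtOnDevice(const float* hT, const float* hM, float* hD, int N)
{
    size_t width = N * sizeof(float);             // bytes per host row
    size_t pT, pM, pP, pD;                        // pitch in bytes, one per allocation
    float *dT, *dM, *dP, *dD;
    cudaMallocPitch((void**)&dT, &pT, width, N);
    cudaMallocPitch((void**)&dM, &pM, width, N);
    cudaMallocPitch((void**)&dP, &pP, width, N);  // intermediate P = T*M
    cudaMallocPitch((void**)&dD, &pD, width, N);
    cudaMemcpy2D(dT, pT, hT, width, width, N, cudaMemcpyHostToDevice);
    cudaMemcpy2D(dM, pM, hM, width, width, N, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    matMulPitched<<<grid, block>>>(dT, pT, dM, pM, dP, pP, N);        // P = T*M
    matMulTransBPitched<<<grid, block>>>(dP, pP, dT, pT, dD, pD, N);  // D = P*T', starts after P is complete

    cudaMemcpy2D(hD, width, dD, pD, width, N, cudaMemcpyDeviceToHost);
    cudaFree(dT); cudaFree(dM); cudaFree(dP); cudaFree(dD);
}
```

Splitting the work at the kernel boundary is the simplest way to get a device-wide "everything before this point is written" guarantee; keeping `P` in its own buffer also means neither input is overwritten while it is still being read.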