# How to acess matrix elements that are part of a block of threads?

Hello, again!

I solved that simple matrix multiplication problem I had earlier, but now, when I need to use square matrices that are larger then 16 x 16 elements, I have to use blocks of threads. That is, I have to distribute matrix across more blocks and access a particular element in some block using appropriate blockIdx.x and blockIdx.y as well as thradIdx.x and threadIdx.y. The problem arises when I have to do actual accessing of element using . So, here is the code and the part in is missing, which is the part I wanted to ask you how to solve it.

Thanks!

``````// includes

#include <cutil_inline.h>

#include <shrUtils.h>

#include <cuda.h>

#define Width 16

#define Tile_Width 16

__global__ void MatrixMul(int *Md, int *Nd, int *Pd){

int pom=0;

for(int k=0;k<Width;k++){

pom+=Md[	]*Nd[   ];  // Here is the blank part that is missing !!!!!!!!

Pd[  ]=pom;

}

}

int main(int argc, char** argv) {

int *Md,*Nd,*Pd;	// Pointers to GPU memory

int *M,*N,*P;		// Pointers to CPU memory

int i,j;

int Amount=(2*Width)*(2*Width)*sizeof(int);

char a;

// Allocation of memory for CPU based matrix M,N and P

M = (int*) malloc((Amount));

N = (int*) malloc((Amount));

P = (int*) malloc((Amount));

// Initialization of matrices M and N

for(i=0;i<Width*2;i++){

for(j=0;j<Width*2;j++){

if(j==i){

M[i*(Width*2)+j]=1;

N[i*(Width*2)+j]=1;

}

if(i!=j){

M[i*(Width*2)+j]=0;

N[i*(Width*2)+j]=0;

}

}

}

// Allocation of GPU based memory for Md, Nd and Pd matrices

cudaMalloc((void**)&Md,Amount);

cudaMemcpy(Md,M,Amount,cudaMemcpyHostToDevice);

cudaMalloc((void**)&Nd,Amount);

cudaMemcpy(Nd,N,Amount,cudaMemcpyHostToDevice);

cudaMalloc((void**)&Pd,Amount);

// Setting up parameteres for kernel invocation

dim3 dimGrid(2,2,1);

dim3 dimBlock(Width,Width,1);

// Kernel invocation

MatrixMul<<<dimGrid,dimBlock>>>(Md,Nd,Pd);

// Return of results from GPu to CPU

cudaMemcpy(P,Pd,(Width*Width*sizeof(int)),cudaMemcpyDeviceToHost);

// Printing of results

for(i=0;i<(Width*2);i++){

printf("\n");

printf("%d.  ",i);

for(j=0;j<(Width*2);j++){

printf("[%d]",P[i*(Width*2)+j]);

}

}

printf("\n");

printf("When you cried I always wiped out all of your teats, ...\n");

scanf("%d",&a);

}
``````

As you can see, the idea was to have 4 blocks of 16 x 16 elements, so that I can have one 32 x 32 matrix.

Ok, guys, I see nobody knows what do I want.

This is the code :

``````__global__ void MatrixMul(int *Md, int *Nd, int *Pd){

int pom=0;

for(int k=0;k<Width;k++){

pom+=Md[k][y]*Nd[x][k];

Pd[x][y]=pom;

}

}
``````

And it gives me the following error:

error: expression must have pointer-to-object type

A bit of patience helps. If you post here expecting an answer to your problems in less than 90 minutes, be prepared to be disappointed.

Right, because mutli-dimensional arrays in C are really arrays of pointers, but you are passing a pointer. If you are working with linear memory, then the answer you are looking for is probably on this Wikipedia page.

Hello, everyone!
I solved the problem with this material: http://xcr.cenit.latech.edu/hapc/docs/lecture3UIUC.pdf so take a look, answer is the following.
pom += Md[RowWidth+k] * Nd[kWidth+Col];
Pd[Row*Width+Col] = pom;