# index mapping from .c file to .cu file

hi Folks:

I am trying to change a .c file to a .cu file. These files are then called by MATLAB. I am following an example. But the index change seems pretty tricky. Here I will post some code (for the sake of simplicity, I deleted some code, and I hope everything is right).

The original .c code, which is a mex file:

``````typedef struct

{				// dimentions:

double *x; 	// Ns X 1

double *af;	// Ns X 1

double *g;	// Nt X 1

double *d;	// Nt X 1

int Ns;

int Nt;

} simparams;

void *rot_sfc(void *arg)

{

simparams *p = (simparams *)arg;

int i,j;

double aft,ar;

double rot;

double xt;

double *g;

double *d;

double alfa,beta;

/* get pointers */

g = p->g;

d = p->d;

for(j = 0;j < p->Ns;j++)

{	/* initialize */

ar = 1;

aft = p->af[j];

xt = p->x[j];

for(i = p->Nt-1;i >= 0;i--)

{	/* apply rotation */

rot = xt*g[i];

alfa = cos(rot/2.0);

aft = aft*alfa;

ar = ar*z;

d[i] += aft*ar;

beta = sin(rot/2);

ar = ar*beta;

}

}

}
``````

Let me explain a little bit. The d is the returned variable, which is a Nt X 1 vector with all entries 0 created by MATLAB/c mex funtion mexFunction(). This is basically a double loop with outside loop looping through Ns, and inside looping through Nt. For each j, some value is calculated then added to d[i].

Here is the changed version with same return, as a .cu file:

``````__global__

void rot_sfc

(				// dimentions:

double *x; 	// Ns X 1

double *af;	// Ns X 1

double *g;	// Nt X 1

double *d;	// Ns X Nt

int Ns;

int Nt;

)

{

int j = blockIdx.x*blockDim.x + threadIdx.x;

int i;

int idx;

double aft,ar;

double rot;

double xt;

double alfa,beta;

while(j < Ns)

{	/* initialize */

ar = 1;

aft = af[j];

xt = x[j];

for(i = Nt-1;i >= 0;i--)

{

idx = i*Ns + j;

d[idx] = 0;

rot = xt*g[i];

alfa = cos(rot/2.0);

aft = aft*alfa;

ar = ar*z;

d[idx] = aft*ar;

beta = sin(rot/2);

ar = ar*beta;

} /* Nt loop */

j += blockDim.x * gridDim.x;

} /* while j < Ns */

}
``````

Similarly, the d is the returned variable, but instead of being a Nt X 1 vector, it is a Ns X Nt matrix now. It is initiated by MATLAB function

``````gpuArray(zeros(Ns,Nt))
``````

. So basically it is a Ns X Nt 0 matrix at the beginning. Very importantly: it has to be post processed to get the same result as the previous case. The post processing is summing it up along Ns direction, such that we have a new vector of size Nt X 1, or 1 X Nt:

I am really confused about the mapping in index from .c to .cu file. The double loop in c file,

``````for(j = 0;j < p->Ns;j++)

{

for(i = p->Nt-1;i >= 0;i--)

{

}

}
``````

seems clear and straightforward.

But the index in the .cu file:

``````int j = blockIdx.x*blockDim.x + threadIdx.x;

while(j < Ns)

{

for(i = Nt-1;i >= 0;i--)

{

idx = i*Nt_j;

d[idx] = ...

}

j += blockDim.x * gridDim.x;

}
``````

But the problem is MATLAB would not allow me to create a Ns X Nt 0 matrix, because it exceeds the max limit. So I cannot do this:

``````gpuArray(zeros(Ns, Nt))
``````

So I did this to get d:

``````__global__

void rot_sfc

(				// dimentions:

double *x; 	// Ns X 1

double *af;	// Ns X 1

double *g;	// Nt X 1

double *d;	// Ns X Nt

int Ns;

int Nt;

)

{

int j = blockIdx.x*blockDim.x + threadIdx.x;

int i;

double aft,ar;

double rot;

double xt;

double alfa,beta;

while(j < Ns)

{	/* initialize */

ar = 1;

aft = af[j];

xt = x[j];

for(i = Nt-1;i >= 0;i--)

{

rot = xt*g[i];

alfa = cos(rot/2.0);

aft = aft*alfa;

ar = ar*z;

d[i] += aft*ar;

beta = sin(rot/2);

ar = ar*beta;

} /* Nt loop */

j += blockDim.x * gridDim.x;

} /* while j < Ns */

}
``````

But the result is NOT right. I don’t know why. Perhaps the index messed up the whole thing?

seems pretty confusing. I mean, if j is not consecutive over natural integer number, doesn’t skip some locations in a Ns X Nt matrix?

I understand “blockIdx.x*blockDim.x + threadIdx.x” has something to do with how threads is formed in block in grid in GPU, but I just cannot figure out how these indices are mapped.

Anyone thoughts is highly appreciated.