Hi all,
I'm seeing a strange behavior that I can't understand.
I have two kernels that operate on the same objects, but in a symmetrical way.
The first one multiplies each row of a real matrix V element-wise by a vector P and stores the result in the real part of a complex matrix C, setting the imaginary part to zero:
[b]
[u] C(i,:).x = V(i,:)*P(:)
C(i,:).y = 0 [/u] for every row i
[/b]
The second one does the reverse: it multiplies the real part of each row of C element-wise by P and stores the result in V:
[b]
V(i,:) = C(i,:).x*P(:) for every row i
[/b]
I expected the same execution time for both kernels, but the first kernel is twice as fast as the second one.
Evidently I'm missing something.
Any ideas or help?
Thanks,
Marco
Here are the two kernels:
[codebox]
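__global__ void
second(cucmplx* cvn, cureal* vn, cureal* pot, int height, int nfftrec, int pthsize)
{
    // x: column index. y: row index (there is no threadIdx.y term,
    // so every thread in a block works on the same row).
    int x = IMUL(blockDim.x,blockIdx.x) + threadIdx.x;
    int y = IMUL(blockDim.y,blockIdx.y);
    if(x<nfftrec && y<height){
        // Row-major offset; pthsize is used as the row stride in elements.
        int ind = IMUL(y, pthsize)+x;
        // Read only the real part of C, scale by P, write to V.
        vn[ind] = cvn[ind].x*pot[x];
    }
}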
[/codebox]
[codebox]
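__global__ void
first(cucmplx* cvn, cureal* vn, cureal* pot, int height, int nfftrec, int pthsize)
{
    // x: column index. y: row index (there is no threadIdx.y term,
    // so every thread in a block works on the same row).
    int x = IMUL(blockDim.x,blockIdx.x) + threadIdx.x;
    int y = IMUL(blockDim.y,blockIdx.y);
    if(x<nfftrec && y<height){
        // Row-major offset; pthsize is used as the row stride in elements.
        int ind = IMUL(y, pthsize)+x;
        // Read V, scale by P, write the real part of C and zero the imaginary part.
        cvn[ind].x = vn[ind]*pot[x];
        cvn[ind].y = 0.0f;
    }
}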
[/codebox]
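For anyone who wants to reproduce the comparison, here is a minimal, self-contained sketch of how the two kernels could be timed against each other with CUDA events. Several things in it are assumptions and not taken from the code above: the typedefs for cureal and cucmplx (single precision), the IMUL macro (mapped to __mul24), the problem size, pthsize equal to nfftrec, the 256x1 block shape, and the helper time_kernel, which is hypothetical. Error checking is omitted for brevity.
[codebox]
#include <cstdio>
#include <cuda_runtime.h>

// ASSUMPTIONS (not shown in the post): single-precision types and a 24-bit multiply macro.
typedef float  cureal;
typedef float2 cucmplx;
#define IMUL(a, b) __mul24((a), (b))

__global__ void first(cucmplx* cvn, cureal* vn, cureal* pot, int height, int nfftrec, int pthsize)
{
    int x = IMUL(blockDim.x, blockIdx.x) + threadIdx.x;
    int y = IMUL(blockDim.y, blockIdx.y);
    if (x < nfftrec && y < height) {
        int ind = IMUL(y, pthsize) + x;
        cvn[ind].x = vn[ind] * pot[x];
        cvn[ind].y = 0.0f;
    }
}

__global__ void second(cucmplx* cvn, cureal* vn, cureal* pot, int height, int nfftrec, int pthsize)
{
    int x = IMUL(blockDim.x, blockIdx.x) + threadIdx.x;
    int y = IMUL(blockDim.y, blockIdx.y);
    if (x < nfftrec && y < height) {
        int ind = IMUL(y, pthsize) + x;
        vn[ind] = cvn[ind].x * pot[x];
    }
}

typedef void (*kernel_t)(cucmplx*, cureal*, cureal*, int, int, int);

// Hypothetical helper: average time per launch in milliseconds, after one warm-up launch.
static float time_kernel(kernel_t k, dim3 grid, dim3 block, cucmplx* cvn, cureal* vn,
                         cureal* pot, int height, int nfftrec, int pthsize, int reps)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    k<<<grid, block>>>(cvn, vn, pot, height, nfftrec, pthsize);  // warm-up, not timed

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        k<<<grid, block>>>(cvn, vn, pot, height, nfftrec, pthsize);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until all timed launches have finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / reps;
}

int main()
{
    // ASSUMED problem size; pthsize is taken equal to the row length.
    const int height = 512, nfftrec = 4096, pthsize = nfftrec, reps = 100;
    const size_t n = (size_t)height * pthsize;

    cucmplx* cvn = 0;
    cureal *vn = 0, *pot = 0;
    cudaMalloc((void**)&cvn, n * sizeof(cucmplx));
    cudaMalloc((void**)&vn, n * sizeof(cureal));
    cudaMalloc((void**)&pot, nfftrec * sizeof(cureal));
    cudaMemset(cvn, 0, n * sizeof(cucmplx));
    cudaMemset(vn, 0, n * sizeof(cureal));
    cudaMemset(pot, 0, nfftrec * sizeof(cureal));

    // ASSUMED launch shape: blockDim.y == 1, so y in the kernels equals blockIdx.y.
    dim3 block(256, 1);
    dim3 grid((nfftrec + block.x - 1) / block.x, height);

    float t1 = time_kernel(first,  grid, block, cvn, vn, pot, height, nfftrec, pthsize, reps);
    float t2 = time_kernel(second, grid, block, cvn, vn, pot, height, nfftrec, pthsize, reps);
    printf("first : %.4f ms per launch\n", t1);
    printf("second: %.4f ms per launch\n", t2);

    cudaFree(cvn);
    cudaFree(vn);
    cudaFree(pot);
    return 0;
}
[/codebox]
This should build with nvcc as a single .cu file. Averaging over many launches after a warm-up keeps one-time startup cost out of the numbers, so the two per-launch times can be compared directly.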