Hi all,

I'm seeing a strange behavior that I can't explain.

I have two kernels which operate on the same objects but in a symmetrical way.

The first one multiplies each row of a matrix V element-wise by a vector P and stores the result in the real part of a complex matrix C:

So:

```
C(i,:).x = V(i,:) * P(:)
C(i,:).y = 0
```

for every row i.

The second one works the other way around:

```
V(i,:) = C(i,:).x * P(:)
```

for every row i.
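To make the two operations concrete, here is a minimal host-side sketch in plain C++ of what each kernel is meant to compute. It is only a reference under assumptions: `Cmplx`, `first_ref`, and `second_ref` are hypothetical names, I assume `cureal` is `float` and `cucmplx` is a struct with `.x`/`.y` members, and `pitch` plays the role of `pthsize` (the stride between rows).

```cpp
#include <cstddef>
#include <vector>

// Assumption: stand-in for cucmplx, with cureal taken to be float.
struct Cmplx { float x; float y; };

// C(i,:).x = V(i,:) * P(:) and C(i,:).y = 0 for every row i (element-wise).
void first_ref(std::vector<Cmplx>& c, const std::vector<float>& v,
               const std::vector<float>& p, int height, int width, int pitch)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t ind = static_cast<std::size_t>(y) * pitch + x;
            c[ind].x = v[ind] * p[x];
            c[ind].y = 0.0f;
        }
    }
}

// V(i,:) = C(i,:).x * P(:) for every row i (element-wise, real part only).
void second_ref(const std::vector<Cmplx>& c, std::vector<float>& v,
                const std::vector<float>& p, int height, int width, int pitch)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t ind = static_cast<std::size_t>(y) * pitch + x;
            v[ind] = c[ind].x * p[x];
        }
    }
}
```

A reference like this can also be used to check the GPU results element by element.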

I expected the same execution time for both kernels, but the first kernel is twice as fast as the second one.
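For what it's worth, a rough count of the bytes each kernel needs per element does not explain the gap. The sketch below is only arithmetic under assumptions: `cureal` is taken to be `float`, `Cmplx` is a hypothetical 8-byte stand-in for `cucmplx`, and `Traffic`, `first_traffic`, and `second_traffic` are names made up for illustration.

```cpp
#include <cstddef>

// Assumption: stand-in for cucmplx, with cureal taken to be float.
struct Cmplx { float x; float y; };

// Bytes requested per matrix element by one thread.
struct Traffic { std::size_t read; std::size_t written; };

// first: reads vn[ind] and pot[x], writes both members of cvn[ind].
Traffic first_traffic()
{
    return Traffic{2 * sizeof(float), sizeof(Cmplx)};
}

// second: reads cvn[ind].x and pot[x], writes vn[ind].
Traffic second_traffic()
{
    return Traffic{2 * sizeof(float), sizeof(float)};
}
```

By this count the second kernel requests fewer bytes than the first. The one asymmetry I can see is that the `.x` values it reads are 8 bytes apart, so the hardware may still have to fetch the unused `.y` halves.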

Evidently I'm missing something.

Any ideas or help?

thanks

Marco

Here are the two kernels:

```
__global__ void
second(cucmplx* cvn, cureal* vn, cureal* pot, int height, int nfftrec, int pthsize)
{
    // x: column index; y: row index, computed from the block index only
    int x = IMUL(blockDim.x, blockIdx.x) + threadIdx.x;
    int y = IMUL(blockDim.y, blockIdx.y);

    if (x < nfftrec && y < height) {
        int ind = IMUL(y, pthsize) + x;   // pthsize: stride between rows
        vn[ind] = cvn[ind].x * pot[x];    // reads only the real part of cvn
    }
}
```

```
__global__ void
first(cucmplx* cvn, cureal* vn, cureal* pot, int height, int nfftrec, int pthsize)
{
    // x: column index; y: row index, computed from the block index only
    int x = IMUL(blockDim.x, blockIdx.x) + threadIdx.x;
    int y = IMUL(blockDim.y, blockIdx.y);

    if (x < nfftrec && y < height) {
        int ind = IMUL(y, pthsize) + x;   // pthsize: stride between rows
        cvn[ind].x = vn[ind] * pot[x];    // real part
        cvn[ind].y = 0.0f;                // imaginary part
    }
}
```