Hello,

I have the following problem:

When computing an inner product, each multiplication is handled by its own thread.

result = ax + by

ax and by are calculated by different threads, and each thread adds its product to the result after the calculation.

In my sparse matrix kernel, however, only the last product is stored:

result = by

What might be the reason for this, and what can I do?

Mustafa

The kernel code:

```
__global__ void matrixMulCOO( float * values, int * rowIndices, int * colIndices, int numberOfElements,
                              float* B,
                              float* C)
{
    // Global thread index: one thread per non-zero element
    //int idx = /*blockIdx.x* blockDim.x +*/ threadIdx.x;
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    if(idx < numberOfElements )
        for(int j = 0; j < WB; j++)
        {
            //A.values[idx];
            C[ rowIndices[idx]*WC + j ]
                = values[idx] * B[WB*colIndices[idx] + j];
            __syncthreads();
        }
    //if(idx < HC )
    //C[ idx] = values[idx];
}
```

Results:

```
Device 0: "GeForce GTX 460" with Compute 2.1 capability

array2d <3, 4>
0 1 2 0
0 0 0 0
0 0 10 11

Row: 0 Col: 1 Value: 1.000000
Row: 0 Col: 2 Value: 2.000000
Row: 2 Col: 2 Value: 10.000000
Row: 2 Col: 3 Value: 11.000000

host B Mat 4 x 3
0.00 1.00 2.00
3.00 4.00 5.00
6.00 7.00 8.00
9.00 10.00 11.00

Device C Mat 3 x 3
12.00 14.00 16.00
0.00 0.00 0.00
99.00 110.00 121.00
```
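
For comparison, here is a minimal CPU sketch of the accumulation described at the top of the post, assuming B and C are dense and stored row-major. It adds every product into C with += rather than overwriting the element. The function name cooMultiplyReference and the parameters hC and wB are hypothetical, introduced only for this illustration, and are not part of the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for C = A * B, where A is given in COO format
// (values / rowIndices / colIndices) and B is dense, row-major.
// Hypothetical helper: hC is the number of rows of C,
// wB is the number of columns of B (and therefore of C).
std::vector<float> cooMultiplyReference(const std::vector<float>& values,
                                        const std::vector<int>& rowIndices,
                                        const std::vector<int>& colIndices,
                                        const std::vector<float>& B,
                                        int hC, int wB)
{
    // Each non-zero needs exactly one row index and one column index.
    assert(rowIndices.size() == values.size());
    assert(colIndices.size() == values.size());

    // C starts at zero so that products can be accumulated into it.
    std::vector<float> C(static_cast<std::size_t>(hC) * wB, 0.0f);

    for (std::size_t idx = 0; idx < values.size(); ++idx) {
        for (int j = 0; j < wB; ++j) {
            // Accumulate (+=): several non-zeros may share the same row.
            C[rowIndices[idx] * wB + j] += values[idx] * B[colIndices[idx] * wB + j];
        }
    }
    return C;
}
```

With the four non-zero values and the B matrix listed above, this accumulation gives 15 18 21 for row 0 and 159 180 201 for row 2, whereas the device output holds only the last product for each row (2 times row 2 of B, and 11 times row 3 of B).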