Hello everyone…!!

I am trying to implement an iterative linear solver, the Conjugate Gradient (CG) method, in CUDA. It solves equations of the form

A*x=b,

where A is a sparse, symmetric positive-definite matrix,

x is the unknown vector, with an initial guess of 0, and

b is the vector on the right-hand side of the equation.
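For reference, this is the standard unpreconditioned CG iteration for this system, starting from the zero initial guess. Each step needs one sparse matrix-vector product (A p_k), two dot products and three vector updates, which are the kinds of operations the code has to implement:

$$
\begin{aligned}
&x_0 = 0,\quad r_0 = b - A x_0 = b,\quad p_0 = r_0\\
&\text{for } k = 0, 1, 2, \dots \text{ until } \lVert r_k \rVert \text{ is small enough:}\\
&\quad \alpha_k = \frac{r_k^{T} r_k}{p_k^{T} A p_k}\\
&\quad x_{k+1} = x_k + \alpha_k p_k\\
&\quad r_{k+1} = r_k - \alpha_k A p_k\\
&\quad \beta_k = \frac{r_{k+1}^{T} r_{k+1}}{r_k^{T} r_k}\\
&\quad p_{k+1} = r_{k+1} + \beta_k p_k
\end{aligned}
$$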

My code includes several operations, such as sparse matrix-vector multiplication and vector-vector operations.
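As an illustration of the vector-vector operations, here is a minimal sketch of an axpy update (y = y + a*x) and a dot product that launch several blocks instead of one, so the vector length is not tied to the block size. The names `axpy`, `dot`, `axpy_host`, `dot_host`, `grid_for` and the block size of 256 are hypothetical choices, not taken from the original code. The dot product also shows a pattern that often breaks once a block has more than 32 threads: a shared-memory reduction needs `__syncthreads()` after every step when the block spans more than one warp. This is only a possible cause to check, not a diagnosis of the code in question.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // block size (hypothetical); must be a power of two for dot()

// y <- y + a*x. The grid-stride loop keeps the kernel correct for any n,
// whatever grid size is launched.
__global__ void axpy(int n, float a, const float* x, float* y) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        y[i] += a * x[i];
}

// *result += x . y. Each block reduces its partial sums in shared memory,
// then thread 0 adds the block's total with a single atomicAdd.
__global__ void dot(int n, const float* x, const float* y, float* result) {
    __shared__ float cache[kThreads];
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        sum += x[i] * y[i];
    cache[threadIdx.x] = sum;
    __syncthreads();
    // Tree reduction with a barrier after every step, so it stays correct
    // when the block is larger than one 32-thread warp.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) cache[threadIdx.x] += cache[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) atomicAdd(result, cache[0]);
}

// Number of blocks to launch: enough to cover n, capped because the
// grid-stride loops handle any remaining elements.
static int grid_for(int n) {
    int blocks = (n + kThreads - 1) / kThreads;
    return blocks < 1024 ? blocks : 1024;
}

// Host helper: copies x and y to the GPU, runs axpy, copies y back.
void axpy_host(int n, float a, const float* x, float* y) {
    assert(n > 0);
    float *d_x, *d_y;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_x, x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, n * sizeof(float), cudaMemcpyHostToDevice);
    axpy<<<grid_for(n), kThreads>>>(n, a, d_x, d_y);
    cudaMemcpy(y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
}

// Host helper: returns x . y computed on the GPU.
float dot_host(int n, const float* x, const float* y) {
    assert(n > 0);
    float *d_x, *d_y, *d_r;
    float r = 0.0f;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMalloc(&d_r, sizeof(float));
    cudaMemcpy(d_x, x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_r, 0, sizeof(float));  // all-zero bytes == 0.0f
    dot<<<grid_for(n), kThreads>>>(n, d_x, d_y, d_r);
    cudaMemcpy(&r, d_r, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    cudaFree(d_r);
    return r;
}
```

Because of the grid-stride loops, neither kernel depends on the vector length fitting into a single block.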

My code works fine for matrices up to 31 × 31, but not for anything larger. It may be because of the number of threads allocated to a kernel function. I am allocating threads as

```
mul<<<1,nrows>>>()
```

Here `mul` is the kernel that performs the sparse matrix-vector multiplication, and `nrows` is the number of rows in the sparse matrix A.
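The post does not say how A is stored. Assuming compressed sparse row (CSR) storage, a minimal sketch of a one-thread-per-row SpMV kernel launched over as many blocks as needed (rather than as a single block of `nrows` threads) could look like this. `spmv_csr`, `spmv_host`, the CSR array names and the block size of 256 are hypothetical, and A is assumed to be square.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// y = A*x for a square nrows x nrows matrix in CSR form: the entries of row r
// are vals[row_ptr[r] .. row_ptr[r+1]-1], with column indices in col_idx.
// One thread per row; the bounds check lets the grid contain extra threads.
__global__ void spmv_csr(int nrows, const int* row_ptr, const int* col_idx,
                         const float* vals, const float* x, float* y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < nrows) {
        float sum = 0.0f;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += vals[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

// Host helper: copies A (CSR) and x to the GPU, runs the kernel, copies y back.
void spmv_host(int nrows, int nnz, const int* row_ptr, const int* col_idx,
               const float* vals, const float* x, float* y) {
    assert(nrows > 0 && nnz > 0);
    int *d_rp, *d_ci;
    float *d_v, *d_x, *d_y;
    cudaMalloc(&d_rp, (nrows + 1) * sizeof(int));
    cudaMalloc(&d_ci, nnz * sizeof(int));
    cudaMalloc(&d_v, nnz * sizeof(float));
    cudaMalloc(&d_x, nrows * sizeof(float));
    cudaMalloc(&d_y, nrows * sizeof(float));
    cudaMemcpy(d_rp, row_ptr, (nrows + 1) * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_ci, col_idx, nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_v, vals, nnz * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, x, nrows * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;                             // threads per block
    const int blocks = (nrows + threads - 1) / threads;  // round up
    spmv_csr<<<blocks, threads>>>(nrows, d_rp, d_ci, d_v, d_x, d_y);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(y, d_y, nrows * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_rp);
    cudaFree(d_ci);
    cudaFree(d_v);
    cudaFree(d_x);
    cudaFree(d_y);
}
```

With `nrows = 100` this launches 1 block of 256 threads, of which 100 do work; with `nrows = 5000` it launches 20 blocks.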

Is this problem related to the warp size of 32 threads?
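One way to check whether the launch configuration itself is at fault is to print the device's warp size and per-block thread limit, and to test launches with error checking. On current NVIDIA GPUs a block may hold up to 1024 threads, so a single-block launch like `mul<<<1,nrows>>>()` would only hit that limit above 1024 rows, not at 32. The helpers `noop`, `launch_ok` and `print_limits` below are hypothetical names used for this sketch.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void noop() {}

// Launches an empty kernel with the given configuration and reports whether it
// succeeded. cudaGetLastError() catches configuration errors (e.g. too many
// threads per block); cudaDeviceSynchronize() catches errors raised while the
// kernel runs.
bool launch_ok(int blocks, int threads) {
    assert(blocks > 0 && threads > 0);
    noop<<<blocks, threads>>>();
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        std::printf("<<<%d, %d>>> failed: %s\n", blocks, threads,
                    cudaGetErrorString(err));
    return err == cudaSuccess;
}

// Prints the warp size and the maximum threads per block of device 0.
void print_limits() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return;
    std::printf("warpSize = %d, maxThreadsPerBlock = %d\n",
                prop.warpSize, prop.maxThreadsPerBlock);
}
```

Wrapping the real `mul` launch the same way (checking `cudaGetLastError()` and then `cudaDeviceSynchronize()`) with a failing matrix size would show whether the error comes from the launch or from inside the kernel.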

If anyone knows the cause, please let me know.

Thank you…!!