Hello All,

I have been working with a CUDA Tesla card for the last three months. The journey has been really challenging and fun at times.

I have tried to replicate the efficient sparse matrix-vector multiplication algorithm described by Nathan Bell and Michael Garland in their white paper dated December 11, 2008:

http://www.nvidia.com/object/nvidia_research_pub_001.html


I have used the CSR format for my matrix, and I have slightly modified the algorithm to handle complex numbers:

```
struct complex {
    double x;
    double y;
};
```

```
// grid and thread size
int threadPerBlock = 512;
int numBlock = (VCTSIZE / threadPerBlock) + 1; // VCTSIZE is the size of the vector to be multiplied = square matrix size
csr_spmv_kernel<<<numBlock, threadPerBlock>>>(row_size_ptr_kr, csrRowPtr, xInd, xVal, y, Finalans);
```

```
__global__ void csr_spmv_kernel( const int *row_size_kr,        // matrix row size
                                 const int *csrRowPtr_kr,       // row pointer
                                 const int *xInd_kr,            // column index
                                 const struct complex *xVal_kr, // values of the matrix
                                 const struct complex *y_kr,    // the vector
                                 struct complex *Finalans_kr    // final result
                               ) {
    int row, row_start, row_end, jj, row_size;
    double dot_real, dot_img;
    row_size = *row_size_kr;
    row = blockDim.x * blockIdx.x + threadIdx.x;
    if (row < row_size) {
        dot_real = 0;
        dot_img = 0;
        row_start = csrRowPtr_kr[row];
        row_end = csrRowPtr_kr[row + 1];
        for (jj = row_start; jj < row_end; jj++) {
            // complex multiply-accumulate: (a+bi)(c+di) = (ac-bd) + (ad+bc)i
            dot_real += (xVal_kr[jj].x * y_kr[xInd_kr[jj]].x) - (xVal_kr[jj].y * y_kr[xInd_kr[jj]].y);
            dot_img  += (xVal_kr[jj].x * y_kr[xInd_kr[jj]].y) + (xVal_kr[jj].y * y_kr[xInd_kr[jj]].x);
        }
        Finalans_kr[row].x += dot_real;
        Finalans_kr[row].y += dot_img;
    }
}
```

The algorithm works fine up to a matrix and vector size of 300000x300000 and 300000, but if I increase the size to, say, 350000, the kernel fails.

A cudaDeviceSynchronize() placed just after the kernel invocation reports an error from the kernel operation.

Can anyone give an insight into why the algorithm works for small matrix sizes but fails as the matrix size is increased?