Hello there,
First of all, let me point out that I'm relatively new to the world of GPU programming.
This is an SpMV kernel for CSR, as in the programming guide:
__global__ void SpMV(
    const float *csrNz_d, const int *csrCols_d,
    const int *csrRowStart_d, const float *x_d, float *y_d,
    const int num_rows)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        float dot = 0.0f;
        int row_start = csrRowStart_d[row];
        int row_end   = csrRowStart_d[row + 1];
        for (int jj = row_start; jj < row_end; jj++)
            dot += csrNz_d[jj] * x_d[csrCols_d[jj]];
        y_d[row] += dot; // assumes y_d was zero-initialized on the device
    }
}
This is the relevant part of main():
int block_size = 512;
int n_blocks = dim.M / block_size + (dim.M % block_size == 0 ? 0 : 1); // ceil(dim.M / block_size)
SpMV<<<n_blocks, block_size>>>(csrNz_d, csrCols_d, csrRowStart_d, x_d, y_d, dim.M);
Using the terms from the guide, that would be:
SpMV<<<n_blocks, block_size>>>(data, indices, ptr, x, y, num_rows);
Well, this seems to work perfectly for relatively small matrices: around 100 000 non-zero elements and dimensions of roughly 50 000 x 50 000.
However, when I input a bigger matrix, e.g. 480 000 x 171 000 with approximately 6 million non-zero elements, the returned vector is all zeros. I have tried many different matrices, and it only works for the smaller ones. I have placed error-checking statements after each device call, but none of them reports anything. The program simply returns a y_d vector of all 0 elements.
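One thing that may matter for the error checking: kernel launches are asynchronous, so a failure often only shows up at the next synchronizing call rather than right after the launch. A typical pattern (a sketch, not compiled here; cudaThreadSynchronize is the pre-CUDA-4 name of the sync call) is to check both points:

```cuda
// Check for launch-configuration errors immediately after the launch...
SpMV<<<n_blocks, block_size>>>(csrNz_d, csrCols_d, csrRowStart_d, x_d, y_d, dim.M);
cudaError_t err = cudaGetLastError();
if (err != cudaSuccess)
    printf("launch failed: %s\n", cudaGetErrorString(err));

// ...then force completion and check for errors during execution.
err = cudaThreadSynchronize();
if (err != cudaSuccess)
    printf("kernel failed: %s\n", cudaGetErrorString(err));
```

If your current checks only look at the return values of cudaMalloc/cudaMemcpy, an execution failure in the kernel could slip through unnoticed.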
I'm using an 8600M GT GPU.
Any suggestions as to why this could be happening?
Cheers,